System Design of Cloud Search Engine Based on Rich Text Content
H.-P. Chan, L. Xu, H.-H. Liu, R.-T. Zhang
Published in Springer
2021
Volume: 26
Issue: 1
Pages: 459 - 472
Abstract
To improve search performance over rich text content, this paper designs a cloud search engine system for rich text. Building on a traditional search engine hardware system, it adds several components, including a Solr index server, a collector, a Chinese word segmentation device, and a searcher, and adjusts the data interface. With this hardware and database support in place, the system uses the open-source Apache Tika framework to obtain the metadata of rich text documents, segments words according to the content and semantics of the rich text, and calculates the weight of each keyword. Given the input search keywords, the system establishes a text index, uses the BM25 algorithm to calculate the similarity between the keywords and the text, and outputs rich text search results according to the computed similarity. Experimental results show that the designed system achieves a high recall rate and high throughput, and that the index construction time for each data item across different files is short, improving both search efficiency and search accuracy. © 2020, Springer Science+Business Media, LLC, part of Springer Nature.
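The abstract names BM25 as the similarity measure but gives no formula or parameters. As a rough illustration only, the sketch below scores pre-tokenized documents against query keywords using the common Okapi BM25 form. The function names `bm25_scores` and `rank`, the defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions made for this example; they are not taken from the paper, whose system relies on a Solr index server and a Chinese word segmenter rather than hand-written scoring.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per tokenized document in `docs`.

    `docs` is a list of token lists; tokenization (for example Chinese word
    segmentation) is assumed to have happened already.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue  # a term absent from the document contributes nothing
            # Smoothed IDF (assumed variant) that stays non-negative.
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank(query_terms, docs, **kwargs):
    """Return document indices ordered from highest to lowest BM25 score."""
    scores = bm25_scores(query_terms, docs, **kwargs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy corpus, already tokenized.
docs = [
    ["cloud", "search", "engine"],
    ["rich", "text", "index"],
    ["cloud", "storage"],
]
query = ["search", "engine"]
scores = bm25_scores(query, docs)
order = rank(query, docs)
```

In this toy corpus only the first document contains either query term, so it is the only one with a nonzero score and it ranks first. In a Solr deployment this scoring would normally be handled by the engine's configured similarity rather than by application code.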
About the journal
Journal: Mobile Networks and Applications
Publisher: Springer
ISSN: 1383-469X