Keyword weight optimization using gradient strategies in event focused web crawling
S. Rajiv,
Published in Elsevier B.V.
2021
Volume: 142
Pages: 3-10
Abstract
There is a growing need for an integrated event-focused crawling system that gathers web data about key events. During a disaster or any other important event, many users look for up-to-date information about it. Information on the web is growing rapidly, which makes it challenging for any search engine to retrieve the necessary information properly. The web crawler is a primary unit of such search engines, so optimizing it is a major aspect of improving search efficiency. The large size and dynamic nature of web information, together with continuous updates to documents and data, characterize web-based retrieval systems. Focused crawling concentrates on automatic web page classification, which is used to determine whether a web page is relevant. Although various classifiers are used for this purpose, the identification of keywords plays an important role in improving event-focused web crawling. This work proposes a novel and efficient method for enhancing the keyword set. Metaheuristic-based optimized keyword weights are found to be efficient. Term Frequency (TF) based feature extraction and keyword weight optimization using the Stochastic Gradient Descent (SGD) algorithm are employed in event-focused web crawling. Gradient descent is a popular optimization algorithm, and its stochastic variant works with fitness functions that are differentiable or sub-differentiable and is well suited to large-scale data optimization. The algorithm aims to make the keyword set optimal: the better the keyword set, the more relevant the returned documents are to users' queries. For this purpose, Support Vector Machine (SVM) classifiers are employed.
The experimental results showed that the proposed technique outperformed the others, including the Particle Swarm Optimization (PSO) based weight-optimized solution. The proposed SGD weight optimization is better than PSO by 5.8%, showing its ability to handle high volumes of data. © 2020
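To make the idea of TF features combined with SGD-optimized keyword weights concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the fitness function, so the regularized hinge loss (a linear SVM-style objective), the absence of a bias term, the hyperparameters, the toy documents, and the names `term_frequencies`, `sgd_keyword_weights`, and `score` are all assumptions made for illustration.

```python
import random
from collections import Counter


def term_frequencies(text, keywords):
    """Return the relative frequency of each keyword in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[k] / total for k in keywords]


def sgd_keyword_weights(samples, labels, epochs=200, lr=0.5, reg=0.001, seed=0):
    """Learn one weight per keyword by SGD on a regularized hinge loss.

    samples: list of TF vectors; labels: +1 (relevant) or -1 (irrelevant).
    The hinge loss is an assumption: it is sub-differentiable and matches
    a linear SVM objective. No bias term is used, for brevity.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            for j in range(len(weights)):
                # Subgradient: the regularizer always contributes; the data
                # term contributes only when the margin is violated.
                grad = reg * weights[j]
                if margin < 1:
                    grad -= y * x[j]
                weights[j] -= lr * grad
    return weights


def score(text, keywords, weights):
    """Weighted keyword score; positive suggests an event-relevant page."""
    tf = term_frequencies(text, keywords)
    return sum(w * f for w, f in zip(weights, tf))


# Hypothetical keyword set and toy training pages (illustration only).
keywords = ["earthquake", "rescue", "recipe"]
docs = [
    ("earthquake rescue teams search for survivors after the earthquake", 1),
    ("rescue workers reach the earthquake zone", 1),
    ("a simple recipe for tomato soup", -1),
    ("this pasta recipe needs fresh basil", -1),
]
samples = [term_frequencies(text, keywords) for text, _ in docs]
labels = [label for _, label in docs]
weights = sgd_keyword_weights(samples, labels)
```

In this sketch a crawler would follow links only from pages whose `score` is positive. A PSO-based optimizer, as in the compared baseline, would search the same weight vector without using gradients.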
About the journal
Journal: Pattern Recognition Letters
Publisher: Elsevier B.V.
ISSN: 0167-8655