An Effective and Discriminative Feature Learning for URL Based Web Page Classification
, Aravindan C.
Published in IEEE
2018
Pages: 1374 - 1379
Abstract
The ever-growing World Wide Web contains a large volume of web pages on a variety of topics. Many applications, such as information filtering and focused crawling, demand large-scale topic classification of web pages. To classify web pages, a URL-based approach is proposed, which avoids downloading the contents of the web page for classification. In this paper, an automated way of learning a category-specific universal dictionary of discriminating URL features is proposed. With this automatically learnt dictionary, the feature vector dimensionality is independent of the training set, which overcomes the difficulty of handling large-scale data. The dictionary was constructed from the publicly available ODP dataset. The proposed approach was evaluated by applying the automatically learnt URL feature dictionaries to another dataset that contains search results from Google. Experiments show macro-average precision, recall and F1 values of 0.93, 0.85 and 0.88, respectively. The difference is not statistically significant when the universal dictionary is applied instead of a dataset-specific term dictionary. © 2018 IEEE.
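The idea of a feature vector whose dimensionality does not depend on the training set can be illustrated with a minimal sketch. It assumes a fixed per-category term dictionary and a simple binary term-presence encoding. The names DICTIONARY, tokenize_url and url_to_vector, the example terms, and the tokenization rule are all hypothetical and chosen for illustration; the abstract does not describe how the paper learns or encodes its dictionary.

```python
import re
from urllib.parse import urlparse

# Hypothetical dictionary: the paper learns its dictionaries from ODP,
# but the categories and terms below are invented for illustration.
DICTIONARY = {
    "sports": ["football", "league", "score"],
    "health": ["clinic", "diet", "medical"],
}


def tokenize_url(url):
    """Split a URL's host and path into lowercase alphabetic tokens."""
    parsed = urlparse(url)
    text = (parsed.netloc + " " + parsed.path).lower()
    return re.findall(r"[a-z]+", text)


def url_to_vector(url, dictionary=DICTIONARY):
    """Encode a URL as a binary vector with one slot per dictionary term.

    The vector length equals the total number of dictionary terms, so it
    stays the same regardless of which URLs are in the training set.
    """
    tokens = set(tokenize_url(url))
    vector = []
    for category in sorted(dictionary):  # fixed, reproducible slot order
        for term in dictionary[category]:
            vector.append(1 if term in tokens else 0)
    return vector
```

Under these assumptions, every URL maps to a vector of six entries (three terms for each of two categories), whether or not any of its tokens appear in the dictionary. No page content is fetched: only the URL string is parsed.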
About the journal
Journal: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Publisher: IEEE
Open Access: No