Identifying health domain URLs using SVM
Published in Association for Computing Machinery
2015
Volume: 10-13-August-2015
Pages: 203-208
Abstract
The World Wide Web contains a large volume of information on a wide range of topics. In the health domain in particular, people often search the web before consulting experts, but there is no guarantee that only relevant health-related pages are retrieved. There is therefore a need for an automated system that can assist in identifying health-related web pages. In this paper, a URL-based approach is proposed to identify health domain URLs, which helps to avoid fetching irrelevant web pages. One of the issues in URL-based topic classification is the difficulty of selecting suitable URL features. Here, only the 4-grams derived from URLs are used as features to determine whether a web page is health related, without using any medical repository. Statistical dictionary-based methods have been reported in the literature, but the construction of such dictionaries is not automatic. A machine learning technique is proposed to automatically learn a statistical dictionary of terms from the training URLs. To classify a web page as either a health page or not, a binary SVM classifier is designed with a dictionary of 4-grams derived from URLs. The benchmark ODP dataset has been used to evaluate performance through various experiments. With the proposed URL-based approach, a precision of 87% has been achieved, which is a significant improvement over existing techniques. © 2015 ACM.
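The pipeline described in the abstract (4-grams from URLs, a dictionary learned from training URLs, a binary SVM, precision as the metric) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the tokenization, the SVM variant, or any hyperparameters, so everything below is an assumption. In particular, the sketch assumes character 4-grams taken from lowercase alphabetic tokens, a linear SVM without a bias term trained by Pegasos-style subgradient descent, and four made-up toy URLs in place of the ODP data. All function names are hypothetical.

```python
import re
from collections import Counter


def url_4grams(url, n=4):
    """Character n-grams from the alphabetic tokens of a URL (assumed tokenization)."""
    text = re.sub(r"^[a-z]+://", "", url.lower())  # drop the scheme, e.g. "http://"
    grams = []
    for token in re.findall(r"[a-z]+", text):
        # Tokens shorter than n letters yield no n-grams.
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def build_dictionary(urls):
    """Map every 4-gram seen in the training URLs to a feature index."""
    grams = sorted({g for url in urls for g in url_4grams(url)})
    return {g: i for i, g in enumerate(grams)}


def vectorize(url, dictionary):
    """Sparse count vector {feature index: count}; unseen 4-grams are ignored."""
    counts = Counter(g for g in url_4grams(url) if g in dictionary)
    return {dictionary[g]: c for g, c in counts.items()}


def score(w, x):
    return sum(w[i] * v for i, v in x.items())


def train_linear_svm(X, y, dim, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the regularized hinge loss (no bias term)."""
    w = [0.0] * dim
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decaying step size
            violated = label * score(w, x) < 1.0      # margin check before the update
            shrink = 1.0 - 1.0 / t                    # equals 1 - eta * lam
            w = [wi * shrink for wi in w]
            if violated:
                for i, v in x.items():
                    w[i] += eta * label * v
    return w


def predict(url, w, dictionary):
    """Return 1 (health) or -1 (not health)."""
    return 1 if score(w, vectorize(url, dictionary)) > 0 else -1


def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


# Made-up toy URLs for illustration only; 1 = health, -1 = not health.
train_urls = [
    "http://www.healthinfo.org/diabetes/symptoms",
    "http://medicine.net/cardiology/clinic",
    "http://www.sportsnews.com/football/scores",
    "http://travelblog.net/hotels/paris",
]
train_labels = [1, 1, -1, -1]

dictionary = build_dictionary(train_urls)
X = [vectorize(u, dictionary) for u in train_urls]
w = train_linear_svm(X, train_labels, len(dictionary))

predicted = [predict(u, w, dictionary) for u in train_urls]
print(predicted, precision(train_labels, predicted))
```

On real data, the precision would be measured on held-out URLs rather than on the training set, and an off-the-shelf SVM implementation would normally replace the hand-written trainer; the toy version is kept dependency-free only so that each step stays visible.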
About the journal
Journal: ACM International Conference Proceeding Series
Publisher: Association for Computing Machinery