DLRG@HASOC 2019: An enhanced ensemble classifier for hate and offensive content identification

Rajalakshmi R; B. Yashwant Reddy

Recent advances in Internet technologies have brought tremendous change to social media. Hate speech is an attack directed at a group of people on the basis of their religion, gender, colour, etc. Offensive content on social media poses a threat to democracy. As this kind of hate speech and offensive content on the web increases day by day, manually monitoring or controlling such hate crimes is a highly challenging task. Most existing methodologies focus on English-language tweets, and only limited work has been reported for Hindi and German posts. The importance of feature selection methods has also not been explored much for this problem. In this research work, an enhanced ensemble classifier approach is proposed to identify hate and offensive content posted in Hindi or German. In the proposed approach, a chi-square-based feature selection method is combined with a Random Forest classifier to classify the tweets. This work was submitted to the Hate and Offensive Content Identification (HASOC) task at FIRE 2019. Experiments conducted on the released HASOC dataset show that accuracies of 81% and 64% were achieved on German and Hindi tweets, respectively. © Copyright 2019 for this paper by its authors.
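The chi-square feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy documents, the label names `offensive` and `other`, and the helper functions `chi_square_scores` and `select_top_k` are all hypothetical, and the Random Forest classifier that would consume the selected features is left out. Each term is scored with the chi-square statistic of its term-presence by class contingency table, and the highest-scoring terms are kept.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its
    (term present / term absent) x (class) contingency table."""
    n = len(docs)
    classes = sorted(set(labels))
    class_totals = Counter(labels)

    # doc_freq[term][cls] = number of documents of class cls containing term
    doc_freq = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, per_class in doc_freq.items():
        present = sum(per_class.values())
        absent = n - present
        score = 0.0
        for cls in classes:
            cells = (
                (per_class[cls], present),                    # term present
                (class_totals[cls] - per_class[cls], absent),  # term absent
            )
            for observed, row_total in cells:
                expected = row_total * class_totals[cls] / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, made-up corpus (hypothetical; not taken from the HASOC dataset).
docs = [
    ["you", "are", "awful"],
    ["awful", "people"],
    ["you", "are", "kind"],
    ["kind", "people"],
]
labels = ["offensive", "offensive", "other", "other"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)
```

In this toy corpus, terms that occur in only one class receive the highest scores, while terms spread evenly across both classes score zero and are dropped. In a full pipeline, the documents would then be reduced to the selected terms before an ensemble classifier is trained on them.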

Journal	CEUR Workshop Proceedings
Publisher	CEUR-WS
ISSN	1613-0073