DLRG@HASOC 2020: A hybrid approach for hate and offensive content identification in multilingual tweets

Rajalakshmi R; B. Yashwanth Reddy

Many people now use social media platforms as a communication tool and express their views publicly and anonymously. Hate speech and offensive posts have become a major issue. Handling this problem requires automated methods that can analyse social media posts and identify hate speech. Existing methods pay little attention to multilingual posts, which pose additional challenges not only because of their linguistic properties but also because of the class imbalance problem. The task of identifying hate and offensive content posted in Hindi or German has the same issues. To address class imbalance, we combine an oversampling technique with a suitable feature weighting method. In the proposed approach, a multi-class imbalance-based feature selection method is combined with an SVM classifier to classify each tweet as hate speech or not. This work was submitted to the Hate and Offensive Content Identification (HASOC) task at FIRE 2020 and ranked third. We achieved an accuracy of 80% and 72% on the released German and Hindi tweets, respectively. © 2020 Copyright for this paper by its authors.
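
The abstract does not say which oversampling technique was used. As a rough illustration of the general idea only, the sketch below applies plain random oversampling, which duplicates minority-class examples until every class matches the size of the largest one. The function name `random_oversample`, the toy tweets, and the labels `HOF` and `NOT` are placeholders assumed for this example; the feature weighting, feature selection, and SVM steps are not shown, and none of this should be read as the authors' implementation.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class items at random until all classes are equal in size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group the texts by their class label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        # Sample with replacement to fill the gap for smaller classes.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


# Toy, made-up data: four non-offensive posts and one offensive post.
tweets = ["post a", "post b", "post c", "post d", "offensive post"]
labels = ["NOT", "NOT", "NOT", "NOT", "HOF"]

balanced_texts, balanced_labels = random_oversample(tweets, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind would normally be applied to the training split only, so that duplicated examples do not leak into the evaluation data.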

Journal	CEUR Workshop Proceedings
Publisher	CEUR-WS
ISSN	1613-0073