Document Type
Conference Paper
Abstract
The proliferation of hate speech on digital platforms has become a significant issue, and automated content moderation systems built on machine learning are a proposed solution. However, such systems face challenges in multilingual and low-resource settings because they require extensive labelled data. This paper introduces an explainable AI framework for identifying annotation discrepancies in low-resource languages, focusing on Hindi, the third most-spoken language worldwide, for which hate speech detection remains under-researched. We examine the labelling quality of the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages (HASOC) challenge data, using unsupervised learning methods to extract topical variations and annotation behaviour, and apply these features in an explainable AI-based classification model, TabNet. We release a relabelled Hindi hate speech benchmark dataset with label-flipping information and related metadata to facilitate research in this area, and we also release the source code for reproducibility. Please be advised that this work contains examples of toxic content.
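The released code is not reproduced here, but the pipeline the abstract describes (unsupervised topic features fed to a TabNet classifier whose feature masks provide explanations) can be sketched roughly as follows. This is an illustrative, assumption-laden sketch rather than the authors' implementation: it uses scikit-learn's LDA for the unsupervised step and the pytorch-tabnet package for the classifier, and all posts, labels, and hyperparameters are placeholders.

# Illustrative sketch only (not the authors' released code). Assumes the
# scikit-learn and pytorch-tabnet packages; posts, labels, and
# hyperparameters below are placeholders.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split
from pytorch_tabnet.tab_model import TabNetClassifier

posts = [
    "i love this community",
    "you people are awful",
    "great discussion thread",
    "get out of here you idiot",
    "thanks for sharing this",
    "nobody wants your kind here",
    "what a helpful answer",
    "shut up you fool",
]
labels = np.array([0, 1, 0, 1, 0, 1, 0, 1])  # 0 = non-hateful, 1 = hateful

# Unsupervised step: bag-of-words counts followed by LDA topic proportions,
# giving each post a low-dimensional topical feature vector.
counts = CountVectorizer().fit_transform(posts)
topics = LatentDirichletAllocation(n_components=5, random_state=0).fit_transform(counts)

X_train, X_valid, y_train, y_valid = train_test_split(
    topics.astype(np.float32), labels, test_size=0.5, random_state=0, stratify=labels)

# Supervised step: TabNet, whose sparse feature masks attribute each
# prediction to individual input features (here, topic proportions).
clf = TabNetClassifier(seed=0, verbose=0)
clf.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], max_epochs=20, patience=5)
explain_matrix, masks = clf.explain(X_valid)  # per-sample feature attributions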
DOI
10.1145/3677117.3685006
Recommended Citation
Sawant, M., Younus, A., Caton, S., & Qureshi, M. A. (2024). Using Explainable AI (XAI) for Identification of Subjectivity in Hate Speech Annotations for Low-Resource Languages. https://doi.org/10.1145/3677117.3685006
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Publication Details
https://doi.org/10.1145/3677117.3685006