Articles

Toward Inclusive Online Environments: Counterfactual-Inspired XAI for Detecting and Interpreting Hateful and Offensive Tweets

Muhammad Deedahwar Mazhar Qureshi, Technological University Dublin
Muhammad Atif Qureshi, Technological University Dublin
Wael Rashwan, Technological University Dublin

Author ORCID Identifier

https://orcid.org/0000-0003-4413-4476

Document Type

Conference Paper

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE

Publication Details

World Conference on Explainable Artificial Intelligence. Publisher: Springer, Cham

https://doi.org/10.1007/978-3-031-44070-0_5

Abstract

The prevalence of hate speech and offensive language on social media platforms such as Twitter has significant consequences, ranging from psychological harm to the polarization of societies. Consequently, social media companies have implemented content moderation measures to curb harmful or discriminatory language. However, a lack of consistency and transparency hinders their ability to achieve desired outcomes. This article evaluates various ML models, including an ensemble, Explainable Boosting Machine (EBM), and Linear Support Vector Classifier (SVC), on a public dataset of 24,792 tweets by T. Davidson, categorizing tweets into three classes: hate, offensive, and neither. The top-performing model achieves a weighted F1-Score of 0.90. Furthermore, this article interprets the output of the best-performing model using LIME and SHAP, elucidating how specific words and phrases within a tweet contextually impact its classification. This analysis helps to shed light on the linguistic aspects of hate and offense. Additionally, we employ LIME to present a suggestive counterfactual approach, proposing no-hate alternatives for a tweet to further explain the influence of word choices in context. Limitations of the study include the potential for biased results due to dataset imbalance, which future research may address by exploring more balanced datasets or leveraging additional features. Ultimately, through these explanations, this work aims to promote digital literacy and foster an inclusive online environment that encourages informed and responsible use of digital technologies.

DOI

https://doi.org/10.1007/978-3-031-44070-0_5

Recommended Citation

Qureshi, M.D.M., Qureshi, M.A., Rashwan, W. (2023). Toward Inclusive Online Environments: Counterfactual-Inspired XAI for Detecting and Interpreting Hateful and Offensive Tweets. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. DOI: 10.1007/978-3-031-44070-0_5

Funder

Science Foundation Ireland

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Included in

Computational Engineering Commons
