Author ORCID Identifier
0000-0003-4413-4476
Document Type
Conference Paper
Disciplines
Computer Sciences, Information Science
Abstract
Counterfactually augmented data has recently been proposed as a successful solution for socially situated NLP tasks such as hate speech detection. The chief component of the existing counterfactual data augmentation pipeline, however, involves manually flipping labels and making minimal content edits to training data. In a hate speech context, these forms of editing have been shown to retain offensive hate speech content. Inspired by the recent success of large language models (LLMs) such as ChatGPT, which have demonstrated improved language comprehension abilities, we propose an inclusivity-oriented approach to automatically generate counterfactually augmented data using LLMs. We show that hate speech detection models trained with LLM-produced counterfactually augmented data can outperform both state-of-the-art and human-based methods.
DOI
https://doi.org/10.1007/978-3-031-62362-2_3
Recommended Citation
Qureshi, M. Atif; Younus, Arjumand; and Caton, Simon, "Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate" (2024). Articles. 242.
https://arrow.tudublin.ie/creaart/242
Funder
Science Foundation Ireland
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publication Details
Qureshi, M.A., Younus, A., Caton, S. (2024). Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate. In: Stefanidis, K., Systä, K., Matera, M., Heil, S., Kondylakis, H., Quintarelli, E. (eds) Web Engineering. ICWE 2024. Lecture Notes in Computer Science, vol 14629. Springer, Cham.
https://doi.org/10.1007/978-3-031-62362-2_3