Author ORCID Identifier

0000-0003-4413-4476

Document Type

Conference Paper

Disciplines

Computer Sciences, Information Science

Publication Details

Qureshi, M.A., Younus, A., Caton, S. (2024). Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate. In: Stefanidis, K., Systä, K., Matera, M., Heil, S., Kondylakis, H., Quintarelli, E. (eds) Web Engineering. ICWE 2024. Lecture Notes in Computer Science, vol 14629. Springer, Cham.

https://doi.org/10.1007/978-3-031-62362-2_3

Abstract

Counterfactually augmented data has recently been proposed as a successful solution for socially situated NLP tasks such as hate speech detection. The chief component of the existing counterfactual data augmentation pipeline, however, involves manually flipping labels and making minimal content edits to training data. In a hate speech context, such edits have been shown to retain offensive hate speech content. Inspired by the recent success of large language models (LLMs), particularly ChatGPT, which have demonstrated improved language comprehension, we propose an inclusivity-oriented approach to automatically generate counterfactually augmented data using LLMs. We show that hate speech detection models trained with LLM-produced counterfactually augmented data can outperform both state-of-the-art and human-based methods.

Funder

Science Foundation Ireland

Creative Commons License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

