Author ORCID Identifier

https://orcid.org/0009-0005-2171-5501

Document Type

Conference Paper

Disciplines

1.2 Computer and Information Science, Computer Sciences, Women's and Gender Studies

Publication Details

https://aclanthology.org/2024.icnlsp-1.42/

Abstract

Large Language Models (LLMs) have swiftly become essential tools across diverse text generation applications. However, LLMs also raise significant ethical and societal concerns, particularly regarding potential gender bias in the text they produce. This study investigates the presence of gender bias in four LLMs: ChatGPT 3.5, ChatGPT 4, Llama 2 7B, and Llama 2 13B. Using these LLMs, we generate a gendered language dataset of sentences about men and women and analyze the extent of gender bias in their outputs. Our evaluation is two-fold: we train a classifier on the generated dataset for a gender stereotype detection task and measure gender bias in that classifier, and we perform a comprehensive analysis of the LLM-generated text at both the sentence and word levels. Both the classification-based and lexical evaluations reveal that all four LLMs demonstrate significant gender bias. ChatGPT 4 and Llama 2 13B exhibit the least bias, with only weak associations between the gendered adjectives used and the gender of the person described in a sentence, whereas ChatGPT 3.5 and Llama 2 7B show the strongest such associations.
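
For readers curious what the word-level association analysis mentioned in the abstract might look like in practice, the sketch below estimates pointwise mutual information (PMI) between adjectives and the gender label of the person a generated sentence describes. The toy dataset, the adjective lexicon, and the choice of PMI as the association measure are illustrative assumptions, not the paper's exact procedure.

# A minimal sketch of a word-level gender association analysis, assuming
# (sentence, gender_label) pairs and a small adjective lexicon. PMI here is
# an illustrative association measure, not the paper's published method.
import math
from collections import Counter

# Hypothetical LLM-generated dataset: (sentence, gender_label) pairs.
sentences = [
    ("She is a gentle and caring nurse.", "female"),
    ("He is a strong and ambitious engineer.", "male"),
    ("She is ambitious and confident.", "female"),
    ("He is gentle with his children.", "male"),
]

# Hypothetical adjective lexicon to probe for gendered usage.
adjectives = {"gentle", "caring", "strong", "ambitious", "confident"}

def pmi_by_gender(data, lexicon):
    """PMI(w, g) = log2(P(w, g) / (P(w) * P(g))) over adjective occurrences.

    Positive scores mean the adjective co-occurs with that gender label
    more often than chance; near-zero scores mean a weak association.
    """
    word_gender = Counter()
    word = Counter()
    gender = Counter()
    total = 0
    for text, g in data:
        tokens = {t.strip(".,").lower() for t in text.split()}
        for w in tokens & lexicon:
            word_gender[(w, g)] += 1
            word[w] += 1
            gender[g] += 1
            total += 1
    scores = {}
    for (w, g), n in word_gender.items():
        p_wg = n / total
        p_w = word[w] / total
        p_g = gender[g] / total
        scores[(w, g)] = math.log2(p_wg / (p_w * p_g))
    return scores

for (w, g), s in sorted(pmi_by_gender(sentences, adjectives).items()):
    print(f"PMI({w!r}, {g}) = {s:+.2f}")

On a real generated corpus, strong positive PMI between stereotypically gendered adjectives and one gender label would indicate the kind of lexical bias the abstract reports for ChatGPT 3.5 and Llama 2 7B, while scores near zero would match the weak associations reported for ChatGPT 4 and Llama 2 13B.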

DOI

https://doi.org/10.21427/qxhg-9f41

Funder

Technological University Dublin

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

