Technological University Dublin
Gendered language refers to the use of words that indicate the gender of an individual. It can be explicit, where the gender is directly implied by the specific words used (e.g., mother, she, man), or it can be implicit, where societal roles and behaviors convey a person's gender. An example of the implicit kind is the expectation that women display communal traits (e.g., affectionate, caring, gentle) and men display agentic traits (e.g., assertive, competitive, decisive). The presence of gendered language in natural language processing (NLP) systems can reinforce gender stereotypes and bias. Our work introduces an approach to creating gendered language datasets using ChatGPT. These datasets are designed to support data-driven methods for identifying gender stereotypes and mitigating gender bias. The approach focuses on generating implicit gendered language that reflects stereotypical characteristics or traits associated with a specific gender. This is achieved by constructing prompts for ChatGPT that incorporate gender-coded words sourced from gender-coded lexicons. An evaluation of the generated datasets shows that the approach produces good examples of English-language gendered sentences, each of which can be categorized as either consistent with or contradictory to gender stereotypes. Additionally, the generated data exhibits a strong gender bias.
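The prompt-construction step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the six trait words are the examples given in the abstract rather than a full gender-coded lexicon, and the prompt wording (`TEMPLATE`), the function name `build_prompts`, and the rule for assigning the consistent/contradictory label are all hypothetical. No call to ChatGPT is made; the sketch only builds the prompt strings.

```python
from itertools import product

# Example trait words taken from the abstract; a real gender-coded
# lexicon would contain many more entries.
LEXICON = {
    "communal": ["affectionate", "caring", "gentle"],
    "agentic": ["assertive", "competitive", "decisive"],
}

# Stereotypical association described in the abstract.
STEREOTYPE = {"communal": "woman", "agentic": "man"}

# Hypothetical prompt wording (an assumption, not the authors' template).
TEMPLATE = "Write one sentence describing a {gender} who is {trait}."


def build_prompts(lexicon=LEXICON, genders=("woman", "man")):
    """Pair every trait word with every gender and label the pairing."""
    rows = []
    for (category, words), gender in product(lexicon.items(), genders):
        # Assumed labeling rule: a pairing is "consistent" when the trait
        # category matches the gender it is stereotypically tied to.
        label = "consistent" if STEREOTYPE[category] == gender else "contradictory"
        for word in words:
            rows.append({
                "prompt": TEMPLATE.format(gender=gender, trait=word),
                "trait": word,
                "gender": gender,
                "label": label,
            })
    return rows


if __name__ == "__main__":
    for row in build_prompts():
        print(row["label"], "|", row["prompt"])
```

Pairing each trait with both genders is one simple way to obtain sentences on both sides of a stereotype; the generated text would still need the kind of evaluation the abstract mentions before being used as a dataset.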
gendered language, gendered language dataset, gender stereotype detection, gender bias mitigation, natural language processing, ChatGPT, implicit gender stereotypes
Computer Sciences | Other Feminist, Gender, and Sexuality Studies
First Annual Teaching and Research Showcase 2023
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Soundararajan, S., & Delany, S. J. (2023). Identifying Gendered Language. Technological University Dublin. DOI: 10.21427/Z40X-FD49