Gendered language is the use of words that denote an individual's gender. It can be explicit, where gender is evident in the word itself (e.g., mother, she, man), or implicit, where social roles or behaviours signal gender: for example, the expectation that women display communal traits (e.g., affectionate, caring, gentle) while men display agentic traits (e.g., assertive, competitive, decisive). The use of gendered language in NLP systems can perpetuate gender stereotypes and bias. This paper proposes an approach to generating gendered language datasets using ChatGPT, providing data for data-driven approaches to gender stereotype detection and gender bias mitigation. The approach focuses on generating implicit gendered language that captures and reflects stereotypical characteristics or traits of a particular gender. This is done by engineering prompts to ChatGPT that use gender-coded words drawn from gender-coded lexicons. Evaluation of the generated datasets shows good instances of English-language gendered sentences that can be classified as either consistent with or contradictory to gender stereotypes. The generated data also exhibits strong gender bias.
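The prompt-engineering step described above could be sketched as follows. This is a minimal illustrative sketch, not the authors' actual pipeline: the trait words are the examples given in the abstract rather than the full gender-coded lexicons, and the prompt wording is a hypothetical template, not the prompts used in the study.

```python
# Illustrative gender-coded word lists (examples from the abstract, not the
# full lexicons used in the paper).
COMMUNAL_TRAITS = ["affectionate", "caring", "gentle"]      # stereotypically female-coded
AGENTIC_TRAITS = ["assertive", "competitive", "decisive"]   # stereotypically male-coded

# Hypothetical prompt template: ask for implicit gendered language, i.e. a
# sentence whose trait word signals gender without naming it.
PROMPT_TEMPLATE = (
    "Write a short sentence about a person in a workplace whose behaviour "
    "could be described as '{trait}'. Do not state the person's gender explicitly."
)

def build_prompts(lexicon):
    """Turn a list of gender-coded words into one ChatGPT prompt per word."""
    return [PROMPT_TEMPLATE.format(trait=word) for word in lexicon]

prompts = build_prompts(COMMUNAL_TRAITS) + build_prompts(AGENTIC_TRAITS)
```

Each prompt would then be sent to ChatGPT, and the returned sentences labelled by the coded gender of the seed word, yielding pairs for stereotype-consistent versus stereotype-contradictory comparison.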
Soundararajan, S. (2023). Using ChatGPT to generate Gendered Language [Data set]. Technological University Dublin. DOI: 10.21427/AYM7-SF29
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.