Author ORCID Identifier

0000-0001-7658-7264

Document Type

Conference Paper

Disciplines

Computer Sciences, Human-machine relations, Women's and gender studies, Social issues, Ethics

Publication Details

Conference: 15th International Conference on Recent Advances in Natural Language Processing (RANLP 2025)

Pages: 481-490

Venue: Varna, Bulgaria

Date: 8-10 Sep, 2025

Abstract

AI models learn gender-stereotypical language from human data. Understanding how well different explanation techniques capture the diverse language features that signal gender stereotypes in text can therefore help identify stereotypes that could lead to gender bias. The influential words identified by four explanation techniques (LIME, SHAP, Integrated Gradients (IG) and Attention) in a gender stereotype detection task were compared with words annotated by human evaluators. All techniques ranked adjectives and verbs related to characteristic traits and gender roles as the most influential words. LIME was best at detecting explicitly gendered words, while SHAP, IG and Attention showed stronger overall alignment with the human annotations and overlapped considerably with one another. A combination of these techniques, drawing on the strengths of both model-agnostic and model-specific explanations, performs better at capturing gender-stereotypical language. When the analysis is extended to hate speech and sentiment prediction tasks, annotator agreement suggests that these tasks are more subjective, while the explanation techniques capture the explicit markers of hate speech better than the more nuanced gender stereotypes. This research highlights the strengths of different explanation techniques in capturing subjective gender-stereotypical language in text.
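The comparison of explanation words with human-annotated words can be made concrete with a small sketch. The abstract does not state which alignment measure was used, so this example assumes, purely for illustration, that each technique's word-level attribution scores are reduced to a top-k word set and scored against the human-annotated words with precision, recall and F1, with Jaccard similarity measuring overlap between two techniques. The function names (`top_k_words`, `alignment`, `jaccard`), the choice of k, the example sentence and the toy scores are all hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch: scoring explanation words against human annotations.

Assumes each explanation technique (e.g. LIME, SHAP, IG, Attention) yields a
token -> attribution score mapping. The metrics below are illustrative
choices, not necessarily the measures used in the paper.
"""


def top_k_words(attributions: dict[str, float], k: int) -> set[str]:
    """Return the k tokens with the largest absolute attribution scores."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {token for token, _ in ranked[:k]}


def alignment(explained: set[str], annotated: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F1 of explanation words against human-annotated words."""
    overlap = len(explained & annotated)
    precision = overlap / len(explained) if explained else 0.0
    recall = overlap / len(annotated) if annotated else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two word sets, e.g. the top-k words of two techniques."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Toy attribution scores from a hypothetical explainer (not real model output)
    # for the sentence "she is too emotional to lead the team".
    scores = {"she": 0.42, "is": 0.01, "too": 0.05, "emotional": 0.61,
              "to": 0.0, "lead": 0.20, "the": 0.0, "team": 0.03}
    human = {"she", "emotional"}  # hypothetical human annotation
    explained = top_k_words(scores, k=3)
    print(sorted(explained))
    print(alignment(explained, human))
```

With k = 3 the toy explainer's top words are "emotional", "she" and "lead", giving precision 2/3, recall 1.0 and an F1 of about 0.8 against the two annotated words. Ranking by absolute value treats strongly negative attributions as influential too; that is a design choice of this sketch and may differ from the paper's procedure.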

DOI

https://doi.org/10.26615/978-954-452-098-4-057

Recommended Citation

Nayantara, Manuela and Delany, Sarah Jane, "Detecting Gender Stereotypical Language using Model-agnostic and Model-specific Explanations" (2025). Conference Papers. 1.
https://arrow.tudublin.ie/mllcon/1

Funder

SFI Centre in Research Training in Machine Learning (ML-Labs)

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Download

Available for download on Thursday, December 02, 2027

Detecting Gender Stereotypical Language using Model-agnostic and Model-specific Explanations