Author ORCID Identifier

https://orcid.org/0000-0002-2768-2676

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences, Sociology, Women's and gender studies

Publication Details

Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2022)

https://link.springer.com/book/9783031264399

Abstract

Predictions from machine learning models can reflect biases in the data on which they are trained. Gender bias has been identified in natural language processing (NLP) systems such as those used for recruitment. Developing approaches to mitigate gender bias in training data typically requires isolating the effect of gender on the output. While it is possible to isolate and identify gender for some types of training data, e.g. CVs in recruitment, for most textual corpora there is no obvious gender label. This paper proposes a general approach to measure bias in textual training data for NLP prediction systems by providing a gender label identified from the textual content of the training data. The approach is compared with the identity term template approach currently in use, also known as Gender Bias Evaluation Datasets (GBETs), which involves designing synthetic test datasets that isolate gender and are used to probe for gender bias in a dataset. We show that our Identity Term Sampling (ITS) approach is capable of identifying gender bias at least as well as identity term templates and can be used on training data that has no obvious gender label.

DOI

https://doi.org/10.21427/BKM6-RF06

Recommended Citation

Sobhani, N., & Delany, S. J. (2022). Identity Term Sampling for Measuring Gender Bias in Training Data. Springer Nature. DOI: 10.21427/BKM6-RF06

Funder

Science Foundation Ireland

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Data Science Commons

Identity Term Sampling for Measuring Gender Bias in Training Data
