Author ORCID Identifier
0000-0001-7658-7264
Document Type
Conference Paper
Disciplines
Computer Sciences, Women's and Gender Studies
Abstract
Writing style and word choice in text can differ between men and women, both in terms of who the text is about and who wrote it. This paper focuses on author gender prediction, that is, identifying the gender of the person who wrote the text. We compare closed and open vocabulary approaches on different types of textual content, including more traditional writing styles such as those found in books and more recent writing styles found in user-generated content on digital platforms such as blogs and social media messaging. As supervised machine learning approaches can reflect human biases in the data they are trained on, we also consider the gender bias of the different approaches across the different types of dataset. We show that open vocabulary approaches perform better, both achieving higher prediction performance and exhibiting less gender bias.
DOI
https://doi.org/10.1007/978-3-031-26438-2_17
Recommended Citation
Jeyaraj, Manuela N. and Delany, Sarah Jane, "Author Gender Identification Considering Gender Bias" (2023). Conference papers. 25.
https://arrow.tudublin.ie/diraacon/25
Funder
Science Foundation Ireland
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Publication Details
https://link.springer.com/chapter/10.1007/978-3-031-26438-2_17
Conference: Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2022), 08/12/2022. Pages: 214-225