Document Type

Dissertation

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Science)

Abstract

Women are under-represented in medical datasets, and machine learning classification algorithms are known to exhibit bias towards the majority class. The growing application of machine learning in the medical field therefore risks producing worse medical outcomes for female patients. The Heart Failure Prediction (HFP) dataset is a historical dataset used to train models for predicting heart disease. It contains significantly fewer female patients than male patients, so models trained on it are expected to inherit a gender bias that favours male patients. This dissertation explores different data re-sampling techniques (SMOTE, SMOTE-NC, SVM-SMOTE, Borderline-SMOTE, ADASYN, ROS, Near Miss, and RUS) for their ability to correct the under-representation of female patients, and the use of the resulting balanced datasets to reduce the bias observed in classification models trained on this data.
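
As a rough illustration of the simplest technique named above, random over-sampling (ROS), the sketch below duplicates randomly chosen records from the smaller group until every group is the same size. It is a minimal standard-library sketch, not the dissertation's implementation: the function name `random_oversample`, the `sex` and `heart_disease` field names, and the toy records are all assumptions made for illustration, as is the choice to balance on the sex column.

```python
import random
from collections import Counter


def random_oversample(rows, group_key, seed=0):
    """Hypothetical helper: balance groups by duplicating rows at random.

    Rows from smaller groups are sampled with replacement until every
    group has as many rows as the largest group.
    """
    rng = random.Random(seed)

    # Bucket the rows by the value of the grouping column.
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    if not groups:
        return []

    target = max(len(members) for members in groups.values())

    balanced = []
    for members in groups.values():
        # Keep every original row, then top up the shortfall with copies.
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Toy records for illustration only; these are not taken from the HFP dataset.
patients = (
    [{"sex": "M", "heart_disease": i % 2} for i in range(8)]
    + [{"sex": "F", "heart_disease": i % 2} for i in range(2)]
)

balanced = random_oversample(patients, "sex")
counts = Counter(row["sex"] for row in balanced)
print(counts["M"], counts["F"])
```

Random under-sampling (RUS) works in the opposite direction, discarding rows from the larger group, while the SMOTE family generates new synthetic records by interpolation rather than duplicating existing ones.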

Creative Commons License

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

