Document Type



This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Analytics)


This research project seeks to investigate some of the different sampling techniques that generate and use synthetic data to oversample the minority class as a means of handling the imbalanced distribution between non-fraudulent (majority class) and fraudulent (minority class) classes in a credit-card fraud dataset. The purpose of the research project is to assess the effectiveness of these techniques in the context of fraud detection which is a highly imbalanced and cost-sensitive dataset. Machine learning tasks that require learning from datasets that are highly unbalanced have difficulty learning since many of the traditional learning algorithms are not designed to cope with large differentials between classes. For that reason, various different methods have been developed to help tackle this problem. Oversampling and undersampling are examples of techniques that help deal with the class imbalance problem through sampling. This paper will evaluate oversampling techniques that use synthetic data to balance the minority class. The idea of using synthetic data to compensate for the minority class was first proposed by (Chawla et al., 2002). The technique is known as Synthetic Minority Over-Sampling Technique (SMOTE). Following the development of the technique, other techniques were developed from it. This paper will evaluate the SMOTE technique along with other also popular SMOTE-based extensions of the original technique.