This item is available under a Creative Commons License for non-commercial use only
This research project investigates sampling techniques that generate synthetic data to oversample the minority class, as a means of handling the imbalanced distribution between the non-fraudulent (majority) and fraudulent (minority) classes in a credit-card fraud dataset. Its purpose is to assess the effectiveness of these techniques in the context of fraud detection, a task in which the data are highly imbalanced and misclassification is cost-sensitive. Many traditional learning algorithms are not designed to cope with large differences in class size, so they have difficulty learning from highly imbalanced datasets. For that reason, various methods have been developed to tackle this problem. Oversampling and undersampling are examples of techniques that address class imbalance through sampling. This paper evaluates oversampling techniques that use synthetic data to balance the class distribution. The idea of using synthetic data to compensate for the minority class was first proposed by Chawla et al. (2002), in a technique known as the Synthetic Minority Over-sampling Technique (SMOTE). Other techniques were subsequently developed from it. This paper evaluates SMOTE along with other popular SMOTE-based extensions of the original technique.
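To make the idea of synthetic oversampling concrete, the following is a minimal sketch of the interpolation step generally associated with SMOTE: a minority sample is paired with one of its nearest minority-class neighbours, and a new point is drawn at a random position on the line segment between them. It is not the implementation evaluated in the dissertation. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples.

    Illustrative only: names and defaults are assumptions, not taken
    from the dissertation or from any particular library.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the remaining minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: four "fraudulent" rows with two numeric features each.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(fraud, n_synthetic=6, k=2)
```

Because every synthetic row lies between two existing minority rows, the new points stay inside the region already occupied by the minority class. In a classification experiment, such rows would normally be generated from the training split only, so that no synthetic information reaches the evaluation data.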
Parkinson de Castro, E. (2020). An examination of the smote and other smote-based techniques that use synthetic data to oversample the minority class in the context of credit-card fraud classification. Masters Dissertation. Technological University Dublin. DOI:10.21427/wj33-n221