Document Type



This item is available under a Creative Commons License for non-commercial use only



Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Data Analytics), January 2019.


Identifying retainable customers is essential to the functioning and growth of any business. Effective identification helps a business understand the reasons for retention and plan its marketing strategies accordingly. This research aims to develop a machine learning model that can precisely predict the retainable customers from the total customer data of an e-learning business. Building predictive models that can efficiently classify imbalanced data is a major challenge in data mining and machine learning. Most machine learning algorithms deliver suboptimal performance when applied to an imbalanced dataset. A variety of algorithm-level methods (cost-sensitive learning, one-class learning, ensemble methods) and data-level methods (sampling, feature selection) are widely used to address class imbalance in retention prediction problems. This research employs a quantitative and inductive approach to build a supervised machine learning model that addresses the class imbalance problem and efficiently predicts customer retention. Precision on the retention class is used as the evaluation metric. The research evaluates the performance of different sampling methods (Random Under-Sampling, Random Over-Sampling, SMOTE) on different single and ensemble machine learning models. The results show that Random Under-Sampling combined with the XGBoost classifier yields the best precision in identifying the retention class. The best model developed in the research was also used to predict retainable customers from recent, previously unseen customer data, and attained a retention precision of 57.5%.
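As an illustration of two ideas named in the abstract, the following is a minimal sketch of random under-sampling and of precision computed for a single class. It is not the dissertation's code: the function names `random_under_sample` and `precision`, the toy data, and the choice of label `1` for the retention class are all assumptions made for this example, and no classifier (such as XGBoost) is trained here.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the smallest class sets the target size for all classes.
    n_min = min(len(group) for group in by_class.values())

    # Draw n_min rows without replacement from each class.
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))

    # Shuffle so the classes are not stored in contiguous blocks.
    rng.shuffle(sampled)
    new_rows = [row for row, _ in sampled]
    new_labels = [label for _, label in sampled]
    return new_rows, new_labels


def precision(y_true, y_pred, positive=1):
    """Precision for one class: TP / (TP + FP).

    Returns 0.0 when the class is never predicted.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy imbalanced data (assumed): 8 rows of class 0, 2 rows of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_under_sample(rows, labels, seed=42)
print(balanced_labels.count(0), balanced_labels.count(1))

# Two customers predicted as retained, one of them correctly.
print(precision([1, 0, 1, 0], [1, 1, 0, 0]))
```

In this sketch, under-sampling is applied only to the data a model would be trained on; precision would then be measured on data that keeps its original class distribution, so that the reported figure reflects the imbalance seen in practice.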