Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence


Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Data Science Stream), 2021.


A network intrusion detection system (NIDS) is an important element in mitigating cybersecurity risks: it detects anomalies in network traffic that may indicate a cyberattack on a corporate network environment. Intrusion detection can be framed as a classification problem in which the goal is to distinguish malicious traffic from a majority of benign traffic. Research on NIDS is often performed using outdated datasets that do not represent the current cyberspace. Datasets such as CICIDS2018 address this gap because they are generated from attacks and an infrastructure that reflect an up-to-date scenario.

A problem may arise when machine learning classification algorithms are trained on a dataset with a class imbalance, as is the case for CICIDS2018, where legitimate traffic is the majority class. This problem can be tackled by modifying the dataset's probability distribution, augmenting the existing data to achieve balance. Many different methods can be used to do so, ranging from naive approaches such as random oversampling (ROS) or undersampling, through machine learning approaches such as SMOTE and decision trees, to sophisticated deep learning models such as the GAN and CTGAN.
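
As an illustration of the simplest of these methods, the following is a minimal sketch of random oversampling in plain Python. It is not the implementation used in the dissertation: the function name `random_oversample`, the toy feature rows, and the label strings are all hypothetical, and the sketch assumes that balancing means duplicating minority-class rows until every class matches the size of the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the feature rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap; the majority class adds none.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (made up for illustration): four benign flows and one attack flow.
rows = [[0.1, 200], [0.2, 180], [0.3, 220], [0.1, 210], [9.5, 4000]]
labels = ["benign", "benign", "benign", "benign", "attack"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Because the added rows are exact copies, this approach changes the class proportions without creating any new feature values, which is the main difference from synthetic methods such as SMOTE or CTGAN.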

An evaluation of the different data augmentation methods for training a random forest classifier showed that ROS and SMOTE are competitive in detecting attacks, while CTGAN recognized benign samples better and provided a balance between security and functionality for the network, albeit at the expense of computational resources.