This item is available under a Creative Commons License for non-commercial use only


1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences, Information Science, Bioinformatics

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Data Analytics), 2019-20


This research project investigates image data augmentation, which generates synthetic data by adding distortions to original images, as a substitute for the large amounts of real data normally used to train convolutional neural networks (CNNs). Its purpose is to assess the effectiveness of augmented data relative to real data by comparing the performance of models trained with varying amounts of augmentation and varying training-to-validation data ratios. CNNs have difficulty generalizing on computer vision tasks when the training dataset is too small. The cause is overfitting: the network learns the details of the training data so closely that its performance on unseen data suffers. To reduce overfitting, this project combines the existing methods of transfer learning, deeper networks, and dropout layers with data augmentation. By examining different ratios of training to validation data, the project evaluates how increasing the amount of training data augmentation and varying the validation dataset size affect the performance of a CNN in a multi-class classification task. Across a series of trained models, the research observed that the network performs better in terms of accuracy and loss as the amount of augmentation increases. In addition, when the data is split appropriately between training and validation sets, better performance was obtained even with a smaller dataset.
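The two factors varied in the study, the amount of augmentation and the training-to-validation split, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the distortions (horizontal flip, one-pixel shift), the function names, the toy 2x2 "images", and the choice to augment only the training split are all assumptions made for illustration.

```python
import random

def flip_horizontal(image):
    """Mirror a 2D image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels=1, fill=0):
    """Shift every row right by `pixels`, padding the left edge with `fill`."""
    width = len(image[0])
    pixels = min(pixels, width)
    return [[fill] * pixels + row[:width - pixels] for row in image]

def split_dataset(samples, train_fraction, seed=0):
    """Shuffle a copy of `samples` and split it into (train, validation)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def augment(train, copies_per_image, seed=0):
    """Return the originals plus `copies_per_image` distorted copies of each.

    Assumption: each copy gets one randomly chosen distortion and keeps
    the label of the image it was derived from.
    """
    rng = random.Random(seed)
    distortions = [flip_horizontal, shift_right]
    augmented = list(train)
    for image, label in train:
        for _ in range(copies_per_image):
            distort = rng.choice(distortions)
            augmented.append((distort(image), label))
    return augmented

# Toy dataset: ten 2x2 "images" with three hypothetical class labels.
data = [([[i, i + 1], [i + 2, i + 3]], i % 3) for i in range(10)]

# Split first, then augment only the training portion (an assumption here),
# so the validation set contains no distorted copies of training images.
train, val = split_dataset(data, train_fraction=0.8)
train_augmented = augment(train, copies_per_image=2)

print(len(train), len(val), len(train_augmented))  # 8 2 24
```

Changing `train_fraction` and `copies_per_image` reproduces, in miniature, the kind of grid of configurations the abstract describes: each combination gives a differently sized training set to compare against a validation set of a given size.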