Document Type



Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence


Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Science), 2021.


Sentiment analysis, also known as opinion mining or emotion mining, aims to identify how sentiments are expressed in text and other written data. It combines several areas of study, such as Natural Language Processing (NLP), Data Mining, and Text Mining, and is quickly becoming a key concern for businesses and organizations, especially as online commerce data is increasingly used for analysis. Twitter has become a popular microblogging and social networking platform on which people share information and contribute their opinions, thoughts, and attitudes. Because of the large volume of data generated on Twitter, stock market sentiment analysis has long been a subject of interest for researchers, investors, and scientists, given the highly unpredictable nature of the stock market.

Sentiment analysis can be performed in different ways, but the focus of this study is sentiment analysis with fewer training instances, using Mixout regularization and transformer-based pre-trained models: BERT (Bidirectional Encoder Representations from Transformers) and XLNet, a generalised autoregressive model. Traditional machine learning and deep learning models such as Random Forest, Naïve Bayes, Recurrent Neural Networks (RNN), and Long Short-Term Memory (LSTM) networks fail when given fewer training instances, and they require intensive feature engineering and processing of textual data. The objective of this research is to study and understand the performance of BERT and XLNet with fewer training instances using Mixout regularization for stock market sentiment analysis. The proposed model resulted in improved performance in terms of accuracy, precision, recall, and F1-score for both the BERT and XLNet models using Mixout regularization, on both adequate and under-sampled data.