This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

Dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Stream), June 2018.


Text mining is a method for extracting useful information from unstructured data through the identification and exploration of large amounts of text. It is a valuable support tool for organisations, enabling a greater understanding of text and the identification of relevant business insights within it. Critically, it identifies connections between pieces of information within texts that would otherwise go unnoticed. Its application is prevalent in areas such as marketing and political science; however, until recently, it has been largely overlooked within economics. Central banks are beginning to investigate the benefits of machine learning, sentiment analysis and natural language processing in light of the large amount of unstructured data available to them, including news articles, financial contracts, social media, supervisory and market intelligence, and regulatory reports. In this research paper, a dataset of Solvency and Financial Condition Reports (SFCRs), which are required by regulation, is analysed to determine whether machine learning and text classification can assist in assessing the completeness of SFCRs. Completeness is determined by whether or not a document adheres to nine European guidelines. Natural language processing and supervised machine learning techniques are implemented to classify each page of a report as belonging to one of the guidelines.
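
The page-level classification and completeness check described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not name the classifier or the features used, so a bag-of-words multinomial Naive Bayes model is assumed here purely for illustration. The labels `guideline_a` to `guideline_c`, the toy training pages and the helper `missing_guidelines` are all hypothetical; a real study would use nine labels and manually annotated SFCR pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesPageClassifier:
    """Multinomial Naive Bayes with add-one smoothing (assumed model)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of pages
        self.vocab = set()

    def fit(self, pages, labels):
        for text, label in zip(pages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_pages = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_pages in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(n_pages / total_pages)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def missing_guidelines(classifier, report_pages, required_labels):
    """Return the required labels that no page of the report was assigned to."""
    found = {classifier.predict(page) for page in report_pages}
    return sorted(set(required_labels) - found)


# Hypothetical toy training data: two short "pages" per placeholder label.
train_pages = [
    "the board oversees governance and internal audit",
    "remuneration policy and governance framework of the board",
    "underwriting risk market risk and credit risk exposure",
    "sensitivity analysis of market risk and liquidity risk",
    "own funds and solvency capital requirement",
    "capital management and minimum capital requirement",
]
train_labels = [
    "guideline_a", "guideline_a",
    "guideline_b", "guideline_b",
    "guideline_c", "guideline_c",
]

clf = NaiveBayesPageClassifier().fit(train_pages, train_labels)

# A hypothetical two-page report that never addresses guideline_c.
report = [
    "the board and governance framework",
    "market risk and credit risk",
]
print(missing_guidelines(clf, report, sorted(set(train_labels))))
```

Under these assumptions, a report counts as complete when `missing_guidelines` returns an empty list, that is, when at least one page has been assigned to every required guideline.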