Document Type

Dissertation

Rights

This item is available under a Creative Commons License for non-commercial use only

Disciplines

Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Analytics)

Abstract

Machine learning approaches are applied across many domains to simplify or automate tasks, directly saving time or cost. Text document labelling is one such task: it requires extensive domain knowledge and considerable effort to review, understand and label each document. The company Stare Decisis summarises legal judgements and labels them as they are made available on the Irish public legal source www.courts.ie. This research presents a recommendation-based approach to reduce the time solicitors at Stare Decisis spend labelling, by narrowing the large set of available labels down to a small subset that is likely to contain the relevant label for a given judgement. To solve this problem, traditional and state-of-the-art text feature representations, combined with a K-Nearest Neighbour (KNN) recommender using both cosine similarity and word mover's distance, are developed and compared. A series of experiments was designed, starting with term-frequency (TF) vectors and a KNN recommender as the baseline; each further experiment was designed after observing the results of the preceding one. Pre-trained word2vec embeddings were used as the baseline for the state-of-the-art approaches, and domain-specific embeddings were developed using data scraped from legal text sources.
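The baseline described in the abstract (TF vectors with a KNN recommender using cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the whitespace tokeniser, the similarity-weighted voting over the k nearest judgements, the function names and the toy judgements and labels are all hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    # Term-frequency vector: raw token counts (assumption: lowercase + whitespace split)
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors; 0.0 if either is empty
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_labels(query, labelled_docs, k=3, n_labels=2):
    """Hypothetical recommender: find the k most similar labelled judgements,
    then rank their labels by summed similarity and return the top n_labels."""
    q = tf_vector(query)
    neighbours = sorted(
        ((cosine(q, tf_vector(text)), label) for text, label in labelled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in neighbours:
        votes[label] += similarity
    return [label for label, _ in votes.most_common(n_labels)]


# Toy, made-up judgements and labels for illustration only
docs = [
    ("landlord tenant lease rent arrears possession", "Landlord and Tenant"),
    ("tenant lease forfeiture rent", "Landlord and Tenant"),
    ("assault conviction sentence appeal", "Criminal Law"),
    ("theft conviction appeal quashed", "Criminal Law"),
    ("unfair dismissal employer employee", "Employment Law"),
]
query = "tenant rent arrears lease appeal"
print(recommend_labels(query, docs, k=3, n_labels=2))
# prints ['Landlord and Tenant', 'Criminal Law']
```

Weighting each neighbour's vote by its similarity, rather than counting votes equally, is one reasonable choice; the point is that the solicitor sees a short ranked list instead of every label. Replacing `cosine` with a word mover's distance over word embeddings (converted to a similarity) would correspond to the second variant the abstract mentions.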

DOI

https://doi.org/10.21427/jja5-4004
