This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Data Analytics), 2017.


A job title is a very short, all-encompassing description that conveys the pertinent information relating to a job. The job title typically encapsulates, and should encapsulate, the domain, role and level of responsibility of any given job. Significant value is attached to job titles both within organisational structures and by individual job holders. Organisations map out all employees in an organogram on the basis of job titles, which has a bearing on issues such as salary, level and scale of responsibility, and employee selection. Employees draw value from their own job titles as a means of self-identity, and this can have a significant impact on their engagement and motivation. Classifying job titles on the basis of the details of the job is, however, a subjective human resources exercise that risks bias and inconsistency. I propose instead that job title classification can be performed as a systematic, algorithm-based process by applying standard Natural Language Processing (NLP) together with supervised machine learning. In this study, data (job descriptions) labelled with job titles was collected from a popular national job postings website. The data went through several standard text pre-processing transformations, detailed in the dissertation, in order to reduce the dimensionality of the corpus. Feature engineering was used to create a data model of selected keyword sets characteristic of each job title, generated on the basis of term frequency. The Random Forest and Support Vector Machine (SVM) supervised learning algorithms were then used to build prediction models for the Top 30 most frequently occurring job titles. The most successful model was the linear-kernel SVM, which achieved an accuracy of 71%, macro-averaged precision of 70%, macro-averaged recall of 67% and a macro-averaged F-score of 66%.
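The term-frequency keyword sets mentioned above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the stop-word list, the toy job descriptions, and the helper names `tokenize`, `keyword_sets` and `to_features` are all assumptions made for illustration, and only the Python standard library is used.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (an assumption, not the author's).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with", "is", "are", "on"}

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def keyword_sets(labelled_docs, top_k=3):
    """Return the top_k most frequent terms for each job title."""
    counts = defaultdict(Counter)
    for title, description in labelled_docs:
        counts[title].update(tokenize(description))
    return {
        title: [term for term, _ in counter.most_common(top_k)]
        for title, counter in counts.items()
    }

def to_features(description, vocabulary):
    """Count how often each vocabulary keyword occurs in one description."""
    token_counts = Counter(tokenize(description))
    return [token_counts[term] for term in vocabulary]

# Toy labelled data (invented for this example).
docs = [
    ("Accountant", "Prepare the accounts and audit the ledger. Accounts experience required."),
    ("Accountant", "Audit of accounts and tax returns."),
    ("Software Developer", "Develop software in Java. Java and SQL experience required."),
    ("Software Developer", "Design software and write Java code."),
]

keywords = keyword_sets(docs, top_k=2)
vocabulary = sorted({term for terms in keywords.values() for term in terms})
features = to_features("Java developer to write software", vocabulary)
```

Restricting the vocabulary to a few high-frequency terms per title is one simple way to reduce dimensionality before a classifier is trained; the resulting count vectors would then be passed to a model such as a linear SVM or a Random Forest.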
The Random Forest model performed less well, with an accuracy of 58%, macro-averaged precision of 56%, macro-averaged recall of 55% and a macro-averaged F-score of 56%. The data model described here and the prediction performance obtained indicate that several particularities of the problem, namely its high dimensionality and the complexity of the feature engineering required to generate a data model with the correct keywords for each job, lead to data models that cannot provide optimal performance even when powerful Machine Learning (ML) algorithms are used. The data model design could be improved by using a wider data set (supplemented with job descriptions collected from a variety of websites), thus optimising the set of keywords describing each job title. More complex and computationally expensive algorithms, based on deep learning, may also provide more refined and more accurate predictive models. No research was found during this study that specifically examined the classification of job titles using machine learning. However, relevant literature on text classification via supervised learning was reviewed; it was useful in designing the models and was applied to this domain. While supervised ML techniques are commonly applied to text classification, including sentiment analysis, no similar study was found in the literature that addresses the link between job titles and the corresponding required skills. Nevertheless, the work presented here describes a valid and practical approach to answering the proposed research question within the constraints of a limited data model and basic ML algorithms. Such an approach may prove a working base for designing future models for artificial intelligence applications.
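The macro-averaged figures reported for both models weight every job title equally, regardless of how many postings it has. The sketch below shows one common way to compute them. It assumes that the macro F-score is the unweighted mean of the per-class F-scores, which the abstract does not state; the function name `macro_scores` and the toy labels are hypothetical.

```python
def macro_scores(y_true, y_pred):
    """Compute accuracy and macro-averaged precision, recall and F-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined ratios are treated as 0.0 (a convention, chosen here as an assumption).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f": sum(f_scores) / n,  # mean of per-class F-scores
    }

# Toy labels standing in for job titles (invented for this example).
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
scores = macro_scores(y_true, y_pred)
```

With these toy labels the macro F-score (about 0.66) falls below both the macro precision (about 0.67) and the macro recall (about 0.72). Under this averaging convention that is possible, and it matches the pattern of the SVM figures quoted above, where the F-score of 66% is lower than both precision and recall.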