Predicting Long Term Unemployment: an Exploration Using Customer Financial Data
Document Type: Dissertation
Dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computing (Stream), March 2018.
The employment status of a customer is of particular interest to financial institutions such as banks, where the loss of income following a job loss can substantially increase the probability of mortgage default. This dissertation describes an attempt to build a model that estimates the probability that a bank customer regains employment once an unemployment event has occurred. The current literature on state-of-the-art job seeker profiling was examined to determine which socio-demographic variables are considered important in detecting long-term unemployment. A key aim of the project was to utilise a large amount of customer financial data as a potential predictor of long-term unemployment. To that end, economic papers were reviewed that explored concepts such as the permanent income hypothesis, precautionary saving, unemployment scarring and wealth position as potential predictors of long-term unemployment. A novel method of detecting an unemployment event was developed: a severe drop-off in current account turnover, in combination with subsequent regular social welfare transactions, was used to create and track a base of unemployed customers over time. Two models were built using this customer base: the first used customer socio-demographic data as predictor variables, and the second used the same variables augmented with up-to-date financial data. A number of different sampling techniques were also applied to address a slightly unbalanced dataset. The results of the models were compared, and the evaluation criteria showed that adding the financial variables resulted in a slight increase in the predictive power of the models, indicating the merit of leveraging information on an individual’s financial position when predicting future employment status. The study also found that a random forest ensemble modelling approach yielded good results when compared with a logistic regression model, the model type usually used in the area of unemployment prediction.
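The unemployment-event detection rule described above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name, the three-month baseline, the 50% drop threshold and the two-month welfare window are all hypothetical values chosen for illustration, since the abstract does not state how "severe" or "regular" were defined.

```python
from statistics import mean
from typing import Optional, Sequence


def detect_unemployment_event(
    turnover: Sequence[float],
    welfare_paid: Sequence[bool],
    baseline_months: int = 3,      # assumed look-back window
    drop_ratio: float = 0.5,       # assumed threshold for a "severe" drop
    min_welfare_months: int = 2,   # assumed meaning of "regular" welfare receipts
) -> Optional[int]:
    """Return the index of the first month flagged as an unemployment event.

    turnover[i] is the current account turnover in month i, and
    welfare_paid[i] says whether a social welfare transaction was seen
    in month i. Returns None if no month satisfies both conditions.
    """
    if len(turnover) != len(welfare_paid):
        raise ValueError("turnover and welfare_paid must have the same length")

    last_start = len(turnover) - min_welfare_months
    for month in range(baseline_months, last_start + 1):
        # Average turnover over the months immediately before this one.
        baseline = mean(turnover[month - baseline_months:month])
        if baseline <= 0:
            continue  # no meaningful baseline to compare against

        # Condition 1: turnover falls well below the recent baseline.
        severe_drop = turnover[month] < drop_ratio * baseline
        # Condition 2: welfare payments appear in each month of the window
        # starting at the drop.
        regular_welfare = all(welfare_paid[month:month + min_welfare_months])

        if severe_drop and regular_welfare:
            return month
    return None


# Example with made-up figures: turnover collapses in month 3 and
# welfare payments start in the same month.
history = [3000.0, 3100.0, 2900.0, 800.0, 850.0, 820.0]
welfare = [False, False, False, True, True, True]
print(detect_unemployment_event(history, welfare))  # 3
```

Requiring both conditions together reflects the abstract's description: a turnover drop on its own could equally reflect a customer moving their salary to another bank, so the subsequent welfare transactions act as the confirming signal.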