Document Type



This item is available under a Creative Commons License for non-commercial use only


Statistics, Computer Sciences

Publication Details

A dissertation submitted in partial fulfilment of the requirements of Technological University Dublin for the degree of M.Sc. in Computer Science (Data Analytics)


Previous projects statistically analysed the factors that affect scores in Mathematics and Portuguese for several groups of secondary school students in Portugal.

This project aims to find a model, hypothesised to be a multiple linear regression, that predicts a student's final score (the dependent variable G3) from features divided into two groups. The first group contains predictors of the final score that relate to student performance, such as study time or past failures. The second group contains predictors that relate to the family situation or family relationships.

The linear model is built on the principal components obtained from a principal component analysis, rather than on the original features or predictors.

The proposed linear model is:

score G3 = a + b1*(PC1) + b2*(PC2) + ... + bk*(PCk)

a = intercept

bi = coefficient of the i-th principal component

PCi = i-th principal component, i = 1, 2, …, k dimensions
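
As an illustration of the model above, the following is a minimal sketch of a regression on principal component scores, assuming purely numeric predictors and using NumPy. The function names, the synthetic data and the choice k = 3 are hypothetical and are not taken from the dissertation.

```python
import numpy as np

def fit_pc_regression(X, y, k):
    """Regress y on the first k principal component scores of X."""
    # Standardise each column (assumes no column is constant).
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    Z = (X - mean) / std
    # PCA via SVD: the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    axes = Vt[:k].T                       # shape (n_features, k)
    scores = Z @ axes                     # PC1..PCk for every row
    # Ordinary least squares for G3 = a + b1*PC1 + ... + bk*PCk.
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "std": std, "axes": axes,
            "a": coef[0], "b": coef[1:]}

def predict_pc_regression(model, X):
    """Project new rows onto the stored axes and apply the linear model."""
    Z = (X - model["mean"]) / model["std"]
    return model["a"] + (Z @ model["axes"]) @ model["b"]

# Synthetic stand-in data (hypothetical, not the student data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

model = fit_pc_regression(X, y, k=3)
pred = predict_pc_regression(model, X)
```

Because the component scores are centred, the fitted intercept a equals the mean of the response in the training data.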

Because the variables are both numeric and categorical, an extension of this method called Factor Analysis of Mixed Data (FAMD) will be used to handle quantitative and qualitative data together.
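
The sketch below shows one common way to compute FAMD-style component scores with NumPy alone: numeric columns are standardised, each categorical variable is expanded into indicator columns that are centred and divided by the square root of the category proportion, and a PCA is run on the combined table. It is an assumption-laden illustration, not the dissertation's implementation; the function name, the variable names and the synthetic data are hypothetical.

```python
import numpy as np

def famd_scores(numeric, categorical, k):
    """Return the first k FAMD-style component scores.

    numeric:     (n, p) float array of quantitative variables
    categorical: list of length-n label sequences, one per qualitative variable
    """
    # Quantitative block: standardise (assumes no constant column).
    blocks = [(numeric - numeric.mean(axis=0)) / numeric.std(axis=0)]
    for labels in categorical:
        labels = np.asarray(labels)
        levels = np.unique(labels)
        # Indicator (one-hot) matrix, one column per category.
        indicator = (labels[:, None] == levels[None, :]).astype(float)
        prop = indicator.mean(axis=0)     # category proportions
        # Centre and weight by 1/sqrt(proportion), as in FAMD.
        blocks.append((indicator - prop) / np.sqrt(prop))
    Z = np.hstack(blocks)
    # PCA of the combined table via SVD; scores = U * singular values.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

# Synthetic stand-in data (hypothetical columns and values).
rng = np.random.default_rng(1)
n = 100
numeric = np.column_stack([
    rng.integers(1, 5, size=n),           # e.g. a study-time level
    rng.integers(0, 4, size=n),           # e.g. a count of past failures
]).astype(float)
family_support = rng.choice(["yes", "no"], size=n)
guardian = rng.choice(["mother", "father", "other"], size=n)

scores = famd_scores(numeric, [family_support, guardian], k=2)
```

The resulting columns of scores could then play the role of PC1, ..., PCk in the linear model for G3.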