This item is available under a Creative Commons License for non-commercial use only
Statistics, Computer Sciences
Previous projects statistically analyzed the factors that affect Mathematics and Portuguese scores for several groups of secondary school students in Portugal.
This project seeks a model, hypothesized to be a multiple linear regression, that predicts a student's final score (the dependent variable G3) from features divided into two groups. The first group covers the predictors most closely related to student performance, such as study time and past failures. The second group covers the predictors related to the family situation or family relationships.
The linear model is built on the principal components obtained from a principal component analysis rather than on the original features or predictors.
The proposed linear model is:
score G3 = a + b1*(PC1) + b2*(PC2) + ... + bk*(PCk)
a = intercept
bi = coefficient of the i-th principal component
PCi = i-th principal component, i = 1, 2, …, k dimensions
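As a rough illustration of the regression step, the sketch below fits the model by least squares in Python (the dissertation itself works in R). It assumes the component scores are centered and mutually orthogonal, as PCA and FAMD scores are, so each coefficient can be estimated on its own. The function names and the four-student data set are hypothetical and are not taken from the dissertation.

```python
from statistics import mean


def fit_pc_regression(scores, y):
    """Least-squares fit of y = a + b1*PC1 + ... + bk*PCk.

    Assumption: every column of `scores` is centered and the columns are
    mutually orthogonal. Under that assumption the slopes decouple.
    """
    k = len(scores[0])
    a = mean(y)  # with centered predictors the intercept is the mean of y
    b = []
    for i in range(k):
        col = [row[i] for row in scores]
        num = sum(c * (yi - a) for c, yi in zip(col, y))
        den = sum(c * c for c in col)
        b.append(num / den)
    return a, b


def predict(a, b, row):
    """Predicted score for one student given that student's component scores."""
    return a + sum(bi * x for bi, x in zip(b, row))


# Hypothetical scores on two components for four students (illustrative only).
scores = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
g3 = [14.0, 12.0, 10.0, 8.0]

a, b = fit_pc_regression(scores, g3)
print(a, b, predict(a, b, [1.0, 1.0]))
```

For predictors that are not orthogonal, a general least-squares solver would be needed instead of the per-column formula.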
Because the variables are both numeric and categorical, the extension known as Factor Analysis of Mixed Data (FAMD) is used to handle quantitative and qualitative data together.
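One way to picture how FAMD combines the two kinds of variables is the encoding it is commonly described as applying before an ordinary principal component analysis: numeric variables are standardized, and each category becomes an indicator column divided by the square root of its proportion and then centered. The Python sketch below shows only that encoding step, under this assumption. The column names `studytime` and `famsup` and their values are hypothetical, and constant columns are not handled.

```python
from math import sqrt
from statistics import mean, pstdev


def famd_encode(numeric_cols, categorical_cols):
    """Build the row-wise matrix on which a standard PCA would then be run.

    numeric_cols: dict mapping a name to a list of numbers
    categorical_cols: dict mapping a name to a list of labels
    """
    all_cols = list(numeric_cols.values()) + list(categorical_cols.values())
    n = len(all_cols[0])
    columns = []
    # Numeric variables: center and scale to unit (population) variance.
    for values in numeric_cols.values():
        m, s = mean(values), pstdev(values)
        columns.append([(v - m) / s for v in values])
    # Categorical variables: one indicator per level, scaled by 1/sqrt(p)
    # and centered, which simplifies to (indicator - p) / sqrt(p).
    for labels in categorical_cols.values():
        for level in sorted(set(labels)):
            p = labels.count(level) / n
            columns.append(
                [((1.0 if lab == level else 0.0) - p) / sqrt(p) for lab in labels]
            )
    return [list(row) for row in zip(*columns)]


# Hypothetical data for four students (illustrative only).
matrix = famd_encode(
    {"studytime": [1.0, 2.0, 3.0, 4.0]},
    {"famsup": ["yes", "no", "yes", "yes"]},
)
for row in matrix:
    print([round(x, 3) for x in row])
```

The scaling gives rare categories more weight per individual while keeping every column centered, so numeric and categorical variables enter the component analysis on a comparable footing.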
Pereira, Nestor, "Factor Analysis of Mixed Data (FAMD) and Multiple Linear Regression in R" (2019). Dissertations. 212.