Document Type

Conference Paper

Disciplines

Statistics

Publication Details

Presented at Statistical and Machine Learning: Methods and Applications (SAML-25), June 5th and 6th, 2025, TU Dublin, Ireland.

doi:10.21427/3n25-1h53

Abstract

This Master’s thesis addresses the early identification of first-year Computer Science students at risk of underperforming. It compares inherently interpretable (“glass-box”) predictive models with the existing Naïve Bayes–based PreSS tool. The PreSS dataset was originally compiled by Quille & Bergin from 692 first-year CS1 students across eleven institutions in Ireland and Denmark. The students completed surveys on their programming and mathematics backgrounds and gaming habits, and took a short programming test four to six hours into the course. Seventeen normalized features capturing demographic, academic and behavioural factors were extracted. Four machine learning models are evaluated: Naïve Bayes, explainable boosting machines, automatic piecewise linear regression, and decision trees. Model performance is assessed with particular emphasis on recall, to minimise the number of at-risk students who go undetected. Four types of explainable AI outputs, including local feature importance, decision paths, and similar and contrastive cases, are used in a parallel survey of computer science educators in Ireland and the Netherlands to explore educators’ confidence in model outputs.
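As a minimal sketch of why recall is the metric emphasised here, assume a binary label where 1 marks an at-risk student. Recall is TP / (TP + FN): the share of truly at-risk students that the model flags. Every false negative is a struggling student who receives no early intervention. The scores and labels below are hypothetical illustration values, not PreSS data. The helper names `predict`, `recall` and `precision` are also illustrative. The sketch shows how lowering a decision threshold raises recall at the cost of precision.

```python
# Hypothetical example: 1 = at-risk student, 0 = not at risk (assumed encoding).

def predict(scores, threshold):
    """Flag a student as at risk when the model's score meets the threshold."""
    return [1 if s >= threshold else 0 for s in scores]

def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): share of truly at-risk students that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): share of flagged students who are truly at risk."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    return tp / (tp + fp) if (tp + fp) else 0.0

# Made-up scores for ten students; the first four are truly at risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.45, 0.35, 0.6, 0.2, 0.1, 0.4, 0.05, 0.3]

if __name__ == "__main__":
    for threshold in (0.5, 0.3):
        y_pred = predict(scores, threshold)
        print(threshold, recall(y_true, y_pred), round(precision(y_true, y_pred), 3))
```

With these illustrative numbers, the 0.5 threshold misses two of the four at-risk students (recall 0.5). The 0.3 threshold catches all four (recall 1.0) but also flags three students who are not at risk. A recall-focused evaluation accepts that trade-off because a missed at-risk student is considered the costlier error.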

DOI

https://doi.org/10.21427/3n25-1h53

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
