Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Disciplines
Computer Sciences
Abstract
This paper proposes a novel machine learning procedure for genome-wide association studies (GWAS), named LightGWAS. Built on the LightGBM framework, it is a single, resilient, autonomous and scalable solution that addresses common limitations of GWAS implementations found in the literature, including reliance on extensive manual quality control steps and the need for specific GWAS methods for each dataset morphology and size. In this research, LightGWAS was compared against PLINK2, one of the current state-of-the-art GWAS implementations, which is based on the generalised linear model with support for Firth regularisation. Mean differences in standard classification metrics, obtained through k-fold cross-validation, indicated that LightGWAS outperforms PLINK2 on balanced, imbalanced and highly imbalanced genomic datasets. Paired difference tests indicated statistically significant differences in the experiments with imbalanced datasets. This article contributes to the body of knowledge by presenting a potentially more efficient GWAS procedure based on nonparametric approaches. LightGWAS provides adaptability and higher precision in the discovery of causal single-nucleotide polymorphisms, owing to the leaf-wise tree growth algorithm of LightGBM, a state-of-the-art gradient boosting decision tree framework. Control of false positives and statistical power are addressed automatically by the model's training process, which significantly reduces human dependency during study design.
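The evaluation design summarised in the abstract (k-fold cross-validation, then a paired difference test on the per-fold scores of two models) can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes contiguous folds and a paired t-test, since the abstract does not name the specific test. The function names `k_fold_indices` and `paired_t_statistic` are hypothetical.

```python
import math
from statistics import mean, stdev


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds.

    Hypothetical helper: the first n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the same k folds.

    Assumes a paired t-test; compare the result against a t critical value
    with k - 1 degrees of freedom to judge significance.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with k >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("per-fold differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

In use, each fold's training indices would fit both models (for example, a gradient boosting classifier and a regression-based baseline), each model would be scored on the fold's test indices, and the two resulting per-fold score lists would be passed to `paired_t_statistic`. Pairing by fold matters because both models see the same data splits, so fold-to-fold variation cancels out of the differences. A Wilcoxon signed-rank test is a common nonparametric alternative when normality of the differences is doubtful.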
DOI
http://dx.doi.org/10.6084/m9.figshare.13483341.v1
Recommended Citation
Bruno Ambrozio, Luca Longo, Lucas Rizzo. LightGWAS: A Novel Machine Learning Procedure for Genome-Wide Association Study, Proceedings for the 28th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, Dublin, Ireland, December 7-8, 2020, V. 2271, pp. 25-36, DOI: 10.6084/m9.figshare.13483341.v1
Publication Details
28th Irish Conference on Artificial Intelligence and Cognitive Science - AICS2020, Dublin, Ireland