Document Type
Theses, Ph.D
Disciplines
Computer Sciences
Abstract
Measuring training efficiency for artificial neural networks is an open research problem; the current literature reports several attempts to define measures or create reporting frameworks. Current methods lack generality because they require measurements of the hardware or software, so comparing efficiency between different systems can be difficult. Similarly, current metrics and frameworks generally do not propose using the metrics to directly improve training efficiency. This thesis presents three main contributions: (1) a novel framework that quantifies the training efficiency of a neural architecture on a learning task as the average ratio of model accuracy to total energy consumption during training, (2) the definition and analysis of a novel efficiency-based stopping criterion for neural network training, and (3) experiments that provide evidence that grokking, the sudden increase in the training accuracy of a deep neural network on extremely long training runs, does not alter the dynamics of efficiency. The experimental framework evaluates Convolutional Neural Networks (CNNs) and Bayesian CNNs (BCNNs) across multiple model sizes and convergence conditions on the MNIST and CIFAR-10 datasets. Results show that training efficiency declines as training progresses and varies by architecture, and that CNNs generally outperform BCNNs in efficiency, especially as task complexity increases.
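As a rough illustration of the first contribution, the sketch below computes one possible reading of "the average ratio of model accuracy to total energy consumption during training": at each epoch, accuracy is divided by the energy consumed so far, and the ratios are averaged over epochs. The abstract does not give the exact formula, so the function names (`efficiency_curve`, `training_efficiency`), the per-epoch granularity, and the use of joules are all assumptions made here for illustration, not the thesis's definition.

```python
from typing import List, Sequence


def efficiency_curve(accuracy: Sequence[float],
                     energy_joules: Sequence[float]) -> List[float]:
    """Per-epoch ratio of accuracy to cumulative energy consumed so far.

    Assumption: accuracy[t] is the model accuracy after epoch t and
    energy_joules[t] is the energy consumed during epoch t alone.
    """
    if len(accuracy) != len(energy_joules) or len(accuracy) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = []
    total_energy = 0.0
    for acc, energy in zip(accuracy, energy_joules):
        if energy <= 0:
            raise ValueError("per-epoch energy must be positive")
        total_energy += energy  # total energy spent up to this epoch
        ratios.append(acc / total_energy)
    return ratios


def training_efficiency(accuracy: Sequence[float],
                        energy_joules: Sequence[float]) -> float:
    """Average of the per-epoch accuracy-to-energy ratios (hypothetical reading)."""
    ratios = efficiency_curve(accuracy, energy_joules)
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not results from the thesis.
    acc = [0.5, 0.8, 0.9]
    energy = [10.0, 10.0, 10.0]
    print(efficiency_curve(acc, energy))
    print(training_efficiency(acc, energy))
```

With these made-up inputs the per-epoch ratio falls from roughly 0.05 to roughly 0.03, because accuracy saturates while cumulative energy keeps growing. Under this reading, that is the kind of decline the abstract reports as training progresses.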
DOI
https://doi.org/10.21427/zzyp-0q41
Recommended Citation
Cueto Mendoza, Eduardo, "A Unified Framework for Evaluating Training Efficiency in Deep (Bayesian) Neural Networks: Metrics, Overtraining, Stopping Criteria, and Grokking Computer Science" (2026). Doctoral. 6.
https://arrow.tudublin.ie/compdidadoc/6
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.