Document Type

Theses, Ph.D

Disciplines

Computer Sciences

Abstract

Measuring the training efficiency of artificial neural networks is an open research problem, and the current literature reports several attempts to define measures or create reporting frameworks. Current methods lack generality because they require measurements of the hardware or software, so comparing efficiency between different systems can be difficult. Moreover, current metrics and frameworks generally do not propose using the metrics to directly improve training efficiency. This thesis presents three main contributions: (1) a novel framework that quantifies the training efficiency of a neural architecture on a learning task as the average ratio of model accuracy to total energy consumption during training; (2) the definition and analysis of a novel efficiency-based stopping criterion for neural network training; and (3) experiments providing evidence that grokking, the sudden increase in the training accuracy of a deep neural network over extremely long training runs, does not alter the dynamics of efficiency. The experimental framework evaluates Convolutional Neural Networks (CNNs) and Bayesian CNNs (BCNNs) across multiple model sizes and convergence conditions on the MNIST and CIFAR-10 datasets. Results show that training efficiency declines as training progresses and varies by architecture, and that CNNs are generally more efficient than BCNNs, especially as task complexity increases.
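
As a rough illustration of the first contribution, the sketch below shows one possible reading of "the average ratio of model accuracy to total energy consumption during training". It assumes that accuracy and cumulative energy are logged once per epoch and that the ratio is averaged over epochs; the thesis itself may define the measure differently. The function names, the per-epoch logging, and the choice of joules as the unit are all hypothetical.

```python
from typing import List, Sequence


def efficiency_per_epoch(
    accuracy: Sequence[float],
    cumulative_energy_joules: Sequence[float],
) -> List[float]:
    """Ratio of accuracy to total energy consumed so far, for each epoch.

    Assumption: both sequences are logged once per epoch, and energy is the
    cumulative total since the start of training (hypothetical logging scheme).
    """
    if len(accuracy) != len(cumulative_energy_joules):
        raise ValueError("accuracy and energy logs must have the same length")
    if len(accuracy) == 0:
        raise ValueError("at least one epoch is required")
    ratios = []
    for acc, energy in zip(accuracy, cumulative_energy_joules):
        if energy <= 0:
            raise ValueError("cumulative energy must be positive")
        ratios.append(acc / energy)
    return ratios


def average_efficiency(
    accuracy: Sequence[float],
    cumulative_energy_joules: Sequence[float],
) -> float:
    """Mean of the per-epoch accuracy-to-energy ratios (one possible reading)."""
    ratios = efficiency_per_epoch(accuracy, cumulative_energy_joules)
    return sum(ratios) / len(ratios)
```

Under this reading, a run whose accuracy rises from 0.5 to 0.8 while cumulative energy doubles from 100 J to 200 J has a per-epoch ratio that falls from 0.005 to 0.004, which is consistent with the reported finding that efficiency declines as training progresses.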

DOI

https://doi.org/10.21427/zzyp-0q41

Creative Commons License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

