Document Type



This item is available under a Creative Commons License for non-commercial use only


Computer Sciences

Publication Details

Master's thesis submitted to the School of Computing, Technological University Dublin, for the award of MSc in Computing (Information Technology).


As interest in machine learning and data mining grows, the problem of how to assess learning algorithms and compare classifiers becomes more pressing. This problem has been associated with the lack of a comprehensive and complete workflow, suited to the scale of the project, to guide its users. As a result, the success or failure of a project can depend heavily on the person or team carrying it out.

The standard practice adopted by many researchers and experimenters has been to follow the steps or phases of existing workflows such as CRISP-DM, KDD and SAS SEMMA. However, because machine learning and data mining involve complex comparative experiments, there is a need for a complete workflow that, when applied, gives efficient and effective results. Although existing workflows offer many benefits, a successful comparative experiment requires more than the steps they outline. Conclusions drawn from a more complete workflow will be more reliable, and the experimenter can compare classifiers with confidence.

This dissertation addresses a range of issues, from machine learning to statistics, in developing the classifier workflow. It presents in detail the background material that is key to understanding how different experiments should be carried out. It explains how different classification techniques work and where they are applied, and how classification evaluations can be used in different domains. It also determines when an experimenter should use performance measures and how these measures correspond to performance estimators. Moreover, it explains how different settings can be obtained before committing to the experimentation step. Finally, a complete, platform-independent, eight-phase classifier workflow is provided. The workflow was evaluated by expert users using a closed-ended questionnaire.