Document Type
Conference Paper
Rights
Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Disciplines
Computer Sciences, Information Science
Abstract
Clustering is a fundamental machine learning task that partitions data into homogeneous groups. K-means and its variants are the most widely used class of clustering algorithms today. However, the original k-means algorithm can only be applied to numeric data. Categorical data must first be converted into numeric data through 1-of-K coding, which itself causes many problems. K-prototypes, another clustering algorithm that originates from the k-means algorithm, can handle categorical data directly by adopting a different notion of distance. In this paper, we systematically compare these two methods through an experimental analysis. Our analysis shows that K-prototypes is better suited to large-scale datasets, while the performance of k-means with 1-of-K coding is more stable. We believe these are useful heuristics for clustering methods working with highly categorical data.
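As an illustration of the two approaches named in the abstract, the following minimal sketch contrasts 1-of-K coding followed by a squared Euclidean distance (the k-means route) with a simple matching dissimilarity computed directly on the categorical values, which is the categorical distance commonly associated with K-prototypes. The abstract does not state which distance the paper uses, so the simple matching measure is an assumption here. The function names and the toy records are hypothetical and are not taken from the paper.

```python
def one_of_k_encode(rows):
    """1-of-K (one-hot) encode every categorical column of `rows`.

    Each distinct value in a column becomes its own 0/1 dimension.
    """
    n_cols = len(rows[0])
    # Sorted distinct values per column, so the encoding is deterministic.
    levels = [sorted({row[j] for row in rows}) for j in range(n_cols)]
    encoded = []
    for row in rows:
        vec = []
        for j, value in enumerate(row):
            vec.extend(1.0 if value == level else 0.0 for level in levels[j])
        encoded.append(vec)
    return encoded


def squared_euclidean(a, b):
    """Squared Euclidean distance, as minimised by standard k-means."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def simple_matching(a, b):
    """Number of attributes on which two categorical records differ.

    Assumption: this stands in for the categorical distance of K-prototypes.
    """
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical toy records with two categorical attributes (colour, size).
rows = [("red", "small"), ("red", "large"), ("blue", "large")]
encoded = one_of_k_encode(rows)

for i, j in [(0, 1), (0, 2), (1, 2)]:
    print(rows[i], rows[j],
          "one-hot squared distance:", squared_euclidean(encoded[i], encoded[j]),
          "mismatches:", simple_matching(rows[i], rows[j]))
```

On this toy data each mismatching attribute contributes 2 to the squared distance between the encoded vectors, while the encoded dimensionality grows with the number of distinct values per attribute. This is one plausible reason the two methods could behave differently on highly categorical data, though the sketch makes no claim about the paper's experimental setup.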
DOI
https://doi.org/10.21427/em6q-2787
Recommended Citation
Wang, F., Franco, H., Pugh, J. and Ross, R. (2016) Empirical Comparative Analysis of 1-of-K Coding and K-Prototypes in Categorical Clustering. Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2016), September 20-21 2016, University College Dublin. doi:10.21427/em6q-2787
Funder
CeADAR
Publication Details
Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2016), September 20-21 2016, University College Dublin