Conference papers

Empirical Comparative Analysis of 1-of-K Coding and K-Prototypes in Categorical Clustering

Document Type

Conference Paper

Rights

Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence

Disciplines

Computer Sciences, Information Science

Publication Details

Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2016), September 20-21 2016, University College Dublin

Abstract

Clustering is a fundamental machine learning task that partitions data into homogeneous groups. K-means and its variants are the most widely used class of clustering algorithms today. However, the original k-means algorithm can only be applied to numeric data. Categorical data must first be converted into numeric data through 1-of-K coding, which itself causes many problems. K-prototypes, another clustering algorithm derived from k-means, can handle categorical data by adopting a different notion of distance. In this paper, we systematically compare these two methods through an experimental analysis. Our analysis shows that K-prototypes is better suited to large-scale datasets, while the performance of k-means with 1-of-K coding is more stable. We believe these are useful heuristics for clustering methods working with highly categorical data.
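
To make the two approaches named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the commonly cited formulation of the k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight times the number of categorical mismatches). The function names `one_of_k` and `mixed_dissimilarity`, the weight `gamma`, and the sample values are hypothetical and chosen only for illustration.

```python
def one_of_k(values):
    """Encode categorical values as 1-of-K (one-hot) vectors.

    Returns the sorted category list and one binary row per input value.
    """
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is set per value
        encoded.append(row)
    return categories, encoded


def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Assumed k-prototypes-style dissimilarity between a point and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: count of mismatching attributes, weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    mismatches = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * mismatches


# Illustrative data only (not taken from the paper).
colours = ["red", "green", "blue", "green"]
categories, encoded = one_of_k(colours)

d = mixed_dissimilarity(
    [1.0, 2.0], ["red", "small"],   # data point: numeric and categorical parts
    [0.0, 2.0], ["blue", "small"],  # cluster prototype
    gamma=0.5,
)
```

With 1-of-K coding, each categorical attribute expands into as many binary columns as it has distinct values, and k-means then treats those columns as ordinary numeric features. The mixed dissimilarity instead compares category labels directly. When a dataset has no numeric attributes, the numeric term is zero and the measure reduces to a weighted mismatch count.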

DOI

https://doi.org/10.21427/em6q-2787

Recommended Citation

Wang, F., Franco, H., Pugh, J. and Ross, R. (2016) Empirical Comparative Analysis of 1-of-K Coding and K-Prototypes in Categorical Clustering. Irish Conference on Artificial Intelligence and Cognitive Science (AICS 2016), September 20-21 2016, University College Dublin. doi:10.21427/em6q-2787

Funder

CeADAR

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Included in

Other Computer Engineering Commons
