Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Computer Sciences, Information Science
Silhouette is one of the most popular and effective internal measures for the evaluation of clustering validity. Simplified Silhouette is a computationally simplified version of Silhouette. However, to date Simplified Silhouette has not been systematically analysed in a specific clustering algorithm. This paper analyses the application of Simplified Silhouette to the evaluation of k-means clustering validity and compares it with the k-means Cost Function and the original Silhouette from both theoretical and empirical perspectives. The theoretical analysis shows that Simplified Silhouette has a mathematical relationship with both the k-means Cost Function and the original Silhouette, while empirically, we show that it has comparative performances with the original Silhouette, but is much faster in calculation. Based on our analysis, we conclude that for a given dataset the k-means Cost Function is still the most valid and efficient measure in the evaluation of the validity of k-means clustering with the same k value, but that Simplified Silhouette is more suitable than the original Silhouette in the selection of the best result from k-means clustering with different k values.
Franco-Penya, H. et al. (2017) An Analysis of the Application of Simplified Silhouette to the Evaluation of k-means Clustering Validity. 13th International Conference on Machine Learning and Data Mining MLDM 2017, July 15-20, 2017, New York, USA.