Author ORCID Identifier
0000-0003-1208-6991
Document Type
Article
Disciplines
Statistics
Abstract
Clustering is a common unsupervised task in data analysis and machine learning. It seeks groups of objects that are highly similar within the same cluster and highly dissimilar between different clusters. One of the most widely used clustering algorithms is Partitioning Around Medoids (PAM), also known as k-medoids [4, 5]. The algorithm requires the cluster centers to be data points, and it searches for the centers that minimize the sum of the dissimilarities between the objects and their cluster centers. A recognized property of the method is its robustness to outliers, which stems from minimizing the total dissimilarity to the other points. PAM was designed for linear data. However, applications often involve observations that cannot be represented in a Euclidean space and instead lie on more complex manifolds. In such cases, suitable variations are needed. In particular, this work extends the PAM algorithm to data consisting of both linear and circular variables, that is, to data lying on the surface of a cylinder. Clustering this kind of data is challenging because of the complexity of the product space. Cylindrical data are described by a circular and a linear component measured on different scales. The most important difference between the two components is that data on a circular scale are periodic, that is, 0° and 360° represent the same value. On a linear scale, by contrast, the values 0° and 360° are located in different places. Any similarity or distance measure must properly account for this information, so standard clustering methods cannot be applied directly to these mixed data types. Cylindrical data appear in many fields and have a wide range of applications. For instance, the HSV (Hue, Saturation, Value) color space used in color image processing has a periodic hue component [2].
In environmental sciences, meteorological data often combine wind direction with wind speed, temperature, SO2 concentration, or other air quality indicators [3]. In fire ecology, the fire perimeter orientation can be treated as a two-dimensional or a three-dimensional observation, yielding circular or spherical data. Combined with the fire size, this leads to cylindrical data [1]. The proposed cylindrical PAM method adopts a similarity measure derived from a probabilistic model. The angular component is assumed to follow a von Mises distribution, while the linear components are assumed to follow a Gaussian distribution. The performance of the proposed cylindrical PAM is evaluated on numerical and real examples. A comparison is also given to demonstrate its effectiveness in handling mixed data types.
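To make the idea concrete, the following is a minimal sketch of k-medoids clustering on cylindrical data, not the authors' implementation. The dissimilarity is an assumption chosen for illustration: a von Mises-type term `kappa * (1 - cos(Δθ))` for the angle plus a Gaussian-type term `(Δx)² / (2σ²)` for the linear part. The abstract does not give the exact measure, and the names `cylindrical_dissimilarity`, `pam`, `kappa`, and `sigma` are hypothetical. The search is a simplified swap procedure with random initial medoids, not the full BUILD and SWAP phases of PAM.

```python
import math
import random


def cylindrical_dissimilarity(a, b, kappa=1.0, sigma=1.0):
    """Dissimilarity between two (angle_in_radians, linear_value) points.

    Assumed form, not taken from the paper: a periodic term
    kappa * (1 - cos(delta_theta)) plus (delta_x)^2 / (2 * sigma^2).
    """
    theta_a, x_a = a
    theta_b, x_b = b
    circular = kappa * (1.0 - math.cos(theta_a - theta_b))
    linear = (x_a - x_b) ** 2 / (2.0 * sigma ** 2)
    return circular + linear


def total_cost(dist, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam(points, k, dissimilarity, seed=0):
    """Simplified k-medoids: random initial medoids, then greedy swaps."""
    n = len(points)
    # Precompute the full pairwise dissimilarity matrix.
    dist = [[dissimilarity(p, q) for q in points] for p in points]
    medoids = random.Random(seed).sample(range(n), k)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:pos] + [candidate] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Assign each point to the index of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]])
              for i in range(n)]
    return medoids, labels, best


# Toy data: one group straddles 0°/360°, the other sits near 180°.
deg = math.radians
points = [(deg(350), 1.0), (deg(355), 1.1), (deg(5), 0.9), (deg(10), 1.0),
          (deg(170), 5.0), (deg(180), 5.2), (deg(190), 4.9)]
medoids, labels, cost = pam(points, k=2,
                            dissimilarity=cylindrical_dissimilarity)
print(medoids, labels, cost)
```

Because the angular term depends only on the cosine of the angle difference, the points at 350° and 10° are treated as close neighbours, whereas a plain Euclidean distance on the raw angles would place them far apart. The medoids returned are always indices of observed data points, as in PAM.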
DOI
https://doi.org/10.21427/rsm9-tt41
Recommended Citation
Hammami, Yahia; Demni, Houyem; Messaoud, Amor; and Porzio, Giovanni C., "Partitioning Around Medoids on Product Spaces: A Clustering Approach for Cylindrical Data" (2025). SAML-25 Workshop on Statistical and Machine Learning. 16.
https://arrow.tudublin.ie/saml/16
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Publication Details
Statistical and Machine Learning: Methods and Applications (SAML-25) on June 5th and 6th, 2025 at TU Dublin, Ireland.
doi:10.21427/rsm9-tt41