Author ORCID Identifier

0000-0002-2661-1892

Document Type

Conference Paper

Disciplines

Statistics

Publication Details

Statistical and Machine Learning: Methods and Applications (SAML-25), held on June 5th and 6th, 2025 at TU Dublin, Ireland.

Abstract

Accurate classification of skin lesions is critical for early detection of melanoma and other malignancies, particularly in resource-limited settings. This study presents a novel multi-modal machine learning framework that integrates dermoscopic images and structured clinical metadata to improve diagnostic performance. Leveraging the PAD-UFES-20 dataset, which includes over 2,000 smartphone-captured lesion images and associated patient metadata, we benchmark a series of unimodal and multi-modal models. Our results demonstrate that modality attention fusion (MAF) applied to a frozen SwinV2-Tiny vision transformer and metadata multi-layer perceptron (MLP), augmented with focal loss, yields a state-of-the-art weighted F1-score of 0.84 and balanced accuracy of 0.802. Ablation studies and external validation on the HAM10000 and ISIC 2019 datasets confirm the robustness of the proposed framework. We further introduce a Siamese architecture for contrastive embedding optimisation, which enhances class separability, particularly for underrepresented lesion classes. Explainability is achieved via SHAP for metadata and Grad-CAM for image features, enabling transparent, clinically aligned decision insights. This framework shows strong potential for deployment in teledermatology and mobile health diagnostics, particularly in underserved regions.
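
For illustration only, the following is a minimal, hypothetical PyTorch sketch of the fusion strategy described in the abstract: a frozen image embedding and a metadata MLP combined via learned modality attention weights and trained with focal loss. It is not the authors' implementation; the embedding dimensions, layer sizes, metadata encoding, class count, and focal-loss form are assumptions, and a real pipeline would supply img_feat from a frozen SwinV2-Tiny backbone rather than random tensors.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MetadataMLP(nn.Module):
    # Encodes structured clinical metadata into a fixed-size embedding (sizes assumed).
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, emb_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ModalityAttentionFusion(nn.Module):
    # Learns per-sample weights over the two modality embeddings, then classifies.
    def __init__(self, img_dim: int, meta_dim: int, fused_dim: int = 256, n_classes: int = 6):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, fused_dim)
        self.meta_proj = nn.Linear(meta_dim, fused_dim)
        self.attn = nn.Linear(2 * fused_dim, 2)   # one attention score per modality
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, img_feat, meta_feat):
        zi = torch.tanh(self.img_proj(img_feat))
        zm = torch.tanh(self.meta_proj(meta_feat))
        w = torch.softmax(self.attn(torch.cat([zi, zm], dim=-1)), dim=-1)
        fused = w[:, 0:1] * zi + w[:, 1:2] * zm   # attention-weighted sum of modalities
        return self.classifier(fused)

def focal_loss(logits, targets, gamma: float = 2.0):
    # Multi-class focal loss: down-weights easy examples to counter class imbalance.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                           # probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

# Toy usage: random tensors stand in for frozen SwinV2-Tiny features (assumed 768-d)
# and encoded PAD-UFES-20 clinical metadata (dimension assumed).
img_feat = torch.randn(8, 768)
meta = torch.randn(8, 26)
model = ModalityAttentionFusion(img_dim=768, meta_dim=128)
logits = model(img_feat, MetadataMLP(26)(meta))
loss = focal_loss(logits, torch.randint(0, 6, (8,)))

The attention-weighted sum is one simple way to realise modality attention fusion; it lets the network emphasise image or metadata evidence on a per-lesion basis, which is the behaviour the abstract attributes to MAF.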

DOI

https://doi.org/10.21427/8pgb-2871

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

