Author ORCID Identifier

0009-0000-5713-2002

Document Type

Conference Paper

Disciplines

Computer Sciences

Publication Details

International Conference on Text, Speech, and Dialogue (TSD 2025). Lecture Notes in Computer Science, vol. 16029. Springer, Cham.

doi:10.21427/6ftd-vc26

Abstract

Dysarthric speech recognition is essential for enhancing communication and accessibility for individuals with speech impairments, yet its development is hindered by a scarcity of robust, speaker-specific datasets. This study explores low-resource dysarthric speech recognition through cross-speaker transfer using synthetic data and parameter-efficient fine-tuning (PEFT). We integrate SpeechT5 text-to-speech (TTS) synthesis with x-vector speaker embeddings to generate speaker-specific dysarthric speech, enabling model adaptation while preserving pathological speech characteristics such as prosodic irregularities. Experiments on the TORGO dataset show that mixed cross-synthetic data with LoRA fine-tuning achieves a WER of 0.17, representing a 71.7% improvement over the standard model (0.60 WER) without fine-tuning the TTS model. However, cross-dataset generalisation remains challenging, yielding higher WERs on the MINDS14 (4.69) and AMI (0.96–3.83) datasets. Whilst synthetic data enhances in-domain recognition, further research is needed to improve cross-dataset generalisation and speaker adaptation, particularly for low-resource pathological speech settings.

DOI

https://doi.org/10.21427/6ftd-vc26

Funder

Research Ireland

Creative Commons License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

