Author ORCID Identifier
0009-0000-5713-2002
Document Type
Conference Paper
Disciplines
Computer Sciences
Abstract
Dysarthric speech recognition is essential for enhancing communication and accessibility for individuals with speech impairments, yet its development is hindered by a scarcity of robust, speaker-specific datasets. This study explores low-resource dysarthric speech recognition through cross-speaker transfer using synthetic data and parameter-efficient fine-tuning (PEFT). We integrate SpeechT5 text-to-speech (TTS) synthesis with x-vector speaker embeddings to generate speaker-specific dysarthric speech, enabling model adaptation while preserving pathological speech characteristics such as prosodic irregularities. Experiments on the TORGO dataset show that mixed cross-synthetic data with LoRA fine-tuning achieves a WER of 0.17, representing a 71.7% improvement over the standard model (0.60 WER) without fine-tuning the TTS model. However, cross-dataset generalisation remains challenging, yielding higher WERs on MINDS14 (4.69) and AMI (0.96–3.83) datasets. Whilst synthetic data enhances in-domain recognition, further research is needed to improve cross-dataset generalisation and speaker adaptation, particularly for low-resource pathological speech settings.
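The headline figures in the abstract rest on word error rate (WER) and relative WER reduction. As a minimal sketch (not the authors' evaluation code), the snippet below computes WER via word-level Levenshtein distance and reproduces the reported 71.7% relative improvement from the abstract's 0.60 baseline and 0.17 fine-tuned WERs; all function names here are illustrative.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative WER reduction between a baseline and an improved model."""
    return (baseline - improved) / baseline


# Figures from the abstract: 0.60 WER baseline vs 0.17 with LoRA fine-tuning.
print(round(relative_improvement(0.60, 0.17) * 100, 1))  # → 71.7
```

In practice a library such as jiwer is typically used for WER, but the arithmetic above makes clear how the 71.7% figure follows from the two reported WERs.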
DOI
https://doi.org/10.21427/6ftd-vc26
Recommended Citation
Mokgosi, Kesego; Dadgar, Milad; Ennis, Cathy; and Ross, Robert, "Synthesising Cross-Speaker Data for Low-Resource Pathological Speech Recognition with PEFT" (2025). Conference papers. 451.
https://arrow.tudublin.ie/scschcomcon/451
Funder
Research Ireland
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 4.0 International License.
Publication Details
International Conference on Text, Speech, and Dialogue (TSD 2025). Lecture Notes in Computer Science, vol 16029. Springer, Cham.