Conference papers

Automatic Speech Recognition Models for Pathological Speech: Challenges and Insights

Kesego Mokgosi, Technological University Dublin
Cathy Ennis, Technological University Dublin
Robert Ross, Technological University Dublin

Author ORCID Identifier

0009-0000-5713-2002

Document Type

Conference Paper

Disciplines

1.2 COMPUTER AND INFORMATION SCIENCE, Computer Sciences, Information Science

Publication Details

https://aics2024.ucd.ie/

doi:10.21427/5r8h-px25

Abstract

Conversational avatars provide innovative platforms for enhancing therapist-patient interactions in speech therapy by offering real-time feedback. However, the performance of Automatic Speech Recognition (ASR) models on disordered speech, such as dysarthria and stuttering, remains underexplored. The effectiveness of these systems hinges on the accuracy and processing speed of ASR models when transcribing pathological speech, particularly in real-time scenarios. This study evaluates several pre-trained ASR models, including Whisper-large-v3-turbo, Canary, DistilWhisper, and NVIDIA’s stt-en-fastconformer-ctc-large, across three datasets: Common Voice (standard speech), TORGO (dysarthric speech), and UCLASS (stuttered speech). We assess the models using Word Error Rate (WER), Real-Time Factor (RTF), and BERTScore to measure transcription accuracy, computational efficiency, and semantic congruence, respectively. The stt-en-fastconformer-ctc-large model demonstrates the fastest processing speeds and achieves the lowest WER and highest BERTScores on both the Common Voice and TORGO datasets, making it highly suitable for real-time therapeutic applications. However, all models struggle to transcribe stuttered speech from the UCLASS dataset accurately. These results highlight the need for ASR improvements for disordered speech, with a focus on edge deployment to reduce latency and on multimodal inputs to enhance accuracy.
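
For readers unfamiliar with two of the metrics named in the abstract, the following is a minimal sketch of how WER and RTF are conventionally defined. It is not the authors' evaluation code: the function names, the lowercase-and-split text normalization, and the example sentences are assumptions made purely for illustration, and the paper may normalize text or measure timing differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are compared after lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Hypothetical example: a transcript that keeps a speaker's repetitions
    # adds insertions relative to the intended reference sentence.
    print(word_error_rate("i want water", "i i i want water"))
    print(real_time_factor(processing_seconds=0.5, audio_seconds=2.0))
```

Because WER divides by the reference length, a transcript with many inserted words can score above 1.0, which is one reason repetitions in stuttered speech can inflate the metric.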

DOI

https://doi.org/10.21427/5r8h-px25

Recommended Citation

Mokgosi, Kesego; Ennis, Cathy; and Ross, Robert, "Automatic Speech Recognition Models for Pathological Speech: Challenges and Insights" (2024). Conference papers. 437.
https://arrow.tudublin.ie/scschcomcon/437

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
