New Approach Enhances Elderly Speech Recognition Accuracy
Key Takeaways
- Proposed LLM-based data augmentation improves ASR for elderly speakers.
- Reduction in word error rate of 58.2% using Whisper.
- Increases potential for accurate healthcare applications in elderly care.
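For readers unfamiliar with the metric in the second takeaway: word error rate (WER) is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below is a minimal pure-Python illustration, not the paper's evaluation code. It assumes whitespace tokenization with no text normalization, and it assumes the reported percentage is a relative reduction, which this summary does not state. The function names and example numbers are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed reading)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

Under these assumptions, a model whose WER falls from 0.20 to 0.10 has a relative reduction of 0.5, that is, 50%. These numbers are illustrative only and are not taken from the paper.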
Recent research addresses challenges in automatic speech recognition (ASR) for elderly speakers, where performance is often hampered by limited training data and distinctive acoustic properties. The study introduces a data augmentation pipeline that uses large language models (LLMs) to generate paraphrased transcripts grounded in elderly contexts, which are then synthesized into speech with text-to-speech (TTS) technology. The augmented dataset is used to fine-tune the Whisper ASR model, which shows notable improvements in performance metrics on English and Korean elderly speech datasets.
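The augmentation flow described above (paraphrase transcripts, then synthesize them into speech) can be sketched roughly as follows. This is an illustrative outline only: the summary does not specify the paper's prompts, TTS system, or filtering rules, so the LLM and TTS steps are passed in as plain callables, and every name here (`AugmentedSample`, `augment_transcripts`, the stub functions) is hypothetical. Dropping empty and duplicate paraphrases is also an assumption, not something the source describes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class AugmentedSample:
    text: str         # paraphrased transcript (training label)
    audio: bytes      # synthesized speech for that transcript
    source_text: str  # original transcript the paraphrase came from


def augment_transcripts(
    transcripts: Iterable[str],
    paraphrase: Callable[[str, int], List[str]],  # stand-in for an LLM call
    synthesize: Callable[[str], bytes],           # stand-in for a TTS call
    n_paraphrases: int = 2,
) -> List[AugmentedSample]:
    """Build synthetic (text, audio) pairs from existing transcripts."""
    samples: List[AugmentedSample] = []
    for original in transcripts:
        seen = {original}  # skip paraphrases identical to the original
        for candidate in paraphrase(original, n_paraphrases):
            candidate = candidate.strip()
            if not candidate or candidate in seen:
                continue  # assumed filter: drop empty or repeated outputs
            seen.add(candidate)
            samples.append(
                AugmentedSample(candidate, synthesize(candidate), original)
            )
    return samples


# Stubs so the sketch runs without any external model or service.
def fake_paraphrase(text: str, n: int) -> List[str]:
    return [f"{text} (variant {k})" for k in range(1, n + 1)]


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


demo = augment_transcripts(["my knee hurts"], fake_paraphrase, fake_synthesize)
```

In a real setup, the resulting pairs would be combined with the original recordings before fine-tuning an ASR model such as Whisper. Keeping the LLM and TTS steps as injected callables makes it straightforward to swap in different providers or test the flow offline.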
Source
arXiv cs.CL (NLP/LLMs): https://arxiv.org/abs/2604.24770