Large Language Model Data Generation for Enhanced Intent Recognition in German Speech
Theresa Pekarek Rosin, Burak Can Kaplan, Stefan Wermter

TL;DR
This paper presents a method for improving intent recognition from elderly German speech. It combines an ASR model adapted to elderly speakers with language models trained on synthetic text generated by large language models, which improves both classification performance and robustness.
Contribution
It introduces a new approach that leverages LLM-generated synthetic data and fine-tuned ASR models for better German speech intent recognition, especially for low-resource elderly speech datasets.
Findings
Synthetic LLM-generated data improves classification accuracy.
LeoLM produces higher-quality synthetic datasets than larger models such as ChatGPT.
The approach enhances robustness to different speaking styles and vocabulary.
Abstract
Intent recognition (IR) for speech commands is essential for artificial intelligence (AI) assistant systems; however, most existing approaches are limited to short commands and are predominantly developed for English. This paper addresses these limitations by focusing on IR from speech by elderly German speakers. We propose a novel approach that combines an adapted Whisper ASR model, fine-tuned on elderly German speech (SVC-de), with Transformer-based language models trained on synthetic text datasets generated by three well-known large language models (LLMs): LeoLM, Llama3, and ChatGPT. To evaluate the robustness of our approach, we generate synthetic speech with a text-to-speech model and conduct extensive cross-dataset testing. Our results show that synthetic LLM-generated data significantly boosts classification performance and robustness to different speaking styles and unseen…
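The two-stage design described in the abstract (speech is transcribed first, then a text classifier trained on synthetic sentences predicts the intent, and the result is checked on a differently phrased test set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `transcribe` is a placeholder for the fine-tuned Whisper model and simply normalizes a string, a small naive Bayes bag-of-words classifier stands in for the Transformer-based language models, and the German example sentences and intent labels are invented rather than taken from the paper's datasets.

```python
import math
from collections import Counter, defaultdict


def transcribe(audio: str) -> str:
    # Placeholder for the ASR stage (assumption): the "audio" is already text,
    # so this only lowercases it and strips surrounding punctuation.
    return audio.lower().strip(" .!?")


class BagOfWordsIntentClassifier:
    """Multinomial naive Bayes with add-one smoothing (stand-in classifier)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies
        self.intent_counts = Counter()           # intent -> number of examples
        self.vocab = set()

    def fit(self, texts, intents):
        for text, intent in zip(texts, intents):
            words = text.lower().split()
            self.word_counts[intent].update(words)
            self.intent_counts[intent] += 1
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.intent_counts.values())
        best_intent, best_score = None, -math.inf
        for intent, n in self.intent_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n / total)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


def recognize_intent(audio, classifier):
    # Stage 1: speech -> text. Stage 2: text -> intent.
    return classifier.predict(transcribe(audio))


def accuracy(classifier, samples):
    hits = sum(recognize_intent(audio, classifier) == intent
               for audio, intent in samples)
    return hits / len(samples)


# Hypothetical synthetic training sentences (invented for this sketch;
# in the described approach an LLM would generate such data).
synthetic_train = [
    ("schalte bitte das licht im wohnzimmer ein", "light_on"),
    ("mach das licht an", "light_on"),
    ("kannst du bitte das licht ausschalten", "light_off"),
    ("mach das licht aus", "light_off"),
    ("wie wird das wetter morgen", "weather"),
    ("sag mir bitte das wetter für heute", "weather"),
]

# Hypothetical cross-dataset test: same intents, different phrasing.
cross_test = [
    ("Licht an!", "light_on"),
    ("Wie ist das Wetter heute?", "weather"),
    ("Bitte das Licht ausschalten.", "light_off"),
]

clf = BagOfWordsIntentClassifier().fit(*zip(*synthetic_train))
print("cross-dataset accuracy:", accuracy(clf, cross_test))
```

Keeping transcription and classification as separate functions mirrors the modular setup in the abstract: either stage could be swapped for a real model without touching the evaluation loop.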
Taxonomy
Topics: Speech Recognition and Synthesis
