A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification
Anupama Chingacham, Vera Demberg, Dietrich Klakow

TL;DR
This paper investigates how choosing among paraphrases with different wording can improve speech intelligibility in noisy environments. It takes a data-driven approach and proposes an intelligibility-aware paraphrase ranking model for use under challenging noise conditions.
Contribution
The paper introduces a dataset of 900 paraphrases perceived in babble noise, analyzes the noise-robust acoustic cues behind intelligibility differences, and proposes an intelligibility-aware paraphrase ranking model that outperforms baselines.
Findings
Careful paraphrase selection improves intelligibility by 33% at SNR -5 dB.
Intelligibility differences are mainly driven by noise-robust acoustic cues.
The proposed ranking model outperforms baselines with a 31.37% relative improvement.
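For readers unfamiliar with the term, a relative improvement expresses the gain over a baseline as a percentage of the baseline score. The sketch below shows only the arithmetic, assuming a metric where higher is better. The function name and the scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, model: float) -> float:
    """Return the percent gain of `model` over `baseline`, relative to `baseline`.

    Assumes a higher-is-better metric; the paper's exact metric is not
    stated in this summary.
    """
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return 100.0 * (model - baseline) / baseline


# Hypothetical scores for illustration only (not results from the paper).
example = relative_improvement(baseline=50.0, model=60.0)
print(example)  # 20.0
```

A relative figure depends on the baseline: the same absolute gain yields a larger percentage when the baseline score is low.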
Abstract
In noisy environments, speech can be hard to understand for humans. Spoken dialog systems can help to enhance the intelligibility of their output, either by modifying the speech synthesis (e.g., imitate Lombard speech) or by optimizing the language generation. We here focus on the second type of approach, by which an intended message is realized with words that are more intelligible in a specific noisy environment. By conducting a speech perception experiment, we created a dataset of 900 paraphrases in babble noise, perceived by native English speakers with normal hearing. We find that careful selection of paraphrases can improve intelligibility by 33% at SNR -5 dB. Our analysis of the data shows that the intelligibility differences between paraphrases are mainly driven by noise-robust acoustic cues. Furthermore, we propose an intelligibility-aware paraphrase ranking model, which…
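The abstract reports results at SNR -5 dB, meaning the noise carries roughly three times the power of the speech. As a minimal sketch of how a stimulus at a target SNR is commonly constructed, the code below scales a noise signal so that the speech-to-noise power ratio matches the requested value and then adds it to the speech. This is not the authors' procedure, which the abstract does not describe. The function names are hypothetical, and a sine wave and Gaussian noise stand in for real speech and babble recordings.

```python
import math
import random


def mean_power(signal):
    """Average power of a signal given as a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so that
    10 * log10(P_speech / P_noise) equals `snr_db`."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("signals must have non-zero power")
    # Gain that brings the noise power to P_speech / 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Stand-in signals (assumptions): a 220 Hz tone for speech, Gaussian noise for babble.
rate = 16000
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(rate)]

mixture = mix_at_snr(speech, noise, snr_db=-5.0)

# Check the achieved SNR by recovering the scaled noise from the mixture.
scaled_noise = [m - s for m, s in zip(mixture, speech)]
achieved_snr_db = 10 * math.log10(mean_power(speech) / mean_power(scaled_noise))
```

With real recordings, the power is often computed over speech-active regions only, so the effective SNR depends on that choice.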
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
