Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?
Anupama Chingacham, Miaoran Zhang, Vera Demberg, Dietrich Klakow

TL;DR
This paper explores whether large language models can generate paraphrases that improve human speech perception in noisy environments, proposing a new prompting method that significantly enhances intelligibility.
Contribution
The paper introduces the novel task of generating acoustically intelligible paraphrases with LLMs and proposes a prompt-and-select approach that decouples textual and non-textual attributes in the generation pipeline.
Findings
Prompt-and-select improves speech perception by 40% in noisy conditions.
LLMs struggle to control acoustic intelligibility with standard prompts.
The method enhances human perception of distorted speech in noise.
Abstract
Large Language Models (LLMs) can generate text by transferring style attributes such as formality, resulting in formal or informal text. However, instructing LLMs to generate text that, when spoken, is more intelligible in an acoustically difficult environment is an under-explored topic. We conduct the first study to evaluate LLMs on a novel task of generating acoustically intelligible paraphrases for better human speech perception in noise. Our experiments in English demonstrated that with standard prompting, LLMs struggle to control the non-textual attribute, i.e., acoustic intelligibility, while efficiently capturing the desired textual attributes like semantic equivalence. To remedy this issue, we propose a simple prompting approach, prompt-and-select, which generates paraphrases by decoupling the desired textual and non-textual attributes in the text generation pipeline. Our approach…
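The decoupling described in the abstract can be pictured as two separate steps: first ask a model for paraphrases (the textual attribute), then pick among them with a separate intelligibility score (the non-textual attribute). The sketch below is only an illustration under assumptions, not the authors' implementation: the function names, the candidate count, the fallback to the original sentence, and the toy generator and scorer are all hypothetical stand-ins for a real LLM call and a real acoustic intelligibility measure.

```python
from typing import Callable, List, Sequence


def prompt_and_select(
    sentence: str,
    generate: Callable[[str, int], Sequence[str]],
    intelligibility: Callable[[str], float],
    num_candidates: int = 5,
) -> str:
    """Return the candidate with the highest intelligibility score.

    `generate` and `intelligibility` are hypothetical callables: the first
    stands in for an LLM paraphrase prompt, the second for any score of how
    well the spoken text is expected to survive noise.
    """
    # Step 1 (textual): collect paraphrase candidates from the generator.
    candidates: List[str] = list(generate(sentence, num_candidates))
    # Assumption: keep the original sentence as a fallback candidate.
    candidates.append(sentence)
    # Step 2 (non-textual): select by the separate intelligibility score.
    return max(candidates, key=intelligibility)


# Toy stand-ins so the sketch runs without an LLM or audio pipeline.
TOY_PARAPHRASES = {
    "could you shut the door": [
        "please close the door",
        "would you mind closing the door",
    ],
}
# Made-up placeholder scores, not measured values.
TOY_SCORES = {
    "could you shut the door": 0.42,
    "please close the door": 0.71,
    "would you mind closing the door": 0.55,
}


def toy_generate(sentence: str, n: int) -> Sequence[str]:
    # Return at most n canned paraphrases for known sentences.
    return TOY_PARAPHRASES.get(sentence, [])[:n]


def toy_intelligibility(text: str) -> float:
    # Look up a placeholder score; unknown text scores 0.0.
    return TOY_SCORES.get(text, 0.0)


best = prompt_and_select(
    "could you shut the door", toy_generate, toy_intelligibility
)
print(best)
```

Keeping the scorer separate from the generator means the prompt only has to ask for a semantically equivalent paraphrase, while the selection step handles the attribute that the abstract says standard prompting struggles to control.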
Taxonomy
Topics: Speech Recognition and Synthesis
