Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It?

Anupama Chingacham; Miaoran Zhang; Vera Demberg; Dietrich Klakow

arXiv:2408.04029 · cs.CL · August 9, 2024

TL;DR

This paper explores whether large language models can generate paraphrases that improve human speech perception in noisy environments, proposing a new prompting method that significantly enhances intelligibility.

Contribution

It introduces a novel task of generating acoustically intelligible paraphrases using LLMs and proposes a prompt-and-select approach to decouple textual and non-textual attributes.

Findings

01. Prompt-and-select improves speech perception by 40% in noisy conditions.

02. LLMs struggle to control acoustic intelligibility with standard prompts.

03. The method enhances human perception of distorted speech in noise.

Abstract

Large Language Models (LLMs) can generate text by transferring style attributes such as formality, resulting in formal or informal text. However, instructing LLMs to generate text that, when spoken, is more intelligible in an acoustically difficult environment is an under-explored topic. We conduct the first study to evaluate LLMs on a novel task of generating acoustically intelligible paraphrases for better human speech perception in noise. Our experiments in English demonstrated that, with standard prompting, LLMs struggle to control the non-textual attribute, i.e., acoustic intelligibility, while efficiently capturing the desired textual attributes like semantic equivalence. To remedy this issue, we propose a simple prompting approach, prompt-and-select, which generates paraphrases by decoupling the desired textual and non-textual attributes in the text generation pipeline. Our approach…
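The decoupling described in the abstract can be illustrated with a minimal sketch: one step produces paraphrase candidates (the textual attribute), and a separate step picks among them using an intelligibility score (the non-textual attribute). This is not the authors' implementation. The names `prompt_and_select`, `toy_generate`, and `toy_score` are hypothetical, the generator is a canned stand-in for an LLM call, the scorer is a placeholder rather than a real acoustic intelligibility metric, and keeping the original sentence in the candidate pool is an assumption of this sketch.

```python
from typing import Callable, Sequence


def prompt_and_select(
    sentence: str,
    generate: Callable[[str, int], Sequence[str]],
    score: Callable[[str], float],
    n_candidates: int = 5,
) -> str:
    """Return the highest-scoring text among the original and its paraphrases."""
    # Step 1 (textual attribute): ask the generator for paraphrase candidates.
    candidates = list(generate(sentence, n_candidates))
    # Assumption of this sketch: the original stays in the pool, so selection
    # never returns something the scorer rates below the input.
    pool = [sentence] + [c for c in candidates if c.strip()]
    # Step 2 (non-textual attribute): select by the intelligibility score.
    # On ties, max() keeps the earliest entry in the pool.
    return max(pool, key=score)


def toy_generate(sentence: str, n: int) -> Sequence[str]:
    """Hypothetical stand-in for an LLM paraphrase call (canned output)."""
    canned = ["please shut the door", "kindly close the entrance"]
    return canned[:n]


def toy_score(text: str) -> float:
    """Placeholder score: shorter average word length scores higher.

    This is NOT a real intelligibility metric; it only makes the sketch runnable.
    """
    words = text.split()
    return -sum(len(w) for w in words) / len(words)


best = prompt_and_select("please close the door", toy_generate, toy_score)
```

In practice the two callables would be swapped for a real paraphrase generator and a real intelligibility estimate; the selection logic stays the same because it depends only on their signatures.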

Code & Models

Repositories

uds-lsv/llm_eval_pi-spin (PyTorch, official)

Videos

Human Speech Perception in Noise: Can Large Language Models Paraphrase to Improve It? (Underline)

Taxonomy

Topics: Speech Recognition and Synthesis