StressTest: Can YOUR Speech LM Handle the Stress?
Iddo Yosha, Gallil Maimon, Yossi Adi

TL;DR
StressTest introduces a benchmark and dataset to evaluate and improve speech language models' ability to understand sentence stress and its impact on meaning.
Contribution
The paper presents StressTest, a new benchmark; Stress-17k, a training dataset; and StresSLM, a fine-tuned model that outperforms existing models on stress-based speech reasoning.
Findings
Existing speech-aware language models (SLMs) perform poorly on stress-based meaning tasks.
The Stress-17k dataset enables training models to recognize stress-induced meaning changes.
StresSLM outperforms other models on stress reasoning and detection.
Abstract
Sentence stress refers to emphasis on words within a spoken utterance to highlight or contrast an idea. It is often used to imply an underlying intention not explicitly stated. Recent speech-aware language models (SLMs) have enabled direct audio processing, allowing models to access the full richness of speech to perform audio reasoning tasks such as spoken question answering. Despite the crucial role of sentence stress in shaping meaning and intent, it remains largely overlooked in evaluation and development of SLMs. We address this gap by introducing StressTest, a benchmark designed to evaluate models' ability to distinguish between meanings of speech based on the stress pattern. We evaluate leading SLMs, and find that despite their overall capabilities, they perform poorly on such tasks. Hence, we propose a novel data generation pipeline, and create Stress-17k, a training set that…
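The core idea behind the benchmark is that the same words can carry different meanings depending on which word is stressed. The following is a minimal, hypothetical sketch of how such an item and its accuracy score might be represented. It is not the StressTest data format. The real benchmark works on audio, and here a `stressed_word` text field stands in for the acoustic emphasis. The names `StressItem` and `accuracy`, the field layout, and the example sentence are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StressItem:
    """One hypothetical benchmark item: a fixed transcript spoken with a
    particular word stressed, plus candidate interpretations."""
    transcript: str
    stressed_word: str          # text stand-in for emphasis in the audio (assumption)
    options: tuple[str, ...]
    answer: int                 # index of the interpretation implied by the stress


def accuracy(items: list[StressItem], predictions: list[int]) -> float:
    """Fraction of items where the predicted option index matches the answer."""
    if len(items) != len(predictions):
        raise ValueError("need exactly one prediction per item")
    if not items:
        return 0.0
    correct = sum(item.answer == pred for item, pred in zip(items, predictions))
    return correct / len(items)


# Same words, different stress, different implied meaning (illustrative only).
options = (
    "Someone else, not me, took the money.",
    "I did something else with the money, but I did not take it.",
)
items = [
    StressItem("I didn't take the money", "I", options, answer=0),
    StressItem("I didn't take the money", "take", options, answer=1),
]

# A text-only model sees identical input for both items, so it must give the
# same answer to both and can get at most one of them right.
text_only_preds = [0, 0]
print(accuracy(items, text_only_preds))  # 0.5
```

Pairing items that share a transcript but differ only in stress makes the point directly: a system that ignores prosody is capped at chance on each pair, so scores above that require actually using the stress pattern.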
