Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models

Ke-Han Lu; Chun-Yi Kuan; Hung-yi Lee

arXiv:2505.19037·eess.AS·May 27, 2025

TL;DR

This paper introduces Speech-IFEval, a benchmark that evaluates instruction-following and measures catastrophic forgetting in speech-aware language models (SLMs). It shows that current SLMs struggle with basic instructions and are sensitive to prompt variations.

Contribution

The paper presents Speech-IFEval, the first dedicated benchmark for assessing instruction-following and catastrophic forgetting in speech-aware language models.

Findings

1. Most SLMs struggle with even basic instructions and perform far worse than text-based LLMs.

2. SLMs are highly sensitive to prompt variations, often yielding inconsistent and unreliable outputs.

3. Existing benchmarks conflate speech perception with instruction-following, so they do not adequately capture model capabilities.

Abstract

We introduce Speech-IFEval, an evaluation framework designed to assess instruction-following capabilities and quantify catastrophic forgetting in speech-aware language models (SLMs). Recent SLMs integrate speech perception with large language models (LLMs), often degrading textual capabilities due to speech-centric training. Existing benchmarks conflate speech perception with instruction-following, hindering evaluation of these distinct skills. To address this gap, we provide a benchmark for diagnosing the instruction-following abilities of SLMs. Our findings show that most SLMs struggle with even basic instructions, performing far worse than text-based LLMs. Additionally, these models are highly sensitive to prompt variations, often yielding inconsistent and unreliable outputs. We highlight core challenges and provide insights to guide future research, emphasizing the need for…
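As a rough illustration of the idea in the abstract, the sketch below scores rule-checkable instructions and compares an SLM against a text-only reference LLM. Everything in it is an assumption made for illustration: the constraint checks, the function names, the sample outputs, and the relative-drop formula are hypothetical and are not taken from the paper or its repository.

```python
import json


def is_valid_json(response: str) -> bool:
    """Hypothetical constraint check: the whole response must parse as JSON."""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False


def max_words(limit: int):
    """Hypothetical constraint check: the response has at most `limit` words."""
    def check(response: str) -> bool:
        return len(response.split()) <= limit
    return check


def following_rate(responses, checks) -> float:
    """Fraction of responses that satisfy their paired constraint check."""
    passed = sum(1 for r, c in zip(responses, checks) if c(r))
    return passed / len(responses)


def relative_drop(slm_rate: float, reference_rate: float) -> float:
    """Assumed forgetting measure: relative change versus the text-only reference."""
    return (slm_rate - reference_rate) / reference_rate


# One constraint per prompt (illustrative only).
checks = [is_valid_json, max_words(5), is_valid_json]

# Made-up outputs for the same three prompts.
reference_outputs = ['{"answer": "cat"}', "A dog is barking.", '{"label": 1}']
slm_outputs = [
    "The answer is cat.",
    "A dog is barking loudly in the busy street.",
    '{"label": 1}',
]

reference_rate = following_rate(reference_outputs, checks)
slm_rate = following_rate(slm_outputs, checks)
print(reference_rate, slm_rate, relative_drop(slm_rate, reference_rate))
```

A negative relative drop here would indicate that the speech-aware model follows fewer instructions than its text-only reference. Scoring the format constraint separately from the content of the answer is one way to keep instruction-following apart from speech perception.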

Code & Models

Repositories

kehanlu/speech-ifeval (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling