SALMon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon, Amit Roth, Yossi Adi

TL;DR
SALMon is a comprehensive evaluation suite for speech language models that assesses their ability to recognize and differentiate acoustic features like noise, emotion, and speaker identity, addressing a key gap in current benchmarks.
Contribution
We introduce SALMon, a novel, fast, and comprehensive benchmark suite for evaluating speech models on diverse acoustic aspects beyond spoken content.
Findings
Different models show varying strengths in acoustic feature recognition.
The benchmark reveals specific weaknesses in emotion and noise robustness.
The evaluation results can guide future improvements in speech language model development.
Abstract
Speech language models have recently demonstrated great potential as universal speech processing systems. Such models can capture the rich acoustic information present in audio signals beyond spoken content, such as emotion and background noise. Despite this, benchmarks that evaluate awareness of a wide range of acoustic aspects are lacking. To help bridge this gap, we introduce SALMon, a novel evaluation suite encompassing background noise, emotion, speaker identity, and room impulse response. The proposed benchmarks evaluate both the consistency of the inspected element and how well it matches the spoken text. We follow a modelling-based approach, measuring whether a model assigns higher scores to correct samples than to incorrect ones. This approach makes the benchmark fast to compute, even for large models. We evaluated several speech language models on SALMon,…
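The modelling-based scoring described in the abstract can be illustrated with a minimal Python sketch. It assumes each benchmark item is a pair of one correct and one incorrect sample, and that the model exposes a scalar score per sample, such as its log-likelihood of the sample. The function name `pairwise_accuracy` and the toy scores are hypothetical, not SALMon's actual code or data.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical: a scoring function maps a sample to a scalar,
# e.g. a speech language model's total log-likelihood of that sample.
ScoreFn = Callable[[object], float]


def pairwise_accuracy(pairs: Sequence[Tuple[object, object]], score: ScoreFn) -> float:
    """Fraction of (correct, incorrect) pairs where the correct sample scores higher.

    Ties count as failures, so a model that cannot tell the two samples
    apart gets no credit for that pair.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for pos, neg in pairs if score(pos) > score(neg))
    return wins / len(pairs)


# Toy, made-up log-likelihoods standing in for model outputs.
toy_loglik = {"a_pos": -120.5, "a_neg": -131.2, "b_pos": -98.0, "b_neg": -95.4}
pairs = [("a_pos", "a_neg"), ("b_pos", "b_neg")]
print(pairwise_accuracy(pairs, toy_loglik.__getitem__))  # 0.5
```

Because each item needs only one score per sample, evaluation in this style requires forward passes rather than generation or training an extra classifier, which is consistent with the abstract's claim that the benchmark is fast to compute. Under this metric, a model that scores samples at random would be expected to land near 0.5.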
Taxonomy
Topics: Speech Recognition and Synthesis
