TL;DR
KoALa-Bench is a benchmark for evaluating the Korean speech understanding and speech faithfulness of large audio language models (LALMs), addressing the scarcity of non-English benchmarks.
Contribution
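The paper introduces KoALa-Bench, a new six-task benchmark for evaluating LALMs on Korean speech understanding and speech faithfulness. Four tasks cover fundamental speech understanding, two cover speech faithfulness, and the benchmark incorporates Korea-specific content.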
Findings
Experiments are conducted on six models.
The benchmark, code, and leaderboard are publicly available.
The benchmark includes Korean cultural and academic content.
Abstract
Recent advances in large audio language models (LALMs) have enabled multilingual speech understanding. However, benchmarks for evaluating LALMs remain scarce for non-English languages, with Korean being one such underexplored case. In this paper, we introduce KoALa-Bench, a comprehensive benchmark for evaluating Korean speech understanding and speech faithfulness of LALMs. In particular, KoALa-Bench comprises six tasks. Four tasks evaluate fundamental speech understanding capabilities, including automatic speech recognition, speech translation, speech question answering, and speech instruction following, while the remaining two tasks evaluate speech faithfulness, motivated by our observation that several LALMs often fail to fully leverage the speech modality. Furthermore, to reflect Korea-specific knowledge, our benchmark incorporates listening questions from the Korean college…
