Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

Haibin Wu; Xuanjun Chen; Yi-Cheng Lin; Kaiwei Chang; Jiawei Du; Ke-Han Lu; Alexander H. Liu; Ho-Lam Chung; Yuan-Kuei Wu; Dongchao Yang; Songxiang Liu; Yi-Chiao Wu; Xu Tan; James Glass; Shinji Watanabe; Hung-yi Lee

arXiv:2409.14085 · eess.AS · September 24, 2024

TL;DR

This paper introduces Codec-SUPERB, a benchmark for fairly comparing neural audio codec models, aiming to promote progress by standardizing evaluation conditions and datasets.

Contribution

It presents a new lightweight benchmark challenge with standardized rules, datasets, and evaluation metrics for neural audio codecs.

Findings

1. Five participant systems were evaluated.
2. The benchmark enables a fair comparison of codec models.
3. The results highlight the current strengths and limitations of existing codec models.

Abstract

Neural audio codec models are becoming increasingly important because they serve as tokenizers for audio, enabling efficient transmission and facilitating speech language modeling. An ideal neural audio codec should preserve content, paralinguistic information, speaker characteristics, and general audio information, even at low bitrates. Many advanced neural codec models have been proposed recently; however, they are often tested under differing experimental conditions. To address this, we introduce the Codec-SUPERB challenge at SLT 2024, designed to enable fair and lightweight comparisons among existing codec models and to inspire advances in the field. The challenge brings together representative speech applications and objective metrics, and uses carefully selected license-free datasets, subsampled into small sets to reduce the computational cost of evaluation. This paper presents the challenge's rules,…


Code & Models

Repositories

ga642381/speech-trident (Official)


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques