SUPERB: Speech processing Universal PERformance Benchmark
Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe

TL;DR
SUPERB introduces a comprehensive benchmark and leaderboard for evaluating speech processing models across multiple tasks, emphasizing the importance of SSL representations and minimal task-specific modifications.
Contribution
It provides the first unified benchmark for speech processing, enabling systematic evaluation of shared SSL models across diverse tasks with minimal architecture changes.
Findings
SSL representations are highly generalizable across tasks
The proposed framework achieves competitive results with lightweight heads
SUPERB facilitates research in speech representation learning
Abstract
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL due to its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight…
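The idea of reusing one shared representation and training only a small task-specific head can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "upstream" here is a hypothetical stand-in (a fixed random projection rather than a real pretrained SSL model), the toy data is synthetic, and the choices of mean pooling, a linear softmax head, plain SGD, and keeping the upstream frozen are assumptions made for the example.

```python
import math
import random


def make_frozen_upstream(in_dim, rep_dim, seed=0):
    """Hypothetical stand-in for a pretrained SSL model: a fixed random projection."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(rep_dim)]

    def extract(frames):
        # Map each input frame to a representation vector; weights are never updated.
        return [[math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]
                for frame in frames]

    return weights, extract


def mean_pool(reps):
    """Average frame-level representations into one utterance-level vector."""
    n = len(reps)
    return [sum(col) / n for col in zip(*reps)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearHead:
    """Lightweight task-specific head: the only trainable parameters in this sketch."""

    def __init__(self, rep_dim, num_classes):
        self.W = [[0.0] * rep_dim for _ in range(num_classes)]
        self.b = [0.0] * num_classes

    def forward(self, pooled):
        return softmax([sum(w * x for w, x in zip(row, pooled)) + b
                        for row, b in zip(self.W, self.b)])

    def sgd_step(self, pooled, label, lr):
        # One SGD step on the cross-entropy loss; returns the loss before the update.
        probs = self.forward(pooled)
        for c in range(len(self.W)):
            grad = probs[c] - (1.0 if c == label else 0.0)
            for j, x in enumerate(pooled):
                self.W[c][j] -= lr * grad * x
            self.b[c] -= lr * grad
        return -math.log(probs[label])


def make_toy_utterance(label, rng, num_frames=10, in_dim=4):
    """Synthetic 'utterance': frames drawn around +1 (class 0) or -1 (class 1)."""
    mean = 1.0 if label == 0 else -1.0
    return [[rng.gauss(mean, 0.3) for _ in range(in_dim)] for _ in range(num_frames)]


def run_demo(epochs=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    weights, extract = make_frozen_upstream(in_dim=4, rep_dim=8, seed=seed)
    before = [row[:] for row in weights]  # snapshot to confirm the upstream stays frozen
    data = [(make_toy_utterance(i % 2, rng), i % 2) for i in range(40)]
    # Because the upstream is frozen, its features can be computed once and reused.
    feats = [(mean_pool(extract(frames)), label) for frames, label in data]
    head = LinearHead(rep_dim=8, num_classes=2)
    losses = []
    for _ in range(epochs):
        total = sum(head.sgd_step(pooled, label, lr) for pooled, label in feats)
        losses.append(total / len(feats))
    return losses, before, weights
```

Calling `run_demo()` returns the per-epoch average loss together with the upstream weights before and after training. In this sketch only `LinearHead` is updated, so a new task would mean swapping the head while the extractor and its cached features stay the same.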
Code & Models
- 🤗 superb/wav2vec2-base-superb-sid (model): 733 downloads, 23 likes
- 🤗 anton-l/wav2vec2-base-superb-sv (model): 533 downloads, 3 likes
- 🤗 mishig/test_regex_searchreplace (model)
- 🤗 superb/hubert-base-superb-er (model): 4.1k downloads, 22 likes
- 🤗 superb/hubert-base-superb-ic (model): 270 downloads
- 🤗 superb/hubert-base-superb-ks (model): 568 downloads, 8 likes
- 🤗 superb/hubert-base-superb-sid (model): 353 downloads, 1 like
- 🤗 superb/hubert-large-superb-er (model): 21k downloads, 25 likes
- 🤗 superb/hubert-large-superb-ic (model): 3 downloads
- 🤗 superb/hubert-large-superb-ks (model): 18 downloads
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
