SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang; Po-Han Chi; Yung-Sung Chuang; Cheng-I Jeff Lai; Kushal Lakhotia; Yist Y. Lin; Andy T. Liu; Jiatong Shi; Xuankai Chang; Guan-Ting Lin; Tzu-Hsien Huang; Wei-Cheng Tseng; Ko-tik Lee; Da-Rong Liu; Zili Huang; Shuyan Dong; Shang-Wen Li; Shinji Watanabe; Abdelrahman Mohamed; Hung-yi Lee

arXiv:2105.01051 · cs.CL · October 19, 2021 · 51 cites

TL;DR

SUPERB introduces a comprehensive benchmark and leaderboard for evaluating speech processing models across multiple tasks, emphasizing the importance of SSL representations and minimal task-specific modifications.

Contribution

It provides the first unified benchmark for speech processing, enabling systematic evaluation of shared SSL models across diverse tasks with minimal architecture changes.

Findings

01. SSL representations are highly generalizable across tasks
02. The proposed framework achieves competitive results with lightweight heads
03. SUPERB facilitates research in speech representation learning

Abstract

Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL due to its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight…
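The abstract is cut off above, so the following is only a rough sketch of the general idea it describes: a shared model is reused as-is to extract representations, and only a small task-specific head is trained on labeled data. Everything concrete here is an assumption made for illustration and is not taken from the paper: the "upstream" is a fixed random projection standing in for a pretrained SSL model, the head is a logistic-regression layer on mean-pooled features, and the data is a synthetic two-class toy set.

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_DIM, FEAT_DIM = 20, 16

# Hypothetical stand-in for a pretrained SSL upstream (assumption): a fixed
# random projection followed by tanh. Its weights are never updated below.
W_upstream = rng.normal(size=(FRAME_DIM, FEAT_DIM))


def upstream(frames):
    """Map (num_frames, FRAME_DIM) frames to (num_frames, FEAT_DIM) features."""
    return np.tanh(frames @ W_upstream)


def pooled_features(frames):
    """Mean-pool frame-level features into one utterance-level vector."""
    return upstream(frames).mean(axis=0)


def make_utterance(label):
    """Toy 'utterance' (assumption): noisy frames whose mean depends on the label."""
    num_frames = int(rng.integers(30, 60))
    shift = 1.0 if label == 1 else -1.0
    return rng.normal(loc=shift, scale=1.0, size=(num_frames, FRAME_DIM))


def make_dataset(num_utterances):
    """Extract frozen-upstream features for a batch of toy utterances."""
    labels = rng.integers(0, 2, size=num_utterances)
    feats = np.stack([pooled_features(make_utterance(y)) for y in labels])
    return feats, labels


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X_train, y_train = make_dataset(200)
X_test, y_test = make_dataset(100)

# Lightweight head: a single logistic-regression layer, the only trained part.
w = np.zeros(FEAT_DIM)
b = 0.0
learning_rate = 1.0
W_before = W_upstream.copy()  # kept only to check the upstream stays frozen

for _ in range(300):
    probs = sigmoid(X_train @ w + b)
    error = probs - y_train  # gradient of the cross-entropy loss w.r.t. logits
    w -= learning_rate * (X_train.T @ error) / len(y_train)
    b -= learning_rate * error.mean()

test_predictions = (sigmoid(X_test @ w + b) > 0.5).astype(int)
test_accuracy = float((test_predictions == y_test).mean())
upstream_unchanged = bool(np.array_equal(W_upstream, W_before))

print(f"held-out accuracy: {test_accuracy:.2f}")
print(f"upstream weights unchanged: {upstream_unchanged}")
```

The head here has only 17 trainable parameters (16 weights and a bias), while the upstream is shared and untouched. Under the same pattern, a different task would reuse `upstream` and swap in a different small head.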

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques