SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning

Tzu-hsun Feng; Annie Dong; Ching-Feng Yeh; Shu-wen Yang; Tzu-Quan Lin; Jiatong Shi; Kai-Wei Chang; Zili Huang; Haibin Wu; Xuankai Chang; Shinji Watanabe; Abdelrahman Mohamed; Shang-Wen Li; Hung-yi Lee

arXiv:2210.08634·cs.CL·November 1, 2022

Open Access

TL;DR

The paper introduces the SUPERB challenge at SLT 2022, focusing on evaluating the generalization, efficiency, and performance of self-supervised speech representations across diverse tasks, with results from 14 models.

Contribution

It establishes a comprehensive benchmark and metrics for assessing SSL speech models' performance, generalization, and computational efficiency, encouraging practical SSL designs.

Findings

01. 14 models evaluated with diverse performance results

02. Insights into the trade-offs between efficiency and accuracy

03. Future directions for SSL research identified

Abstract

We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representations for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the computation requirements of self-supervised learning (SSL) representations and to evaluate their generalizability and performance across the diverse SUPERB tasks. The SUPERB benchmark provides comprehensive coverage of popular speech processing tasks, from speech and speaker recognition to audio generation and semantic understanding. As SSL has gained interest in the speech community and shown promising outcomes, we envision the challenge raising the impact of SSL techniques by motivating more practical designs that look beyond task performance. We summarize the results of 14 submitted models in this paper. We also discuss the main…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques