SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning
Tzu-hsun Feng, Annie Dong, Ching-Feng Yeh, Shu-wen Yang, Tzu-Quan Lin, Jiatong Shi, Kai-Wei Chang, Zili Huang, Haibin Wu, Xuankai Chang, Shinji Watanabe, Abdelrahman Mohamed, Shang-Wen Li, Hung-yi Lee

TL;DR
The paper introduces the SUPERB challenge at SLT 2022, focusing on evaluating the generalization, efficiency, and performance of self-supervised speech representations across diverse tasks, with results from 14 models.
Contribution
It establishes a comprehensive benchmark and metrics for assessing SSL speech models' performance, generalization, and computational efficiency, encouraging practical SSL designs.
Findings
14 models evaluated with diverse performance results
Insights into the trade-offs between efficiency and accuracy
Future directions for SSL research identified
Abstract
We present the SUPERB challenge at SLT 2022, which aims at learning self-supervised speech representation for better performance, generalization, and efficiency. The challenge builds upon the SUPERB benchmark and implements metrics to measure the computation requirements of self-supervised learning (SSL) representation and to evaluate its generalizability and performance across the diverse SUPERB tasks. The SUPERB benchmark provides comprehensive coverage of popular speech processing tasks, from speech and speaker recognition to audio generation and semantic understanding. As SSL has gained interest in the speech community and shown promising outcomes, we envision the challenge to uplevel the impact of SSL techniques by motivating more practical designs of techniques beyond task performance. We summarize the results of 14 submitted models in this paper. We also discuss the main…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
