TS-SUPERB: A Target Speech Processing Benchmark for Speech Self-Supervised Learning Models

Junyi Peng; Takanori Ashihara; Marc Delcroix; Tsubasa Ochiai; Oldrich Plchot; Shoko Araki; Jan Černocký

arXiv:2505.06660 · cs.CL · May 13, 2025

TL;DR

TS-SUPERB introduces a comprehensive benchmark for evaluating speech self-supervised learning models in challenging multi-talker, noisy environments, emphasizing target speaker identification and extraction tasks.

Contribution

The paper presents TS-SUPERB, a new benchmark for target speaker tasks in multi-talker scenarios, and explores joint optimization of SSL models for improved performance.

Findings

01 SSL models perform variably across target speaker tasks

02 Joint optimization enhances target speaker processing effectiveness

03 Benchmark reveals limitations of single-task evaluations

Abstract

Self-supervised learning (SSL) models have significantly advanced speech processing tasks, and several benchmarks have been proposed to validate their effectiveness. However, previous benchmarks have primarily focused on single-speaker scenarios, with less exploration of target-speaker tasks in noisy, multi-talker conditions -- a more challenging yet practical case. In this paper, we introduce the Target-Speaker Speech Processing Universal Performance Benchmark (TS-SUPERB), which includes four widely recognized target-speaker processing tasks that require identifying the target speaker and extracting information from the speech mixture. In our benchmark, the speaker embedding extracted from enrollment speech is used as a clue to condition downstream models. The benchmark result reveals the importance of evaluating SSL models in target speaker scenarios, demonstrating that performance…
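To make the conditioning idea in the abstract concrete, here is a minimal sketch of how a speaker embedding derived from enrollment speech could be used as a clue for a downstream model. It is an illustration under stated assumptions, not the benchmark's implementation: the function names `enrollment_embedding` and `condition_on_speaker` are hypothetical, mean pooling and element-wise multiplication are assumed choices, and random arrays stand in for SSL features.

```python
import numpy as np


def enrollment_embedding(enroll_feats: np.ndarray) -> np.ndarray:
    """Average frame-level enrollment features (T_e, D) into one vector (D,).

    Mean pooling is an assumption made for this sketch.
    """
    return enroll_feats.mean(axis=0)


def condition_on_speaker(mix_feats: np.ndarray, spk_emb: np.ndarray) -> np.ndarray:
    """Scale every mixture frame (T, D) element-wise by the speaker vector (D,).

    Multiplicative conditioning is an assumed choice; other schemes
    (for example concatenation or addition) would fit the same description.
    """
    return mix_feats * spk_emb[np.newaxis, :]


# Random arrays stand in for features from an SSL model (hypothetical sizes).
rng = np.random.default_rng(0)
enroll_feats = rng.standard_normal((50, 8))   # enrollment utterance: 50 frames
mix_feats = rng.standard_normal((120, 8))     # speech mixture: 120 frames

spk_emb = enrollment_embedding(enroll_feats)
conditioned = condition_on_speaker(mix_feats, spk_emb)
print(conditioned.shape)  # (120, 8)
```

The conditioned features keep the time resolution of the mixture, so a task-specific head could consume them frame by frame while the speaker vector indicates which talker to attend to.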

Code & Models

Repositories

BUTSpeechFIT/TS_SUPERB · PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems

Methods: Spatio-temporal stability analysis