Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads

Salah Zaiem; Youcef Kemiche; Titouan Parcollet; Slim Essid; Mirco Ravanelli

arXiv:2308.14456 · eess.AS · February 22, 2024

TL;DR

This paper investigates how the choice of probing head, including larger-capacity heads, affects the benchmarking of speech self-supervised representations, and shows that changing the head significantly alters model performance rankings.

Contribution

The paper demonstrates that the probing head architecture used in speech SSL benchmarking significantly influences performance and model rankings, which challenges the common practice of evaluating every model with a single downstream architecture.

Findings

01

Larger probing heads cause fluctuations in model performance rankings.

02

Using bigger heads impacts inference costs and generalization.

03

Larger heads enable better multi-level feature exploitation.
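
As a rough illustration of what "multi-level feature exploitation" can mean, the sketch below mixes the hidden states of several frozen SSL layers with learned softmax weights before they reach the probing head. This is a minimal sketch under stated assumptions: the softmax-weighted sum is a common benchmarking convention and is assumed here, not taken from this summary, and the function names `softmax` and `weighted_layer_sum` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_features, layer_logits):
    """Mix per-layer features into one sequence of frame vectors.

    layer_features: list of L layers; each layer is a list of T frames;
                    each frame is a list of D floats (frozen SSL outputs).
    layer_logits:   list of L trainable scalars (assumed, one per layer).
    Returns a list of T frames, each a list of D floats.
    """
    weights = softmax(layer_logits)
    num_frames = len(layer_features[0])
    dim = len(layer_features[0][0])
    mixed = [[0.0] * dim for _ in range(num_frames)]
    for w, layer in zip(weights, layer_features):
        for t in range(num_frames):
            for d in range(dim):
                mixed[t][d] += w * layer[t][d]
    return mixed
```

With equal logits every layer contributes equally; during probing, only the logits and the head would be trained while the SSL encoder stays frozen.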

Abstract

Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we found that altering the downstream architecture structure leads to significant fluctuations in the performance ranking of the evaluated models. Against common practices in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing…
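
To give a sense of what "larger-capacity" means for a head that maps frozen representations to task labels, the sketch below counts trainable parameters for a linear probe and for a multi-layer perceptron head. It is an illustrative calculation only: the helper names are hypothetical, fully connected layers with biases are assumed, and the head architectures actually compared in the paper are not specified in this summary.

```python
def linear_head_params(in_dim, num_classes):
    """Trainable parameters of a single linear layer with bias."""
    return in_dim * num_classes + num_classes


def mlp_head_params(in_dim, hidden_dims, num_classes):
    """Trainable parameters of an MLP head with biases.

    hidden_dims: list of hidden layer widths (assumed, for illustration).
    """
    total = 0
    prev = in_dim
    for h in hidden_dims:
        total += prev * h + h  # weights + biases of one hidden layer
        prev = h
    total += prev * num_classes + num_classes  # output layer
    return total


if __name__ == "__main__":
    # Hypothetical sizes: 768-dim features, 10 classes, two 1024-unit layers.
    print(linear_head_params(768, 10))
    print(mlp_head_params(768, [1024, 1024], 10))
```

Because the SSL encoder is frozen, these head parameters are the only ones trained per task, so head size directly drives the extra training and inference cost that a benchmark adds on top of the encoder.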
