CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing

Yen-Ju Lu; Jing Liu; Thomas Thebaud; Laureano Moro-Velazquez; Ariya Rastrow; Najim Dehak; Jesus Villalba

arXiv:2412.04425·eess.AS·December 6, 2024

TL;DR

CA-SSLR introduces a condition-aware self-supervised speech representation that dynamically incorporates language and speaker context, enhancing generalization and performance across diverse speech tasks with minimal tuning.

Contribution

The paper presents a condition-aware SSL model that integrates language and speaker embeddings starting from the earlier layers, reducing reliance on input audio features and improving adaptability to unseen tasks.

Findings

01. 10% reduction in language identification errors

02. 37% improvement in speech recognition character error rate (CER)

03. 27% decrease in speaker verification equal error rate (EER)

Abstract

We introduce Condition-Aware Self-Supervised Learning Representation (CA-SSLR), a generalist conditioning model broadly applicable to various speech-processing tasks. Compared to standard fine-tuning methods that optimize for downstream models, CA-SSLR integrates language and speaker embeddings from earlier layers, making the SSL model aware of the current language and speaker context. This approach reduces the reliance on input audio features while preserving the integrity of the base SSLR. CA-SSLR improves the model's capabilities and demonstrates its generality on unseen tasks with minimal task-specific tuning. Our method employs linear modulation to dynamically adjust internal representations, enabling fine-grained adaptability without significantly altering the original model behavior. Experiments show that CA-SSLR reduces the number of trainable parameters, mitigates overfitting,…
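The "linear modulation" mentioned in the abstract can be illustrated with a feature-wise scale and shift driven by a condition embedding. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: it assumes the modulation takes the form h' = gamma(c) * h + beta(c), where gamma and beta are linear projections of a language or speaker embedding c. The function names, shapes, and the identity initialization are hypothetical choices made for this example.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """Compute weight @ x + bias for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def modulate(hidden: Sequence[Sequence[float]], cond: Sequence[float],
             w_gamma: Matrix, b_gamma: Vector,
             w_beta: Matrix, b_beta: Vector) -> List[Vector]:
    """Apply a per-channel scale and shift to every frame of `hidden`.

    Assumed form (not taken from the paper): h' = gamma * h + beta,
    with gamma and beta predicted from the condition embedding `cond`.
    """
    gamma = linear(w_gamma, b_gamma, cond)  # per-channel scale
    beta = linear(w_beta, b_beta, cond)     # per-channel shift
    return [[g * h + b for g, h, b in zip(gamma, frame, beta)]
            for frame in hidden]


# Two frames of a 2-channel hidden representation, and a 3-dim condition
# embedding (e.g. a language or speaker embedding); all values are made up.
hidden = [[1.0, 2.0], [3.0, 4.0]]
cond = [0.5, -1.0, 2.0]

# Hypothetical identity initialization: zero weights, gamma bias 1, beta bias 0.
# The modulated output then equals the input, leaving the base model untouched.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
identity_out = modulate(hidden, cond, zeros, [1.0, 1.0], zeros, [0.0, 0.0])

# Non-trivial projections: the condition now rescales and shifts each channel.
w_gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w_beta = [[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
conditioned_out = modulate(hidden, cond, w_gamma, [1.0, 1.0], w_beta, [0.0, 0.0])
```

Starting from an identity mapping is one plausible way to adjust internal representations without significantly altering the original model behavior, since the conditioned network initially reproduces the unconditioned one.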

Videos

CA-SSLR: Condition-Aware Self-Supervised Learning Representation for Generalized Speech Processing · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Attentive Walk-Aggregating Graph Neural Network · Balanced Selection