TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor, Subhashini Venugopalan

TL;DR
TRILLsson introduces a set of small, efficient, and high-performing paralinguistic speech models distilled from larger models, enabling deployment on resource-constrained devices while maintaining competitive accuracy.
Contribution
The paper publicly releases a collection of paralinguistic speech models, distilled on public data only, that are far smaller than the original model yet nearly as accurate.
Findings
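The largest distilled model is less than 15% the size of the original (314MB vs 2.2GB) and achieves over 96% of its accuracy on 6 of 7 tasks, while being trained on 6.5% of the data.
The smallest model is 1% the size (22MB) and achieves over 90% of the original's accuracy on 6 of 7 tasks.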
Models outperform open-source Wav2Vec 2.0 on 6 of 7 tasks.
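The size ratios can be checked from the figures quoted in the abstract (314MB and 22MB against a 2.2GB original). The snippet below is a small arithmetic sketch only; it assumes decimal units (1GB = 1000MB), and the helper name `size_ratio` is made up for illustration.

```python
def size_ratio(student_mb: float, teacher_mb: float) -> float:
    """Return the student's size as a fraction of the teacher's size."""
    return student_mb / teacher_mb

# Assumption: 1GB = 1000MB, so the 2.2GB original is about 2200MB.
TEACHER_MB = 2.2 * 1000

largest = size_ratio(314, TEACHER_MB)   # largest distilled model
smallest = size_ratio(22, TEACHER_MB)   # smallest distilled model

print(f"largest:  {largest:.1%} of the original")
print(f"smallest: {smallest:.1%} of the original")
```

Under that assumption the largest model comes out at roughly 14% of the original and the smallest at 1%, consistent with the "less than 15%" and "1%" figures.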
Abstract
Recent advances in self-supervision have dramatically improved the quality of speech representations. However, deployment of state-of-the-art embedding models on devices has been restricted due to their limited public availability and large resource footprint. Our work addresses these issues by publicly releasing a collection of paralinguistic speech models that are small and near state-of-the-art performance. Our approach is based on knowledge distillation, and our models are distilled on public data only. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 15% the size of the original model (314MB vs 2.2GB), achieves over 96% the accuracy on 6 of 7 tasks, and is trained on 6.5% the data. The smallest model is 1% in size (22MB) and achieves over 90% the accuracy on 6 of 7 tasks. Our…
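To make the idea of distillation concrete, here is a toy sketch in which a small student is trained to reproduce a frozen teacher's embeddings on unlabeled inputs. It is not the paper's training setup: the abstract does not specify the loss or architectures, so the mean-squared-error objective, the random "teacher", the linear student, and all names and sizes below are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 256 unlabeled feature vectors and a frozen teacher.
n, d_in, d_emb = 256, 32, 8
x = rng.normal(size=(n, d_in))
teacher_w = rng.normal(size=(d_in, d_emb))
teacher_emb = np.tanh(x @ teacher_w)  # teacher outputs, used as fixed targets

# Student: a single linear layer, much simpler than the teacher.
student_w = np.zeros((d_in, d_emb))

def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error over all embedding entries."""
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(x @ student_w, teacher_emb)

# Plain gradient descent on the embedding-matching loss (assumed MSE).
lr = 1.0
for _ in range(200):
    pred = x @ student_w
    grad = 2.0 * x.T @ (pred - teacher_emb) / pred.size
    student_w -= lr * grad

final_loss = mse(x @ student_w, teacher_emb)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

No labels appear anywhere in the loop: the student only needs inputs and the teacher's outputs, which is what makes distillation on public, unlabeled audio possible in principle.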
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and Dialogue Systems
