S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models

Mohammed Ali El Adlouni; Aurian Quelennec; Pierre Chouteau; Geoffroy Peeters; Slim Essid

arXiv:2604.24933·cs.AI·April 29, 2026

TL;DR

S-SONDO is a self-supervised knowledge distillation framework that compresses large audio foundation models into smaller ones using only the teacher's output embeddings, so it also applies to teachers that expose embeddings but no class logits.

Contribution

The paper presents S-SONDO as the first embedding-only distillation method for general audio models. Because it needs neither logits nor intermediate-layer features, it can compress a teacher regardless of the teacher's architecture.
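To make the idea of embedding-only distillation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the cosine-distance loss, the linear projection from student to teacher dimension, and all names and sizes (`project`, `embedding_distillation_loss`, `D_STUDENT`, `D_TEACHER`) are assumptions chosen for illustration. The only point it shows is that the training signal can be computed from the teacher's output embeddings alone, with no logits or layer-level features involved.

```python
import math
import random


def project(vec, weight):
    """Apply a linear map; weight has shape (d_teacher, d_student)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm_a * norm_b, eps)


def embedding_distillation_loss(student_batch, teacher_batch, weight):
    """Mean cosine distance between projected student and teacher embeddings.

    Assumed loss for illustration; the paper compares loss functions,
    and this sketch does not claim to reproduce its choice.
    """
    distances = [
        cosine_distance(project(s, weight), t)
        for s, t in zip(student_batch, teacher_batch)
    ]
    return sum(distances) / len(distances)


# Hypothetical sizes and random stand-ins for real model outputs.
random.seed(0)
D_STUDENT, D_TEACHER, BATCH = 4, 6, 3
teacher_batch = [[random.gauss(0, 1) for _ in range(D_TEACHER)] for _ in range(BATCH)]
student_batch = [[random.gauss(0, 1) for _ in range(D_STUDENT)] for _ in range(BATCH)]
weight = [[random.gauss(0, 1) for _ in range(D_STUDENT)] for _ in range(D_TEACHER)]

loss = embedding_distillation_loss(student_batch, teacher_batch, weight)
print(f"distillation loss: {loss:.4f}")
```

In a real setup the student and the projection would be trained by gradient descent on this loss while the teacher stays frozen; the projection is only needed when the two embedding sizes differ.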

Findings

01

Distilled student models are up to 61 times smaller than the teacher while retaining 96% of its performance.

02

S-SONDO is architecture-agnostic and applicable to embedding-based teachers.

03

The paper provides practical insights on the choice of loss function and data sampling strategy.
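The arithmetic behind the first finding can be sketched as follows. The compression factor of 61 and the 96% retention come from the findings above; the teacher parameter count and the benchmark scores are hypothetical placeholders, not figures from the paper.

```python
def student_param_budget(teacher_params, compression_factor):
    """Parameter count of a student that is `compression_factor` times smaller."""
    return teacher_params / compression_factor


def retention(student_score, teacher_score):
    """Fraction of the teacher's score that the student keeps."""
    return student_score / teacher_score


# Hypothetical teacher size, used only to illustrate the ratio.
HYPOTHETICAL_TEACHER_PARAMS = 100_000_000
budget = student_param_budget(HYPOTHETICAL_TEACHER_PARAMS, 61)
print(f"student budget: about {budget / 1e6:.2f}M parameters")

# Hypothetical scores chosen so that the ratio equals 0.96.
kept = retention(student_score=0.48, teacher_score=0.50)
print(f"retained performance: {kept:.0%}")
```

Under these assumed numbers, a 100M-parameter teacher would correspond to a student of roughly 1.6M parameters.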

Abstract

General audio foundation models have recently achieved remarkable progress, enabling strong performance across diverse tasks. However, state-of-the-art models remain extremely large, often with hundreds of millions of parameters, leading to high inference costs and limited deployability on edge devices. Knowledge distillation is a proven strategy for model compression, but prior work in audio has mostly focused on supervised settings, relying on class logits, intermediate features, or architecture-specific techniques. Such assumptions exclude models that output only embeddings, such as self-supervised or metric-learning models. We introduce S-SONDO (Self-Supervised KnOwledge DistillatioN for General AuDio FOundation Models), the first framework to distill general audio models using only their output embeddings. By avoiding the need for logits or layer-level alignment, S-SONDO is…

Code & Models

Repositories

MedAliAdlouni/ssondo (GitHub)
