CED: Consistent ensemble distillation for audio tagging

Heinrich Dinkel; Yongqing Wang; Zhiyong Yan; Junbo Zhang; Yujun Wang

arXiv:2308.11957 · cs.SD · September 11, 2023

TL;DR

This paper introduces CED, a scalable, label-free ensemble distillation framework for audio tagging that combines augmentation and knowledge distillation, improving model performance on the Audioset benchmark.

Contribution

CED is the first to combine augmentation and knowledge distillation in a scalable, label-free ensemble distillation framework for audio classification.

Findings

01

Achieves 49.0 mAP on Audioset with a 10M parameter transformer model.

02

Stores logits and augmentation methods efficiently on disk, requiring minimal additional space.

03

Demonstrates improved performance over individual models using consistent teaching.

Abstract

Augmentation and knowledge distillation (KD) are well-established techniques employed in audio classification tasks, aimed at enhancing performance and reducing model sizes on the widely recognized Audioset (AS) benchmark. Although both techniques are effective individually, their combined use, called consistent teaching, has not been explored before. This paper proposes CED, a simple training framework that distils student models from large teacher ensembles with consistent teaching. To achieve this, CED efficiently stores logits as well as the augmentation methods on disk, making it scalable to large-scale datasets. Central to CED's efficacy is its label-free nature, meaning that only the stored logits are used to optimize a student model, requiring only 0.3% additional disk space for AS. The study trains various transformer-based models, including a 10M parameter model…
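
The consistent-teaching idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy linear "models", the seed-based `augment` function, the averaging of teacher logits, and all function names are assumptions made for illustration. The point it shows is that the teacher ensemble's logits are stored together with the information needed to replay the augmentation, so the student later sees exactly the same augmented view and is trained on the stored logits alone, without ground-truth labels.

```python
import math
import random


def augment(waveform, seed):
    """Toy augmentation (assumed): circular time shift plus gain, fully determined by `seed`."""
    rng = random.Random(seed)
    shift = rng.randrange(len(waveform))
    gain = rng.uniform(0.5, 1.5)
    shifted = waveform[shift:] + waveform[:shift]
    return [gain * x for x in shifted]


def toy_model(weights, waveform):
    """Hypothetical stand-in for a network: one logit per class via a weighted sum."""
    return [sum(w * x for w, x in zip(row, waveform)) for row in weights]


def store_teacher_targets(teachers, waveform, seed):
    """Run every teacher once on the augmented view; keep the mean logits and the seed.

    Averaging logits across the ensemble is an assumption of this sketch.
    """
    view = augment(waveform, seed)
    all_logits = [toy_model(t, view) for t in teachers]
    mean_logits = [sum(col) / len(col) for col in zip(*all_logits)]
    return {"seed": seed, "logits": mean_logits}


def student_loss(student, waveform, record):
    """Label-free loss: replay the stored augmentation, then compare to stored logits.

    Uses binary cross-entropy with soft targets sigmoid(teacher_logit),
    written in the numerically stable "with logits" form.
    """
    view = augment(waveform, record["seed"])  # same view the teachers saw
    student_logits = toy_model(student, view)
    total = 0.0
    for s, t in zip(student_logits, record["logits"]):
        target = 1.0 / (1.0 + math.exp(-t))
        total += max(s, 0.0) - s * target + math.log1p(math.exp(-abs(s)))
    return total / len(student_logits)
```

In this sketch only a seed and one logit per class are kept per sample, which is why the stored record stays small relative to the audio itself; the actual on-disk format used by CED is not described in the text above.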

Code & Models

Repositories

richermans/ced (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies