Joint Feature and Output Distillation for Low-complexity Acoustic Scene Classification
Haowen Li, Ziyi Yang, Mou Wang, Ee-Leng Tan, Junwei Yeow, Santi Peksi, Woon-Seng Gan

TL;DR
This paper introduces a dual-level knowledge distillation framework with multi-teacher guidance to improve low-complexity acoustic scene classification, effectively transferring both soft logits and feature representations from teacher models to a compact student model.
Contribution
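It proposes a joint feature and output distillation strategy for low-complexity ASC: averaged logits from multiple teachers (PaSST and CP-ResNet) supply the soft targets, while a single CP-ResNet teacher supplies intermediate features for the compact CP-Mobile student.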
Findings
Achieved up to 59.30% accuracy on the development set of the TAU Urban Acoustic Scenes 2022 Mobile dataset.
Demonstrated effectiveness of combined logit and feature distillation.
Validated the approach through systems submitted to DCASE2025 Task 1.
Abstract
This report presents a dual-level knowledge distillation framework with multi-teacher guidance for low-complexity acoustic scene classification (ASC) in DCASE2025 Task 1. We propose a distillation strategy that jointly transfers both soft logits and intermediate feature representations. Specifically, we pre-trained PaSST and CP-ResNet models as teacher models. Logits from teachers are averaged to generate soft targets, while one CP-ResNet is selected for feature-level distillation. This enables the compact student model (CP-Mobile) to capture both semantic distribution and structural information from teacher guidance. Experiments on the TAU Urban Acoustic Scenes 2022 Mobile dataset (development set) demonstrate that our submitted systems achieve up to 59.30% accuracy.
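The dual-level objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the temperature, the loss weights, the T² scaling on the logit term, and the use of mean squared error on flattened features are all assumptions made for illustration, and the abstract does not specify any of them. A practical version would use a tensor library, would likely add a supervised loss on the ground-truth labels, and might need a projection layer when student and teacher feature shapes differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def average_logits(teacher_logits):
    """Element-wise mean of the logit vectors produced by several teachers."""
    count = len(teacher_logits)
    return [sum(column) / count for column in zip(*teacher_logits)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_distillation_loss(student_logits, teacher_logits,
                            student_feat, teacher_feat,
                            temperature=2.0, w_logit=1.0, w_feat=1.0):
    """Weighted sum of an output-level and a feature-level distillation term.

    teacher_logits: one logit vector per teacher (e.g. PaSST and CP-ResNet).
    teacher_feat:   flattened features from the single teacher chosen for
                    feature-level distillation.
    The temperature, weights, and T^2 scaling are illustrative assumptions.
    """
    # Output level: soft targets come from the averaged teacher logits.
    soft_targets = softmax(average_logits(teacher_logits), temperature)
    student_probs = softmax(student_logits, temperature)
    logit_term = (temperature ** 2) * kl_divergence(soft_targets, student_probs)

    # Feature level: match the student's features to the chosen teacher's.
    feat_term = mse(student_feat, teacher_feat)

    return w_logit * logit_term + w_feat * feat_term
```

In this sketch the loss is zero when the student's logits equal the averaged teacher logits and its features equal the teacher's, and it grows as either the predicted distribution or the features drift away from the teacher guidance.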
