Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices

Hossein Sharify; Behnam Raoufi; Mahdy Ramezani; Khosrow Hajsadeghi; Saeed Bagheri Shouraki

arXiv:2512.13905 · cs.SD · December 17, 2025

Open Access

TL;DR

This paper introduces a compact, robust acoustic scene classification framework based on ensemble-guided knowledge distillation and optimized for edge devices; it reports state-of-the-art results on the TAU Urban Acoustic Scenes 2022 Mobile benchmark.

Contribution

It proposes an ensemble-guided distillation method that pairs a lightweight student network with a diverse teacher ensemble for efficient edge deployment.

Findings

01 Achieves state-of-the-art accuracy on the TAU Urban Acoustic Scenes 2022 Mobile benchmark.

02 Demonstrates robustness to device and noise variability.

03 Enables efficient inference suitable for edge devices.

Abstract

We present a compact, quantization-ready acoustic scene classification (ASC) framework that couples an efficient student network with a learned teacher ensemble and knowledge distillation. The student backbone uses stacked depthwise-separable "expand-depthwise-project" blocks with global response normalization to stabilize training and improve robustness to device and noise variability, while a global pooling head yields class logits for efficient edge inference. To inject richer inductive bias, we assemble a diverse set of teacher models and learn two complementary fusion heads: z1, which predicts per-teacher mixture weights using a student-style backbone, and z2, a lightweight MLP that performs per-class logit fusion. The student is distilled from the ensemble via temperature-scaled soft targets combined with hard labels, enabling it to approximate the ensemble's decision geometry…
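
The fusion and distillation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the mixture scores are passed in directly rather than predicted by a z1-style backbone, and the loss uses the common formulation of temperature-scaled KL divergence plus cross-entropy, with an assumed weighting `alpha` and an assumed `temperature ** 2` scaling that the abstract does not confirm.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def fuse_teacher_logits(teacher_logits, mixture_scores):
    """Combine per-teacher logits with softmax-normalized mixture weights.

    Assumption: the scores stand in for the output of a learned fusion head.
    """
    weights = [math.exp(v) for v in log_softmax(mixture_scores)]
    num_classes = len(teacher_logits[0])
    return [
        sum(w * logits[c] for w, logits in zip(weights, teacher_logits))
        for c in range(num_classes)
    ]


def distillation_loss(student_logits, ensemble_logits, label,
                      temperature=2.0, alpha=0.5):
    """Soft-target KL term plus hard-label cross-entropy (assumed weighting)."""
    log_p_teacher = log_softmax(ensemble_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(teacher || student) on temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Cross-entropy against the ground-truth label at temperature 1.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


# Two hypothetical teachers, two classes, equal mixture scores.
fused = fuse_teacher_logits([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
loss = distillation_loss([0.5, 0.2], fused, label=0)
```

With equal mixture scores the fused logits reduce to the plain average of the teachers. When the student logits match the ensemble logits, the KL term vanishes and only the cross-entropy part remains.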

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Domain Adaptation and Few-Shot Learning