TL;DR
This paper introduces SSATKD, a novel knowledge distillation framework that enhances environmental sound classification by integrating low-level audio textures with high-level context, showing consistent improvements across diverse datasets.
Contribution
The paper presents a new distillation framework combining structural and statistical audio textures with high-level features, improving sound classification accuracy.
Findings
Consistent accuracy improvements across four datasets: two passive sonar datasets (DeepShip and VTUAD) and two general environmental sound datasets.
Effective with both convolutional and transformer-based teacher models.
Robust performance with different teacher adaptation strategies.
Abstract
While knowledge distillation has shown success in various audio tasks, its application to environmental sound classification often overlooks essential low-level audio texture features needed to capture local patterns in complex acoustic environments. To address this gap, the Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) framework is proposed, which combines high-level contextual information with low-level structural and statistical audio textures extracted from intermediate layers. To evaluate its generalizability across diverse acoustic domains, SSATKD is tested on four datasets within the environmental sound classification domain, including two passive sonar datasets (DeepShip and Vessel Type Underwater Acoustic Data (VTUAD)) and two general environmental sound datasets (Environmental Sound Classification 50 (ESC-50) and Tampere University of Technology…
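To make the idea in the abstract concrete, the sketch below shows one way a student's training loss could combine a high-level distillation term with a low-level texture term. It is a minimal illustration under stated assumptions, not the SSATKD implementation: the abstract does not give the loss formulation, so the temperature-scaled KL term (standard logit distillation), the mean/standard-deviation matching used as a stand-in for "statistical texture", and the names `distillation_loss`, `texture_stat_loss`, `alpha`, and `beta` are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_std(values):
    """Mean and population standard deviation of a flat feature list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def texture_stat_loss(student_feats, teacher_feats):
    """Assumed stand-in for a statistical texture term: match first- and
    second-order statistics of intermediate-layer features."""
    s_mean, s_std = mean_std(student_feats)
    t_mean, t_std = mean_std(teacher_feats)
    return (s_mean - t_mean) ** 2 + (s_std - t_std) ** 2


def distillation_loss(student_logits, teacher_logits,
                      student_feats, teacher_feats, label,
                      temperature=2.0, alpha=0.5, beta=0.1):
    """Hypothetical combined objective: cross-entropy on the true label,
    plus logit distillation (high-level context), plus feature-statistics
    matching (low-level texture). The weights are illustrative only."""
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    kd = kl_divergence(softmax(teacher_logits, temperature),
                       softmax(student_logits, temperature)) * temperature ** 2
    tex = texture_stat_loss(student_feats, teacher_feats)
    return ce + alpha * kd + beta * tex
```

In this sketch, a student that exactly reproduces the teacher's logits and intermediate feature statistics pays only the cross-entropy cost; any mismatch in either the output distribution or the feature statistics adds to the loss. A structural texture term would need spatial information (for example, local patterns in the feature map) that this flat-list version does not model.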
