Data Efficient Acoustic Scene Classification using Teacher-Informed Confusing Class Instruction

Jin Jie Sean Yeo; Ee-Leng Tan; Jisheng Bai; Santi Peksi; Woon-Seng Gan

arXiv:2409.11964·cs.SD·September 19, 2024

TL;DR

This paper presents data-efficient acoustic scene classification methods using model simplification, data augmentation, and teacher-informed confusing class instructions, achieving improved accuracy with limited training data.

Contribution

It introduces a novel approach combining model complexity reduction, mixup augmentation, and teacher-informed confusing class instructions for low-data acoustic scene classification.
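
As a rough illustration of the mixup augmentation mentioned above, the sketch below blends two training examples and their one-hot labels with a weight drawn from a Beta distribution. The function name `mixup_pair`, the flat-list representation of features, and the value `alpha=0.3` are assumptions made for this example and are not taken from the report.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.3, rng=random):
    """Blend two examples and their one-hot labels (generic mixup sketch).

    alpha=0.3 is an illustrative value, not the setting used in the report.
    rng may be the random module or a seeded random.Random instance.
    """
    # Mixing weight lam ~ Beta(alpha, alpha), always in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs (e.g. flattened spectrogram values).
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    # The same combination applied to the one-hot labels gives a soft target.
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam
```

Because the labels are mixed with the same weight as the inputs, the soft target still sums to one, so it can be used directly with a cross-entropy loss.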

Findings

01. Highest average accuracy of 62.21% on 100% training data

02. Effective use of data augmentation with mixup

03. Knowledge distillation improves model performance

Abstract

In this technical report, we describe the SNTL-NTU team's submission for Task 1, Data-Efficient Low-Complexity Acoustic Scene Classification, of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 challenge. Three systems are introduced to tackle training splits of different sizes. For the small training splits, we explore reducing the complexity of the provided baseline model by reducing the number of base channels. We introduce data augmentation in the form of mixup to increase the diversity of training samples. For the larger training splits, we use FocusNet to provide confusing class information to an ensemble of multiple Patchout faSt Spectrogram Transformer (PaSST) models and baseline models trained on the original sampling rate of 44.1 kHz. We use Knowledge Distillation to distill the ensemble model to the baseline student model. Training the systems on the TAU…
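
The distillation step described in the abstract can be sketched with the standard temperature-softened formulation: a KL term between teacher and student distributions plus a cross-entropy term on the true label. This is a generic sketch, not the report's implementation. The names `log_softmax` and `distillation_loss`, the single set of teacher logits (standing in for the ensemble output), and the values `temperature=2.0` and `kd_weight=0.5` are all assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.5):
    """Weighted sum of a soft (teacher) loss and a hard (label) loss.

    temperature and kd_weight are illustrative values, not the report's.
    """
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    # Cross-entropy of the student against the true class index.
    hard = -log_softmax(student_logits)[label]
    # The T^2 factor keeps the soft-loss gradients on a comparable scale.
    return kd_weight * temperature ** 2 * kl + (1.0 - kd_weight) * hard
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the label term remains. For an ensemble teacher, one common choice is to average the member logits or probabilities before computing the soft term; whether the report does this is not stated in the text above.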

Taxonomy

Topics: Music and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer · Dropout