Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor
Yongjie Si, Yanxiong Li, Jialong Li, Jiaxin Tan, Qianhua He

TL;DR
This paper introduces fully few-shot class-incremental audio classification, in which every session, including the base session, has only a few training samples. It proposes a method built on an expandable dual-embedding extractor that outperforms seven baseline methods on three datasets.
Contribution
It proposes an expandable dual-embedding extractor model, combining a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST, for fully few-shot class-incremental audio classification, addressing data scarcity in all sessions.
Findings
Outperforms seven baseline methods in average accuracy
Effective on three audio datasets (LS-100, NSynth-100 and FSC-89)
Statistically significant improvements
Abstract
It's assumed that training data is sufficient in the base session of few-shot class-incremental audio classification. However, in some practical scenarios it's difficult to collect abundant samples for model training in the base session because data for some classes is scarce. This paper explores a new problem, fully few-shot class-incremental audio classification, with few training samples in all sessions. Moreover, we propose a method using an expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes, and each prototype represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline…
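The prototype-based expandable classifier described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the two AST embeddings are combined or how prototypes are scored, so concatenation, mean-of-shots prototypes, and cosine similarity are assumptions here. The names `dual_embed`, `PrototypeClassifier`, `frozen_fn`, `tuned_fn`, and `make_shots` are hypothetical, and the two "extractors" are simple stand-in functions rather than real AST models.

```python
import numpy as np


def dual_embed(x, frozen_fn, tuned_fn):
    """Concatenate embeddings from two extractors (assumed fusion rule)."""
    return np.concatenate([frozen_fn(x), tuned_fn(x)], axis=1)


class PrototypeClassifier:
    """Expandable nearest-prototype classifier (illustrative sketch only)."""

    def __init__(self):
        self.labels = []      # class ids seen so far, in insertion order
        self.prototypes = []  # one mean embedding per class

    def add_session(self, embeddings, labels):
        """Expand the classifier with one prototype per new class."""
        embeddings = np.asarray(embeddings, dtype=float)
        labels = np.asarray(labels)
        for c in np.unique(labels):
            c = c.item()
            if c in self.labels:
                raise ValueError(f"class {c} already has a prototype")
            # Assumption: a prototype is the mean of the class's few shots.
            self.prototypes.append(embeddings[labels == c].mean(axis=0))
            self.labels.append(c)

    def predict(self, embeddings):
        """Assign each embedding to the most cosine-similar prototype."""
        x = np.asarray(embeddings, dtype=float)
        p = np.stack(self.prototypes)
        x = x / np.linalg.norm(x, axis=1, keepdims=True)
        p = p / np.linalg.norm(p, axis=1, keepdims=True)
        return np.asarray(self.labels)[(x @ p.T).argmax(axis=1)]


def make_shots(rng, classes, shots, dim=8):
    """Hypothetical toy data: each class is a noisy one-hot cluster."""
    labels = np.repeat(classes, shots)
    feats = 5.0 * np.eye(dim)[labels] + 0.1 * rng.standard_normal((labels.size, dim))
    return feats, labels


# Stand-ins for the pretrained and finetuned extractors (not real ASTs).
frozen_fn = lambda x: x
tuned_fn = lambda x: np.tanh(x)

rng = np.random.default_rng(0)
clf = PrototypeClassifier()

# Session 0 (base) and session 1 (incremental), 5 shots per class each.
for session_classes in ([0, 1, 2], [3, 4]):
    feats, labels = make_shots(rng, session_classes, shots=5)
    clf.add_session(dual_embed(feats, frozen_fn, tuned_fn), labels)

# Evaluate on queries from all classes seen so far.
q_feats, q_labels = make_shots(rng, [0, 1, 2, 3, 4], shots=4)
preds = clf.predict(dual_embed(q_feats, frozen_fn, tuned_fn))
accuracy = float((preds == q_labels).mean())
```

In this sketch, adding a session only appends prototypes and never modifies earlier ones, which is one simple way to keep old classes intact when few samples are available in every session.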
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: Attention Is All You Need · Residual Connection · Softmax · Balanced Selection · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention
