Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor

Yongjie Si; Yanxiong Li; Jialong Li; Jiaxin Tan; Qianhua He

arXiv:2406.08122·eess.AS·June 13, 2024

TL;DR

This paper introduces a novel fully few-shot class-incremental audio classification method using an expandable dual-embedding extractor, effectively handling data scarcity across all sessions and outperforming baseline methods on multiple datasets.

Contribution

It proposes an expandable dual-embedding extractor model with a pretrained and finetuned AST for fully few-shot incremental audio classification, addressing data scarcity in all sessions.

Findings

01 Outperforms seven baseline methods in average accuracy

02 Effective on three diverse audio datasets

03 Statistically significant improvements

Abstract

Few-shot class-incremental audio classification usually assumes that training data are sufficient in the base session. In some practical scenarios, however, abundant samples are hard to collect even for the base session because data for some classes are scarce. This paper explores a new problem, fully few-shot class-incremental audio classification, in which only a few training samples are available in every session. We propose a method that uses an expandable dual-embedding extractor to solve it. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes, each of which represents one class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline…
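
The two components named in the abstract, a dual-embedding extractor and an expandable prototype classifier, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the two AST embeddings are combined, how prototypes are computed, or how similarity is scored, so the sketch assumes concatenation, per-class mean embeddings, and cosine similarity. All names (`dual_embedding`, `ExpandablePrototypeClassifier`, `add_session`) are hypothetical, and the extractors are stand-in callables rather than real AST models.

```python
import math


def dual_embedding(x, extract_a, extract_b):
    """Fuse two extractors' outputs by concatenation (assumed fusion rule)."""
    return list(extract_a(x)) + list(extract_b(x))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class ExpandablePrototypeClassifier:
    """Keeps one prototype per class and grows as new sessions arrive."""

    def __init__(self):
        self.prototypes = {}  # class label -> prototype vector

    def add_session(self, support):
        """Add classes from one session.

        support maps each new class label to its few support embeddings.
        The prototype is their element-wise mean (an assumption).
        """
        for label, embeddings in support.items():
            dim = len(embeddings[0])
            count = len(embeddings)
            self.prototypes[label] = [
                sum(e[i] for e in embeddings) / count for i in range(dim)
            ]

    def predict(self, embedding):
        """Return the label whose prototype is most similar to the embedding."""
        return max(
            self.prototypes,
            key=lambda label: cosine(embedding, self.prototypes[label]),
        )
```

In this sketch every session, including the first, contributes only a handful of embeddings per class, which mirrors the fully few-shot setting. Adding a session only appends prototypes, so earlier classes are left unchanged.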

Code & Models

Repositories

yongjiesi/ede (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies

Methods: Attention Is All You Need · Residual Connection · Softmax · Balanced Selection · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention