Fully Few-shot Class-incremental Audio Classification Using Multi-level Embedding Extractor and Ridge Regression Classifier

Yongjie Si; Yanxiong Li; Jiaxin Tan; Qianhua He; Il-Youp Kwak

arXiv:2506.18406 · eess.AS · June 24, 2025 · Interspeech

TL;DR

This paper introduces a novel approach for fully few-shot class-incremental audio classification using a multi-level embedding extractor and a ridge regression classifier, addressing data scarcity in both base and incremental classes.

Contribution

The paper proposes a decoupled model with a frozen embedding extractor and a continually updated classifier, improving accuracy and reducing complexity in fully few-shot scenarios.

Findings

01. Outperforms current methods in accuracy on three datasets.

02. Maintains low complexity compared to existing approaches.

03. Effective in scenarios with limited training samples for all classes.

Abstract

In the task of Few-shot Class-incremental Audio Classification (FCAC), training samples of each base class must be abundant to train the model. However, collecting abundant training samples for many base classes is difficult because of data scarcity and high collection cost. We discuss a more realistic problem, Fully FCAC (FFCAC), in which only a few training samples are available for both base and incremental classes. Furthermore, we propose an FFCAC method using a model that is decoupled into a multi-level embedding extractor and a ridge regression classifier. The embedding extractor consists of an Audio Spectrogram Transformer encoder and a fusion module; it is trained in the base session but frozen in all incremental sessions. The classifier is updated continually in each incremental session. Results on three public datasets show that our method exceeds current methods in accuracy, and…
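
The decoupled design described in the abstract (a frozen embedding extractor feeding a ridge regression classifier that is refreshed each session) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class `RidgeClassifier`, the helper `toy_embeddings`, the penalty `lam`, and the choice to update by accumulating `X^T X` and `X^T Y` are all assumptions made for illustration, and the toy clusters merely stand in for embeddings that a frozen extractor would produce.

```python
import numpy as np


class RidgeClassifier:
    """Closed-form ridge regression on fixed embeddings (illustrative sketch)."""

    def __init__(self, dim, lam=1.0):
        self.lam = lam                     # ridge penalty (hypothetical default)
        self.gram = np.zeros((dim, dim))   # running sum of X^T X
        self.cross = np.zeros((dim, 0))    # running sum of X^T Y, one column per class
        self.weights = np.zeros((dim, 0))  # solved classifier weights

    def update(self, embeddings, labels, num_classes):
        """Fold one session's samples into the statistics and re-solve.

        num_classes is the total number of classes seen so far.
        """
        total = max(num_classes, self.cross.shape[1])
        grow = total - self.cross.shape[1]
        if grow > 0:
            # New classes arrive: append one zero column per new class.
            pad = np.zeros((self.cross.shape[0], grow))
            self.cross = np.hstack([self.cross, pad])
        onehot = np.eye(total)[labels]     # one-hot regression targets
        self.gram += embeddings.T @ embeddings
        self.cross += embeddings.T @ onehot
        reg = self.gram + self.lam * np.eye(self.gram.shape[0])
        # W = (X^T X + lam * I)^{-1} X^T Y
        self.weights = np.linalg.solve(reg, self.cross)

    def predict(self, embeddings):
        """Return the class with the highest regression score per sample."""
        return np.argmax(embeddings @ self.weights, axis=1)


def toy_embeddings(class_ids, shots, dim, rng):
    """Hypothetical stand-in for the frozen extractor: one noisy cluster per class."""
    x = np.concatenate([
        np.eye(dim)[c] * 5.0 + 0.1 * rng.standard_normal((shots, dim))
        for c in class_ids
    ])
    y = np.repeat(class_ids, shots)
    return x, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clf = RidgeClassifier(dim=8, lam=0.1)
    # Base session: 3 classes, 5 shots each.
    clf.update(*toy_embeddings([0, 1, 2], 5, 8, rng), num_classes=3)
    # Incremental session: 2 new classes, 5 shots each.
    clf.update(*toy_embeddings([3, 4], 5, 8, rng), num_classes=5)
    x_test, y_test = toy_embeddings([0, 1, 2, 3, 4], 20, 8, rng)
    print("accuracy:", (clf.predict(x_test) == y_test).mean())
```

Because only the two sufficient statistics are stored, this sketch does not need to revisit earlier sessions' samples, and its weights match those of a single ridge fit on all sessions' data with one-hot targets. Whether the paper's classifier update follows the same scheme is not stated in the abstract.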

Code & Models

Repositories

yongjiesi/mar (PyTorch, official)

Taxonomy

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · Balanced Selection