TL;DR
This paper introduces a novel approach for fully few-shot class-incremental audio classification using a multi-level embedding extractor and a ridge regression classifier, addressing data scarcity in both base and incremental classes.
Contribution
It proposes a decoupled model with a frozen embedding extractor and a continually updated classifier, improving accuracy and reducing complexity in fully few-shot scenarios.
Findings
Outperforms current methods in accuracy on three public datasets.
Maintains low complexity compared to existing approaches.
Effective in scenarios with limited training samples for all classes.
Abstract
In the task of Few-shot Class-incremental Audio Classification (FCAC), abundant training samples of each base class are required to train the model. However, it is not easy to collect abundant training samples for many base classes due to data scarcity and high collection cost. We discuss a more realistic problem, Fully FCAC (FFCAC), in which only a few training samples are available for both base and incremental classes. Furthermore, we propose an FFCAC method using a model that is decoupled into a multi-level embedding extractor and a ridge regression classifier. The embedding extractor consists of an Audio Spectrogram Transformer encoder and a fusion module, and is trained in the base session but frozen in all incremental sessions. The classifier is updated continually in each incremental session. Results on three public datasets show that our method exceeds current methods in accuracy, and…
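The decoupled design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the frozen extractor already yields fixed-length embeddings, and it assumes one plausible update rule: the classifier accumulates the statistics X^T X and X^T Y across sessions and re-solves the closed-form ridge system W = (X^T X + λI)^(-1) X^T Y each time new classes arrive. The names `IncrementalRidgeClassifier`, `make_session`, and `lam` are hypothetical, and `make_session` produces synthetic points in place of real audio embeddings.

```python
import numpy as np


class IncrementalRidgeClassifier:
    """Ridge regression on fixed embeddings, re-solved in closed form each session."""

    def __init__(self, dim, lam=1.0):
        self.lam = lam                      # ridge penalty (assumed value)
        self.gram = np.zeros((dim, dim))    # running sum of X^T X
        self.xty = np.zeros((dim, 0))       # running sum of X^T Y, one column per class
        self.weights = np.zeros((dim, 0))

    def update(self, embeddings, labels, num_classes):
        """Add one session's samples; labels are integers in [0, num_classes)."""
        total = max(num_classes, self.xty.shape[1])
        extra = total - self.xty.shape[1]
        if extra > 0:
            # Grow the target statistics with zero columns for the new classes.
            pad = np.zeros((self.xty.shape[0], extra))
            self.xty = np.hstack([self.xty, pad])
        one_hot = np.eye(total)[labels]
        self.gram += embeddings.T @ embeddings
        self.xty += embeddings.T @ one_hot
        reg = self.lam * np.eye(self.gram.shape[0])
        self.weights = np.linalg.solve(self.gram + reg, self.xty)

    def predict(self, embeddings):
        """Return the class index with the largest regression score."""
        return np.argmax(embeddings @ self.weights, axis=1)


def make_session(rng, centers, class_ids, shots=5, noise=0.1):
    """Hypothetical stand-in for the frozen extractor: noisy points near class centers."""
    dim = centers.shape[1]
    x = np.vstack(
        [centers[c] + noise * rng.standard_normal((shots, dim)) for c in class_ids]
    )
    y = np.repeat(class_ids, shots)
    return x, y
```

In this sketch, a base session would call `update` with the base-class embeddings, and each incremental session would call it again with a larger `num_classes`. The extractor is never touched, which mirrors the frozen-extractor, updated-classifier split described above.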
Taxonomy
Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer · Balanced Selection
