# Online incremental learning for audio classification using a pretrained audio model

**Authors:** Manjunath Mulimani, Annamaria Mesaros

arXiv: 2508.20732 · 2025-08-29

## TL;DR

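This paper builds an online incremental learner for audio classification on top of embeddings from a pre-trained audio model. A nonlinear layer that expands the embedding dimensionality is inserted before the classifier, and the model adapts to each new task in a single pass over that task's training samples, with minimal forgetting of earlier tasks.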

## Contribution

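The work proposes injecting a dimension-expanding layer with a nonlinear activation between a pre-trained audio model's embeddings and the classifier. This turns the model into an online incremental learner that outperforms the compared methods in both class-incremental and domain-incremental audio classification.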

## Key findings

- Outperforms other methods in class-incremental learning on ESC-50.
- Outperforms other methods in domain-incremental learning across cities of the TAU Urban Acoustic Scenes 2019 dataset.
- Adapts to each new task in a single forward pass (online) over its training samples, with minimal forgetting of old tasks.

## Abstract

Incremental learning aims to learn new tasks sequentially without forgetting the previously learned ones. Most of the existing incremental learning methods for audio focus on training the model from scratch on the initial task, and the same model is used to learn upcoming incremental tasks. The model is trained for several iterations to adapt to each new task, using some specific approaches to reduce the forgetting of old tasks. In this work, we propose a method for using generalizable audio embeddings produced by a pre-trained model to develop an online incremental learner that solves sequential audio classification tasks over time. Specifically, we inject a layer with a nonlinear activation function between the pre-trained model's audio embeddings and the classifier; this layer expands the dimensionality of the embeddings and effectively captures the distinct characteristics of sound classes. Our method adapts the model in a single forward pass (online) through the training samples of any task, with minimal forgetting of old tasks. We demonstrate the performance of the proposed method in two incremental learning setups: one class-incremental learning using ESC-50 and one domain-incremental learning of different cities from the TAU Urban Acoustic Scenes 2019 dataset; for both cases, the proposed approach outperforms other methods.
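The abstract specifies only that the injected layer expands the embedding dimensionality and uses a nonlinear activation, and that the model adapts in one pass over each task's samples. The sketch below illustrates that pipeline under loudly stated assumptions: embeddings are taken as given (no real pre-trained model is loaded, and toy 4-d vectors stand in for them), the injected layer is assumed to be a fixed random projection followed by ReLU, and the classifier is a nearest-class-mean readout with cosine similarity. That readout is a swapped-in stand-in chosen because it is naturally single-pass (each class prototype is a running sum); the paper's actual layer and classifier may differ. The names `OnlineExpansionLearner`, `learn_one`, `predict`, and `demo` are hypothetical.

```python
import math
import random
from collections import defaultdict


class OnlineExpansionLearner:
    """Hypothetical sketch: fixed nonlinear expansion layer + prototype readout.

    Assumptions (not taken from the paper): inputs are embeddings from a frozen
    pre-trained audio model, the expansion layer is a fixed random projection
    followed by ReLU, and classification is nearest class mean by cosine.
    """

    def __init__(self, embed_dim, expanded_dim, seed=0):
        rng = random.Random(seed)
        # Fixed (never trained) projection: embed_dim -> expanded_dim.
        scale = 1.0 / math.sqrt(embed_dim)
        self.proj = [[rng.gauss(0.0, scale) for _ in range(embed_dim)]
                     for _ in range(expanded_dim)]
        # Per-class running sum of expanded features (same direction as the mean).
        self.sums = defaultdict(lambda: [0.0] * expanded_dim)
        self.counts = defaultdict(int)

    def expand(self, x):
        # Nonlinear, higher-dimensional feature: ReLU(P x).
        return [max(0.0, sum(p * v for p, v in zip(row, x))) for row in self.proj]

    def learn_one(self, x, label):
        # Online update: one forward pass per sample, no stored data, no epochs.
        h = self.expand(x)
        acc = self.sums[label]
        for i, v in enumerate(h):
            acc[i] += v
        self.counts[label] += 1

    def predict(self, x):
        # Cosine similarity between the expanded query and each class sum.
        h = self.expand(x)
        h_norm = math.sqrt(sum(v * v for v in h)) or 1e-12
        best_label, best_score = None, -math.inf
        for label, s in self.sums.items():
            s_norm = math.sqrt(sum(v * v for v in s)) or 1e-12
            score = sum(a * b for a, b in zip(h, s)) / (h_norm * s_norm)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def demo(n_per_class=20, seed=1):
    """Toy two-task stream; 4-d vectors stand in for pre-trained embeddings."""
    rng = random.Random(seed)
    bases = {"dog": [1.0, 0.0, 0.0, 0.0],
             "rain": [0.0, 1.0, 0.0, 0.0],
             "siren": [0.0, 0.0, 1.0, 0.0]}

    def noisy(base):
        return [b + rng.gauss(0.0, 0.05) for b in base]

    learner = OnlineExpansionLearner(embed_dim=4, expanded_dim=64)
    # Task 1 (classes dog, rain): every training sample is seen exactly once.
    for label in ("dog", "rain"):
        for _ in range(n_per_class):
            learner.learn_one(noisy(bases[label]), label)
    # Task 2 introduces a new class; task-1 sums are never modified.
    for _ in range(n_per_class):
        learner.learn_one(noisy(bases["siren"]), "siren")
    preds = {label: learner.predict(base) for label, base in bases.items()}
    return learner, preds
```

In this stand-in, each sample updates only its own class's running sum, so learning a new task leaves earlier prototypes exactly unchanged; any forgetting can only come from new prototypes competing with old ones at prediction time, not from overwritten weights. A real system would feed embeddings from the pre-trained model instead of the toy vectors.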

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20732/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20732/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/2508.20732/full.md

---
Source: https://tomesphere.com/paper/2508.20732