# Adapter Incremental Continual Learning of Efficient Audio Spectrogram Transformers

**Authors:** Nithish Muthuchamy Selvaraj, Xiaobao Guo, Adams Kong, Bingquan Shen, Alex Kot

arXiv: 2302.14314 · 2024-01-03

## TL;DR

This paper adapts Audio Spectrogram Transformers (AST) to task-incremental continual learning using convolutional adapters and frequency-time factorized attention, reducing both trainable parameters and computational cost while preventing catastrophic forgetting on the ESC-50, SpeechCommandsV2, and Audio-Visual Event benchmarks.

## Contribution

It proposes Adapter Incremental Continual Learning (AI-CL), combining convolutional adapters and frequency-time factorized attention for efficient, scalable continual learning in audio spectrogram transformers.
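As a rough illustration of the adapter-incremental idea, the sketch below keeps one shared, frozen backbone and adds a separate small adapter for each new task, so training a later task cannot overwrite parameters used by an earlier one. This is a minimal sketch under stated assumptions, not the paper's implementation: the `AdapterBank` class, its method names, the flat lists standing in for weights, and the task names used as keys are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterBank:
    """Hypothetical container: frozen shared backbone plus one adapter per task."""

    backbone: tuple  # immutable stand-in for frozen pretrained weights
    adapters: dict = field(default_factory=dict)

    def add_task(self, task_id: str, adapter_size: int) -> None:
        """Allocate a fresh adapter for a new task; existing adapters are untouched."""
        if task_id in self.adapters:
            raise ValueError(f"task {task_id!r} already has an adapter")
        self.adapters[task_id] = [0.0] * adapter_size

    def train_step(self, task_id: str, grads: list, lr: float = 0.1) -> None:
        """Apply a gradient step to the adapter of the given task only."""
        adapter = self.adapters[task_id]
        for i, g in enumerate(grads):
            adapter[i] -= lr * g

    def parameters_for(self, task_id: str) -> tuple:
        """At inference the task ID selects which adapter joins the backbone."""
        return self.backbone, self.adapters[task_id]


if __name__ == "__main__":
    bank = AdapterBank(backbone=(1.0,) * 1000)
    bank.add_task("esc50", adapter_size=4)
    bank.train_step("esc50", [1.0, 1.0, 1.0, 1.0])
    before = list(bank.adapters["esc50"])

    # Learning a second task leaves the first task's adapter unchanged.
    bank.add_task("scv2", adapter_size=4)
    bank.train_step("scv2", [2.0, 2.0, 2.0, 2.0])
    print(bank.adapters["esc50"] == before)
```

The design choice this illustrates is isolation: because each task owns its adapter and the backbone is never updated, forgetting is avoided by construction, at the price of needing the task ID at inference time.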

## Key findings

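- AST with Convolutional Adapters has less than 5% of the trainable parameters of its fully fine-tuned counterparts, without performance degradation.
- Frequency-Time factorized Attention (FTA) achieves competitive performance at a fraction of the computation required by Global Self-Attention (GSA).
- The combined method prevents catastrophic forgetting in task-incremental continual learning on ESC-50, SpeechCommandsV2 (SCv2), and Audio-Visual Event (AVE).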
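To give a rough sense of why factorizing attention along frequency and time can cut computation, the back-of-the-envelope count below compares the number of pairwise attention scores under global self-attention with a factorized scheme. It assumes FTA behaves like axial attention, where each patch attends only to patches sharing its time index and to patches sharing its frequency index; that assumption, the 12 x 100 patch grid, and the function names are illustrative and not taken from the paper.

```python
def gsa_pairs(n_freq: int, n_time: int) -> int:
    """Pairwise scores when every patch attends to every other patch."""
    n_tokens = n_freq * n_time
    return n_tokens * n_tokens


def factorized_pairs(n_freq: int, n_time: int) -> int:
    """Pairwise scores when each patch attends along frequency, then along time."""
    n_tokens = n_freq * n_time
    return n_tokens * (n_freq + n_time)


if __name__ == "__main__":
    n_freq, n_time = 12, 100  # hypothetical spectrogram patch grid
    full = gsa_pairs(n_freq, n_time)
    fact = factorized_pairs(n_freq, n_time)
    print(f"global: {full}, factorized: {fact}, ratio: {fact / full:.3f}")
```

Under these assumptions the factorized count scales with the sum of the two axis lengths instead of their product, which is why the gap widens for longer spectrograms.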

## Abstract

Continual learning involves training neural networks incrementally for new tasks while retaining the knowledge of previous tasks. However, efficiently fine-tuning the model for sequential tasks with minimal computational resources remains a challenge. In this paper, we propose Task Incremental Continual Learning (TI-CL) of audio classifiers with both parameter-efficient and compute-efficient Audio Spectrogram Transformers (AST). To reduce the trainable parameters without performance degradation for TI-CL, we compare several Parameter Efficient Transfer (PET) methods and propose AST with Convolutional Adapters for TI-CL, which has less than 5% of trainable parameters of the fully fine-tuned counterparts. To reduce the computational complexity, we introduce a novel Frequency-Time factorized Attention (FTA) method that replaces the traditional self-attention in transformers for audio spectrograms. FTA achieves competitive performance with only a factor of the computations required by Global Self-Attention (GSA). Finally, we formulate our method for TI-CL, called Adapter Incremental Continual Learning (AI-CL), as a combination of the "parameter-efficient" Convolutional Adapter and the "compute-efficient" FTA. Experiments on ESC-50, SpeechCommandsV2 (SCv2), and Audio-Visual Event (AVE) benchmarks show that our proposed method prevents catastrophic forgetting in TI-CL while maintaining a lower computational budget.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.14314/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2302.14314/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/2302.14314/full.md

---
Source: https://tomesphere.com/paper/2302.14314