Audio-Visual Class-Incremental Learning
Weiguo Pian, Shentong Mo, Yunhui Guo, Yapeng Tian

TL;DR
This paper introduces AV-CIL, an approach to audio-visual class-incremental learning that preserves audio-visual semantic similarity and audio-guided visual attention across incremental steps, and that outperforms existing methods on three newly created datasets.
Contribution
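The paper proposes AV-CIL, which combines a Dual-Audio-Visual Similarity Constraint (D-AVSC) with Visual Attention Distillation (VAD) to counter two problems in audio-visual incremental learning: the loss of semantic similarity between audio and visual features, and the forgetting of previously learned audio-visual correlations.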
Findings
AV-CIL outperforms existing methods on three new datasets.
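The Dual-Audio-Visual Similarity Constraint (D-AVSC) maintains both instance-aware and class-aware semantic similarity between the audio and visual modalities.
Visual Attention Distillation (VAD) retains the previously learned audio-guided visual attention.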
Abstract
In this paper, we introduce audio-visual class-incremental learning, a class-incremental learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as the number of incremental steps grows. Furthermore, we observe that audio-visual correlations learned in previous tasks can be forgotten as incremental steps progress, leading to poor performance. To overcome these challenges, we propose AV-CIL, which incorporates a Dual-Audio-Visual Similarity Constraint (D-AVSC) to maintain both instance-aware and class-aware semantic similarity between the audio and visual modalities, and Visual Attention Distillation (VAD) to retain the previously learned audio-guided visual attentive ability. We create three audio-visual class-incremental datasets,…
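The idea behind Visual Attention Distillation can be illustrated with a small sketch. The abstract does not give the loss formulation, so everything below is an assumption made for illustration: attention is computed as a scaled dot product between one audio feature and a handful of visual region features, and the distillation term is a KL divergence between the attention of the previous-step model and that of the current model. The function names (`audio_guided_attention`, `attention_distillation_loss`) and the toy feature sizes are hypothetical and are not taken from the paper or its code.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def audio_guided_attention(audio, regions):
    """Attention weights over visual regions, guided by one audio feature.

    Assumption: scaled dot-product scoring; the paper may use another form.
    audio   -- list of d floats
    regions -- list of r region features, each a list of d floats
    """
    scale = math.sqrt(len(audio))
    scores = [sum(a * v for a, v in zip(audio, region)) / scale
              for region in regions]
    return softmax(scores)


def attention_distillation_loss(old_attn, new_attn, eps=1e-12):
    """KL(old || new): penalizes drift away from the old model's attention.

    Assumption: KL divergence is one plausible choice of distance.
    """
    return sum(p * (math.log(p + eps) - math.log(q + eps))
               for p, q in zip(old_attn, new_attn))


# Toy data: 4 visual regions with 8-dimensional features.
random.seed(0)
audio = [random.gauss(0, 1) for _ in range(8)]
regions = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Simulate the visual features drifting after training on new classes.
drifted = [[v + 0.5 * random.gauss(0, 1) for v in region] for region in regions]

old_attn = audio_guided_attention(audio, regions)   # previous-step model
new_attn = audio_guided_attention(audio, drifted)   # current model

loss_same = attention_distillation_loss(old_attn, old_attn)
loss_drift = attention_distillation_loss(old_attn, new_attn)
print("loss without drift:", loss_same)
print("loss with drift:", loss_drift)
```

In this toy setup the loss is zero when the current attention matches the old one and grows as the attention map drifts, which is the behavior a distillation term of this kind is meant to penalize during incremental training.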
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Subtitles and Audiovisual Media
