Class-Incremental Grouping Network for Continual Audio-Visual Learning
Shentong Mo, Weiguo Pian, Yapeng Tian

TL;DR
This paper introduces a novel class-incremental grouping network (CIGN) for continual audio-visual learning, which learns category-wise features and prevents forgetting across sequential tasks, achieving state-of-the-art results.
Contribution
The paper proposes CIGN, a model that learns cross-modal class-aware features and uses class tokens distillation and continual grouping to mitigate catastrophic forgetting in continual audio-visual learning.
Findings
CIGN achieves state-of-the-art performance on multiple benchmarks.
The model effectively learns compact cross-modal representations.
Experimental results validate the robustness of the proposed approach.
Abstract
Continual learning is a challenging problem in which models need to be trained on non-stationary data across sequential tasks for class-incremental learning. While previous methods have focused on using either regularization or rehearsal-based frameworks to alleviate catastrophic forgetting in image classification, they are limited to a single modality and cannot learn compact class-aware cross-modal representations for continual audio-visual learning. To address this gap, we propose a novel class-incremental grouping network (CIGN) that can learn category-wise semantic features to achieve continual audio-visual learning. Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features. Additionally, it utilizes class tokens distillation and continual grouping to prevent forgetting parameters learned from previous tasks,…
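The abstract names two mechanisms: class tokens that aggregate class-aware features through grouping, and distillation of those tokens across tasks. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function names (`group_features`, `token_distillation_loss`), the scaled dot-product softmax used for grouping, and the mean-squared-error distillation term are all illustrative choices that the source does not specify.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def group_features(class_tokens, features):
    """Assumed grouping step: each class token attends over the
    audio or visual features and returns their weighted average."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    grouped = []
    for token in class_tokens:
        weights = softmax([dot(token, f) / scale for f in features])
        grouped.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return grouped


def token_distillation_loss(old_tokens, new_tokens):
    """Assumed distillation term: mean squared difference between the
    frozen tokens of earlier classes and the same tokens after the
    model has been extended with new classes."""
    n_old = len(old_tokens)
    total = 0.0
    for old, new in zip(old_tokens, new_tokens[:n_old]):
        total += sum((a - b) ** 2 for a, b in zip(old, new))
    return total / (n_old * len(old_tokens[0]))


# Hypothetical toy data: two old classes, one class added in a new task.
old_tokens = [[1.0, 0.0], [0.0, 1.0]]
new_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
features = [[2.0, 0.0], [0.0, 2.0]]

grouped = group_features(new_tokens, features)
drift = token_distillation_loss(old_tokens, new_tokens)
```

In this toy setup the old tokens are unchanged, so the distillation term is zero. In training, such a term would be added to the classification loss to penalize drift in the tokens of earlier classes.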
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing
