Class-Incremental Grouping Network for Continual Audio-Visual Learning

Shentong Mo; Weiguo Pian; Yapeng Tian

arXiv:2309.05281 · cs.CV · September 12, 2023 · 1 citation

TL;DR

This paper introduces a novel class-incremental grouping network (CIGN) for continual audio-visual learning, which learns category-wise features and prevents forgetting across sequential tasks, achieving state-of-the-art results.

Contribution

The paper proposes a new CIGN model that learns cross-modal class-aware features and employs class tokens distillation and grouping to mitigate catastrophic forgetting in continual audio-visual learning.
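To make the idea of class-token distillation concrete, here is a minimal sketch. It assumes that distillation penalizes drift in the class tokens of previously learned classes, measured as a mean squared difference between a frozen copy from the previous task and the current tokens. The function name `token_distillation_loss` and the choice of squared error are illustrative assumptions, not the formulation used in the paper or its repository.

```python
def token_distillation_loss(old_tokens, new_tokens):
    """Mean squared difference between old-task class tokens before and after an update.

    old_tokens: frozen class tokens saved after the previous task (list of vectors).
    new_tokens: current class tokens for the same classes (list of vectors).
    Both names are hypothetical; squared error is an assumed distance.
    """
    total, count = 0.0, 0
    for old, new in zip(old_tokens, new_tokens):
        for o, n in zip(old, new):
            total += (o - n) ** 2
            count += 1
    return total / count


# Tokens for two previously learned classes, before and after training on a new task.
frozen = [[1.0, 0.0], [0.0, 1.0]]
current = [[0.5, 0.0], [0.0, 1.0]]
loss = token_distillation_loss(frozen, current)
```

In a training loop, a term like this would typically be added to the classification loss on the new task, so that the tokens representing old classes stay close to their earlier values while new tokens are learned.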

Findings

01. CIGN achieves state-of-the-art performance on multiple benchmarks.

02. The model effectively learns compact cross-modal representations.

03. Experimental results validate the robustness of the proposed approach.

Abstract

Continual learning is a challenging problem in which models need to be trained on non-stationary data across sequential tasks for class-incremental learning. While previous methods have focused on using either regularization or rehearsal-based frameworks to alleviate catastrophic forgetting in image classification, they are limited to a single modality and cannot learn compact class-aware cross-modal representations for continual audio-visual learning. To address this gap, we propose a novel class-incremental grouping network (CIGN) that can learn category-wise semantic features to achieve continual audio-visual learning. Our CIGN leverages learnable audio-visual class tokens and audio-visual grouping to continually aggregate class-aware features. Additionally, it utilizes class tokens distillation and continual grouping to prevent forgetting parameters learned from previous tasks,…
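The grouping step mentioned in the abstract can be pictured with a small sketch. It assumes that each audio or visual feature is softly assigned to the class tokens by a softmax over dot-product similarities, and that each class then aggregates the features assigned to it as a weighted mean. The helper names (`softmax`, `dot`, `group_features`) and the exact assignment rule are assumptions for illustration, not the authors' implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def group_features(features, class_tokens):
    """Softly assign each feature to a class token and aggregate per class.

    Returns one aggregated vector per class token (a weighted mean of the
    features, weighted by each feature's assignment to that token).
    """
    dim = len(class_tokens[0])
    # assign[i][k] is the weight of feature i for class token k.
    assign = [softmax([dot(f, t) for t in class_tokens]) for f in features]
    grouped = []
    for k in range(len(class_tokens)):
        weight_sum = sum(a[k] for a in assign)
        agg = [
            sum(a[k] * f[d] for a, f in zip(assign, features)) / weight_sum
            for d in range(dim)
        ]
        grouped.append(agg)
    return grouped


# Two toy features, each aligned with one of two class tokens.
features = [[10.0, 0.0], [0.0, 10.0]]
class_tokens = [[1.0, 0.0], [0.0, 1.0]]
grouped = group_features(features, class_tokens)
```

Under these assumptions, each aggregated vector is dominated by the features most similar to its class token, which is one way to obtain the compact, class-aware features the abstract describes.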

Code & Models

Repositories

stonemo/cign (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Music and Audio Processing