MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning

Huiyu Xiong, Lanxiao Wang, Heqian Qiu, Taijin Zhao, Benliu Qiu, Hongliang Li

arXiv:2402.17680 · cs.CV · February 28, 2024 · 1 citation

Open Access

TL;DR

This paper introduces MCF-VC, a novel method for class-incremental learning in multimodal video captioning, effectively reducing catastrophic forgetting through fine-grained knowledge selection and two-stage knowledge distillation, without replaying old data.

Contribution

The paper proposes MCF-VC, a new approach that combines Fine-grained Sensitivity Selection (FgSS) and Two-stage Knowledge Distillation (TsKD) to mitigate forgetting in class-incremental video captioning, addressing the stability-plasticity dilemma in a complex multimodal task.
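This summary does not say how the two stages of TsKD are defined, so the sketch below shows only a generic building block that such a scheme commonly relies on: the widely used temperature-scaled soft-target distillation loss, applied at each decoding step between the frozen old-task model (teacher) and the model being trained on the new task (student). It is a minimal pure-Python sketch under that assumption, not the authors' method; the function names and the default `temperature=2.0` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over vocabulary logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss_step(teacher_logits: Sequence[float],
                 student_logits: Sequence[float],
                 temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened word distributions, scaled by T^2
    (the usual scaling that keeps gradient magnitudes comparable across T)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def caption_kd_loss(teacher_seq: Sequence[Sequence[float]],
                    student_seq: Sequence[Sequence[float]],
                    temperature: float = 2.0) -> float:
    """Hypothetical helper: average the per-step loss over the caption's decoding steps.
    Assumption: teacher and student are run on the same input tokens (teacher forcing)."""
    if len(teacher_seq) != len(student_seq) or not teacher_seq:
        raise ValueError("teacher and student must cover the same, non-empty decoding steps")
    losses = [kd_loss_step(t, s, temperature) for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    same = caption_kd_loss([[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]])
    drift = caption_kd_loss([[3.0, 0.0, 0.0]], [[0.0, 0.0, 3.0]])
    print(same, drift)
```

With identical teacher and student logits the loss is zero; it grows as the student's word distribution drifts away from the old model's, which is the pressure that keeps old-task captioning behaviour intact without replaying old videos.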

Findings

01 Significantly reduces forgetting without replaying old samples.

02 Achieves strong performance on new tasks while retaining knowledge of old tasks.

03 Demonstrates effectiveness on the MSR-VTT dataset.

Abstract

To address catastrophic forgetting caused by the unavailability of old categories in sequential input, existing work on relatively simple classification tasks has made some progress. In contrast, video captioning is a more complex task in a multimodal scenario and has not yet been explored in the field of incremental learning. Having identified this stability-plasticity problem when analyzing video with sequential input, we propose a method to Mitigate Catastrophic Forgetting in class-incremental learning for multimodal Video Captioning (MCF-VC). To effectively maintain good performance on old tasks at the macro level, we design Fine-grained Sensitivity Selection (FgSS), based on a mask over the parameters of linear layers and on Fisher sensitivity, to pick useful knowledge from old tasks. Further, in order to better constrain the knowledge characteristics of old and new…
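The abstract names the ingredients of FgSS (a mask over linear-layer parameters and Fisher sensitivity) but not the exact formulation. A minimal sketch, assuming a diagonal empirical Fisher estimate, a top-k mask over one linear layer's flattened weights, and an EWC-style quadratic penalty restricted to the selected weights, could look like the following. This is an illustration of the general idea, not the authors' algorithm; `keep_ratio`, `strength`, and all function names are hypothetical.

```python
from typing import List, Sequence


def diagonal_fisher(per_sample_grads: Sequence[Sequence[float]]) -> List[float]:
    """Empirical diagonal Fisher: mean of squared per-sample gradients,
    one value per (flattened) linear-layer weight."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def sensitivity_mask(fisher: Sequence[float], keep_ratio: float) -> List[int]:
    """Assumption: keep the top keep_ratio fraction of weights by Fisher value (mask = 1)."""
    k = max(1, int(round(keep_ratio * len(fisher))))
    ranked = sorted(range(len(fisher)), key=lambda i: fisher[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(fisher))]


def masked_fisher_penalty(params: Sequence[float],
                          old_params: Sequence[float],
                          fisher: Sequence[float],
                          mask: Sequence[int],
                          strength: float = 1.0) -> float:
    """Assumption: EWC-style penalty that only anchors the selected (masked) weights
    to their old-task values; unselected weights stay free to learn the new task."""
    return strength * sum(
        m * f * (p - p_old) ** 2
        for p, p_old, f, m in zip(params, old_params, fisher, mask)
    )


if __name__ == "__main__":
    fisher = diagonal_fisher([[1.0, 0.0, 2.0], [1.0, 0.0, 0.0]])
    mask = sensitivity_mask(fisher, keep_ratio=1 / 3)
    print(fisher, mask, masked_fisher_penalty([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], fisher, mask))
```

Under these assumptions, the per-sample gradients would be collected on old-task data before training on the new task starts, and only the Fisher values, the mask, and the old weights would be kept afterwards, which is consistent with the source's claim that MCF-VC does not replay old samples.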

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition

Methods: Knowledge Distillation