MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning
Huiyu Xiong, Lanxiao Wang, Heqian Qiu, Taijin Zhao, Benliu Qiu, Hongliang Li

TL;DR
This paper introduces MCF-VC, a novel method for class-incremental learning in multimodal video captioning, effectively reducing catastrophic forgetting through fine-grained knowledge selection and two-stage knowledge distillation, without replaying old data.
Contribution
The paper proposes MCF-VC, an approach that combines Fine-grained Sensitivity Selection (FgSS) with two-stage knowledge distillation (TsKD) to mitigate forgetting in incremental video captioning, addressing the stability-plasticity trade-off in complex multimodal tasks.
Findings
Significantly reduces forgetting without replaying old samples.
Achieves strong performance on new tasks while retaining old task knowledge.
Demonstrates effectiveness on the MSR-VTT dataset.
Abstract
To address the problem of catastrophic forgetting due to the invisibility of old categories in sequential input, existing work based on relatively simple categorization tasks has made some progress. In contrast, video captioning is a more complex task in a multimodal scenario, and it has not been explored in the field of incremental learning. After identifying this stability-plasticity problem when analyzing video with sequential input, we propose a method to Mitigate Catastrophic Forgetting in class-incremental learning for multimodal Video Captioning (MCF-VC). To maintain good performance on old tasks at the macro level, we design Fine-grained Sensitivity Selection (FgSS) based on the Mask of Linear's Parameters and Fisher Sensitivity to pick useful knowledge from old tasks. Further, in order to better constrain the knowledge characteristics of old and new…
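The abstract names two generic building blocks, Fisher-based parameter sensitivity and knowledge distillation, without giving formulas in this excerpt. The sketch below illustrates the textbook versions of both on a toy linear softmax classifier. It is not the paper's FgSS or TsKD. The function names, the top-fraction selection rule, and the temperature value are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def diagonal_fisher(weights, samples):
    """Empirical diagonal Fisher of a linear softmax classifier.

    weights: n_classes x n_features nested list (hypothetical toy model).
    samples: list of (features, label) pairs from the old task.
    Each entry is the mean squared gradient of log p(label | features)
    with respect to one weight.
    """
    n_classes, n_features = len(weights), len(weights[0])
    fisher = [[0.0] * n_features for _ in range(n_classes)]
    for x, y in samples:
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
        probs = softmax(logits)
        for c in range(n_classes):
            # d log p(y|x) / d W[c][j] = (1[c == y] - p_c) * x_j
            coeff = (1.0 if c == y else 0.0) - probs[c]
            for j in range(n_features):
                grad = coeff * x[j]
                fisher[c][j] += grad * grad / len(samples)
    return fisher


def top_fraction_mask(fisher, keep_fraction):
    """Binary mask marking the most sensitive weights (assumed rule)."""
    flat = sorted((v for row in fisher for v in row), reverse=True)
    k = max(1, int(round(keep_fraction * len(flat))))
    threshold = flat[k - 1]
    # Ties at the threshold are all kept, so the mask may exceed k entries.
    return [[1 if v >= threshold else 0 for v in row] for row in fisher]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    old_weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    old_samples = [([1.0, 0.0], 0), ([0.0, 2.0], 1)]
    fisher = diagonal_fisher(old_weights, old_samples)
    print(top_fraction_mask(fisher, keep_fraction=0.2))
    print(distillation_loss([1.0, 0.5, -0.5], [2.0, 0.0, -1.0]))
```

In a replay-free incremental setup, a mask like this could mark which old-task weights to protect or regularize while training on new classes, and the distillation term could tie the new model's outputs to those of the frozen old model. How MCF-VC actually combines the two is not specified in the text above.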
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
Methods: Knowledge Distillation
