Exemplar Masking for Multimodal Incremental Learning
Yi-Lun Lee, Chen-Yu Lee, Wei-Chen Chiu, Yi-Hsuan Tsai

TL;DR
This paper introduces an exemplar masking framework for multimodal incremental learning that reduces storage and computational costs while improving knowledge retention, using attention-based token masking and data augmentation techniques.
Contribution
It proposes a novel exemplar masking method, combined with parameter-efficient tuning, to improve the efficiency and robustness of multimodal incremental learning.
Findings
Reduces exemplar storage size significantly.
Improves performance in retaining old knowledge.
Extends to real-world multimodal datasets.
Abstract
Multimodal incremental learning needs to digest information from multiple modalities while concurrently learning new knowledge without forgetting previously learned information. This task poses numerous challenges, chiefly the larger storage size of multimodal data in exemplar-based methods and the computational cost of fine-tuning huge multimodal models. In this paper, we leverage the parameter-efficient tuning scheme to reduce the burden of fine-tuning and propose the exemplar masking framework to efficiently replay old knowledge. Specifically, non-important tokens are masked based on the attention weights and the correlation across different modalities, significantly reducing the storage size of each exemplar and consequently allowing more exemplars to be saved under the same memory buffer. Moreover, we design a multimodal data augmentation technique to…
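The storage argument in the abstract can be illustrated with a minimal sketch. It assumes each token carries a single attention score (for example, the attention from a [CLS] query) and keeps only the top fraction of tokens. The paper also uses cross-modal correlation when choosing tokens; that part is omitted here. The helper names `mask_exemplar` and `exemplars_per_buffer` and the `keep_ratio` parameter are hypothetical and are not taken from the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def mask_exemplar(
    tokens: Sequence[str],
    scores: Sequence[float],
    keep_ratio: float,
) -> Tuple[List[int], List[str]]:
    """Keep the top-k tokens ranked by attention score and mask the rest.

    Hypothetical helper: one attention score per token is assumed.
    The cross-modal correlation term described in the paper is not modeled.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(keep_ratio * len(tokens)))
    # Rank token indices by score, highest first (ties keep original order).
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore the original token order
    return kept, [tokens[i] for i in kept]


def exemplars_per_buffer(
    buffer_tokens: int, tokens_per_exemplar: int, keep_ratio: float
) -> int:
    """Number of masked exemplars that fit in a fixed token budget."""
    stored = max(1, math.ceil(keep_ratio * tokens_per_exemplar))
    return buffer_tokens // stored


if __name__ == "__main__":
    toks = ["img_0", "img_1", "img_2", "img_3", "txt_0", "txt_1"]
    attn = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]
    print(mask_exemplar(toks, attn, keep_ratio=0.5))  # ([1, 3, 4], ['img_1', 'img_3', 'txt_0'])
    print(exemplars_per_buffer(10_000, 200, keep_ratio=1.0))   # 50
    print(exemplars_per_buffer(10_000, 200, keep_ratio=0.25))  # 200
```

With a keep ratio of 0.25, each exemplar takes a quarter of the storage, so the same buffer holds four times as many exemplars. This is the trade-off the abstract describes. Only the kept token indices and values need to be stored.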
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
Methods: Softmax · Attention Is All You Need
