Continual-NExT: A Unified Comprehension And Generation Continual Learning Framework
Jingyang Qiao, Zhizhong Zhang, Xin Tan, Jingyu Gong, Yanyun Qu, Yuan Xie

TL;DR
This paper introduces Continual-NExT, a framework for lifelong learning in Dual-to-Dual Multimodal Large Language Models, addressing challenges like catastrophic forgetting and knowledge transfer with a novel MAGE method.
Contribution
The paper proposes a standardized continual learning framework and a new MAGE method to enhance knowledge transfer and reduce forgetting in Dual-to-Dual MLLMs.
Findings
MAGE outperforms existing continual learning methods.
Continual-NExT achieves state-of-the-art results.
Framework effectively mitigates catastrophic forgetting.
Abstract
Dual-to-Dual MLLMs are Multimodal Large Language Models that enable unified multimodal comprehension and generation through text and image modalities. Although they exhibit strong instantaneous learning and generalization capabilities, Dual-to-Dual MLLMs remain deficient in lifelong evolution, which significantly limits their continual adaptation to dynamic real-world scenarios. One challenge is that learning new tasks inevitably destroys previously learned knowledge. Beyond traditional catastrophic forgetting, Dual-to-Dual MLLMs face other challenges, including hallucination, instruction unfollowing, and failures in cross-modal knowledge transfer. However, no standardized continual learning framework for Dual-to-Dual MLLMs has been established yet, leaving these challenges unexplored. Thus, in this paper, we establish Continual-NExT, a continual learning framework for…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
