Evolving Without Ending: Unifying Multimodal Incremental Learning for Continual Panoptic Perception

Bo Yuan; Danpei Zhao; Wentao Li; Tian Li; Zhiguo Jiang

arXiv:2601.15643 · cs.CV · January 23, 2026

Open Access

TL;DR

This paper introduces a unified continual learning framework for multimodal and multi-task panoptic perception, addressing catastrophic forgetting and semantic obfuscation to improve comprehensive image understanding in incremental scenarios.

Contribution

The paper proposes an end-to-end continual panoptic perception (CPP) model with a collaborative cross-modal encoder (CCE), contrastive knowledge distillation, and a cross-modal consistency constraint, advancing multimodal continual learning.

Findings

01. Outperforms existing methods on multimodal datasets.

02. Effectively mitigates catastrophic forgetting in multi-task CL.

03. Enhances semantic alignment across modalities during incremental learning.

Abstract

Continual learning (CL) is a great endeavour in developing intelligent perception AI systems. However, pioneering research has predominantly focused on single-task CL, which restricts its potential in multi-task and multimodal scenarios. Beyond the well-known issue of catastrophic forgetting, multi-task CL also introduces semantic obfuscation across multimodal alignment, leading to severe model degradation during incremental training steps. In this paper, we extend CL to continual panoptic perception (CPP), integrating multimodal and multi-task CL to enhance comprehensive image perception through pixel-level, instance-level, and image-level joint interpretation. We formalize the CL task in multimodal scenarios and propose an end-to-end continual panoptic perception model. Concretely, the CPP model features a collaborative cross-modal encoder (CCE) for multimodal embedding. We also propose a…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications