Information-Theoretic Constraints for Continual Vision-Language-Action Alignment

Libang Zhao; Qixin Zeng; Hongyin Zhang; Donglin Wang

arXiv:2603.13335 · cs.CV · March 17, 2026

Open Access

TL;DR

This paper introduces Info-VLA, a novel continual learning framework for vision-language-action models that preserves cross-modal information structure to mitigate catastrophic forgetting in robotic environments.

Contribution

It proposes a dual-constraint approach combining stable alignment anchors and mutual information maximization to maintain cross-modal dependencies during continual learning.
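
As a rough illustration of how the two constraints could fit together, the sketch below uses a generic InfoNCE loss as a stand-in. This is an assumption, not the paper's actual objective: the summary here does not give the loss definitions. The names `info_nce`, `mi_lower_bound`, `teacher_anchors`, `student_close`, and `student_drifted` are hypothetical, and random vectors stand in for real embeddings. The same loss serves two roles: pulling student embeddings toward anchors from a frozen teacher on replayed samples, and giving the standard InfoNCE estimate of a lower bound on mutual information, log N minus the loss.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss: row i of `keys` is the positive for row i of
    `queries`; every other row in the batch acts as a negative."""
    q = l2_normalize(queries)
    k = l2_normalize(keys)
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def mi_lower_bound(queries, keys, temperature=0.1):
    """InfoNCE estimate of a mutual information lower bound: log N - loss."""
    n = queries.shape[0]
    return float(np.log(n)) - info_nce(queries, keys, temperature)

# Hypothetical data: random vectors standing in for real model embeddings.
rng = np.random.default_rng(0)
teacher_anchors = rng.normal(size=(8, 16))   # frozen-teacher outputs on replay samples
student_close = teacher_anchors + 0.01 * rng.normal(size=(8, 16))  # little drift
student_drifted = rng.normal(size=(8, 16))   # alignment lost after adaptation

# Anchor-style constraint: a student that stays near the anchors has a lower loss.
loss_close = info_nce(student_close, teacher_anchors)
loss_drifted = info_nce(student_drifted, teacher_anchors)

# Mutual-information-style constraint: the same quantity viewed as a bound.
bound_close = mi_lower_bound(student_close, teacher_anchors)
bound_drifted = mi_lower_bound(student_drifted, teacher_anchors)
```

In a training loop, terms like these would typically be added to the task loss with weighting coefficients. How Info-VLA weights or combines its constraints is not stated in this summary.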

Findings

1. Significantly outperforms existing methods in task retention.

2. Effectively balances stability and plasticity in continual learning.

3. Preserves cross-modal information structure during adaptation.

Abstract

When deployed in open-ended robotic environments, Vision-Language-Action (VLA) models need to continually acquire new skills, yet they suffer from severe catastrophic forgetting. We observe that this degradation is related to the deterioration of cross-modal information structure: dependencies among visual observations, language instructions, and actions progressively diffuse during continual adaptation. Existing continual learning methods, however, fail to preserve these cross-modal information dependencies. We therefore propose Info-VLA, an information-preserving continual learning framework that maintains cross-modal information structure through two complementary constraints. Replay Anchor Contrastive Learning constructs stable alignment anchors from a frozen teacher model, preserving cross-modal alignment in the representation space. Cross-Modal Mutual Information Maximization further…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Robot Manipulation and Learning