AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning

Guransh Singh

arXiv:2604.16067 · cs.LG · April 20, 2026

TL;DR

AEGIS introduces a novel orthogonal gradient projection method that preserves pre-trained vision-language model capabilities during fine-tuning for robotic control, avoiding catastrophic forgetting.

Contribution

The paper proposes a layer-wise orthogonal gradient projection framework that preserves the backbone's VQA manifold without requiring co-training data or replay buffers.
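To make the idea of orthogonal gradient projection concrete, here is a minimal sketch of the generic technique, not the paper's implementation. It assumes each layer has a small set of "anchor" directions spanning the subspace to protect; how AEGIS actually obtains that subspace is not stated in the text above, so `anchors`, `orthonormalize`, `project_out`, and `energy_shed` are hypothetical names chosen for illustration. The sketch removes the gradient component lying in the protected subspace and reports the fraction of squared gradient norm ("gradient energy") that was removed.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip directions already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad, basis):
    """Remove the component of grad that lies in span(basis).

    Assumes basis is orthonormal, so the result is g - U U^T g.
    """
    g = list(grad)
    for b in basis:
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


def energy_shed(grad, projected):
    """Fraction of squared gradient norm removed by the projection."""
    removed = [a - b for a, b in zip(grad, projected)]
    return dot(removed, removed) / dot(grad, grad)


# Hypothetical toy data: two anchor directions in a 4-dimensional layer.
anchors = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
basis = orthonormalize(anchors)

# A gradient that overlaps only slightly with the protected subspace.
grad = [0.05, 0.05, 1.0, 2.0]
projected = project_out(grad, basis)
shed = energy_shed(grad, projected)
```

In this toy case the projected gradient has no component along either anchor direction, and only a small fraction of the gradient's squared norm is discarded. A real layer-wise variant would apply the same projection to each layer's gradient before the optimizer step.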

Findings

01. AEGIS effectively prevents catastrophic forgetting in vision-language models.

02. The method sheds less than 1% of gradient energy while eliminating activation drift.

03. AEGIS outperforms existing defenses by preserving model capabilities during fine-tuning.

Abstract

Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy (CE). This cross-modal gradient asymmetry (the spectral dimensionality mismatch between low-rank MSE regression gradients and the high-dimensional semantic manifold sculpted by CE pre-training) causes rapid, severe erosion of the VLM's visual-question-answering (VQA) capability. Industry-standard defences either sever the gradient pathway entirely via stop-gradient, discarding the rich continuous supervision, or restrict parameter capacity through low-rank adapters (LoRA), which constrain the rank of updates but not their direction and thus still overwrite the pre-trained manifold. We introduce AEGIS (Anchor-Enforced Gradient Isolation System): a buffer-free, layer-wise…
