AEGIS: Anchor-Enforced Gradient Isolation for Knowledge-Preserving Vision-Language-Action Fine-Tuning
Guransh Singh

TL;DR
AEGIS introduces a novel orthogonal gradient projection method that preserves pre-trained vision-language model capabilities during fine-tuning for robotic control, avoiding catastrophic forgetting.
Contribution
It proposes a layer-wise orthogonal gradient projection framework that maintains the VQA manifold without requiring co-training data or replay buffers.
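As a rough illustration of what orthogonal gradient projection means in general, the sketch below removes from a gradient vector its component inside a protected subspace. It is not the paper's algorithm: the `anchors` matrix, the function names `project_out` and `energy_retained`, and the random data are all hypothetical, and the summary does not say how AEGIS builds its per-layer subspace.

```python
import numpy as np


def project_out(grad, anchors):
    """Return grad with its component in span(columns of anchors) removed."""
    # Reduced QR gives an orthonormal basis q (d x k) for the protected subspace.
    q, _ = np.linalg.qr(anchors)
    # Subtract the orthogonal projection of grad onto that subspace.
    return grad - q @ (q.T @ grad)


def energy_retained(grad, projected):
    """Fraction of squared gradient norm that survives the projection."""
    return float(np.sum(projected ** 2) / np.sum(grad ** 2))


# Hypothetical sizes and data, chosen only for illustration.
rng = np.random.default_rng(0)
d, k = 512, 4
anchors = rng.standard_normal((d, k))  # assumed protected directions for one layer
grad = rng.standard_normal(d)          # assumed flattened gradient for that layer

g_proj = project_out(grad, anchors)
retained = energy_retained(grad, g_proj)
```

After projection, `anchors.T @ g_proj` is zero up to floating-point error, so a step along `g_proj` does not move the parameters along any protected direction. For an isotropic random gradient, the expected fraction of energy removed is k/d, which is why a low-dimensional protected subspace in a high-dimensional layer costs little; whether that matches the setting in the paper is an assumption.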
Findings
AEGIS effectively prevents catastrophic forgetting in vision-language models.
The method sheds less than 1% of gradient energy while eliminating activation drift.
AEGIS outperforms existing defenses by preserving model capabilities during fine-tuning.
Abstract
Adapting pre-trained vision-language models (VLMs) for robotic control requires injecting high-magnitude continuous gradients from a flow-matching action expert into a backbone trained exclusively with cross-entropy (CE). This cross-modal gradient asymmetry, the spectral dimensionality mismatch between low-rank MSE regression gradients and the high-dimensional semantic manifold sculpted by CE pre-training, causes rapid, severe erosion of the VLM's visual-question-answering (VQA) capability. Industry-standard defences either sever the gradient pathway entirely via a stop-gradient, discarding the rich continuous supervision, or restrict parameter capacity through low-rank adapters (LoRA), which constrain the rank of updates but not their direction and thus still overwrite the pre-trained manifold. We introduce AEGIS (Anchor-Enforced Gradient Isolation System): a buffer-free, layer-wise…
