Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models

Yuehao Liu; Shanyan Guan; Weijia Zhang; Xuanming Shang; Yanhao Ge; Wei Li; Chao Ma

arXiv:2605.14938 · cs.LG · May 15, 2026

TL;DR

Octopus introduces a history-free gradient orthogonalization method for continual learning in multimodal large language models. It balances acquiring new knowledge against forgetting old knowledge without storing past data.

Contribution

It proposes Octopus, a two-stage finetuning framework built on History-Free Gradient Orthogonalization (HiFGO). HiFGO enforces gradient orthogonality without historical task data, which improves continual learning performance.
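Gradient orthogonalization methods generally share one step. They project the new task's gradient onto the orthogonal complement of a subspace that is meant to protect earlier tasks, so the update cannot move parameters along those directions. The NumPy sketch below shows only that generic projection, g' = g - U U^T g. The source does not say how HiFGO obtains the protected directions without historical data. For that reason, `prev_updates`, `orthonormal_basis`, and `project_orthogonal` are hypothetical illustrations, not the paper's method.

```python
import numpy as np

def orthonormal_basis(directions: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Orthonormal basis (as columns) for the span of the columns of `directions`."""
    u, s, _ = np.linalg.svd(directions, full_matrices=False)
    # Keep only directions with non-negligible singular values (numerical rank).
    rank = int(np.sum(s > tol * s[0])) if s.size else 0
    return u[:, :rank]

def project_orthogonal(grad: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the part of `grad` inside span(basis): g - U (U^T g)."""
    return grad - basis @ (basis.T @ grad)

# Toy usage. `prev_updates` is a hypothetical stand-in for protected directions
# recovered from earlier tasks' weight updates (parameters, not stored data).
rng = np.random.default_rng(0)
prev_updates = rng.standard_normal((8, 2))
U = orthonormal_basis(prev_updates)
g = rng.standard_normal(8)
g_proj = project_orthogonal(g, U)
print(np.abs(U.T @ g_proj).max())  # ~0 up to floating-point error
```

Because U has orthonormal columns, applying the projection twice gives the same result as applying it once. Whatever it removes from g lies entirely in span(U).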

Findings

01. Octopus surpasses the previous state of the art (SOTA) by 2.14% on the Avg metric.

02. It achieves a 6.82% improvement on the Last metric.

03. The method balances plasticity (learning new tasks) and stability (retaining earlier ones) in continual learning.

Abstract

Continual learning in multimodal large language models (MLLMs) aims to acquire knowledge sequentially while mitigating catastrophic forgetting. Existing methods face inherent limitations. Architecture-based approaches incur additional computational overhead and often generalize poorly to new tasks. Rehearsal-based methods rely on storing historical data, which raises privacy and storage concerns. Conventional regularization-based strategies alone cannot fully prevent parameter interference. We propose Octopus, a two-stage continual learning framework based on History-Free Gradient Orthogonalization (HiFGO), which enforces gradient-level orthogonality without access to historical task data. Our two-stage finetuning strategy decouples task adaptation from regularization, achieving a principled balance between plasticity and stability. Experiments on UCIT show that Octopus…
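The abstract says the two-stage strategy decouples task adaptation from regularization. One possible reading is sketched below on a toy least-squares problem. In stage 1, the model fits the new task with plain gradient descent. In stage 2, training continues with an added penalty on the part of the total update that falls inside a protected subspace. This is an assumption for illustration, not the paper's procedure. The quadratic penalty, the protected basis `U`, and the names `task_loss_grad` and `two_stage_finetune` are all hypothetical.

```python
import numpy as np

def task_loss_grad(w, X, y):
    """Gradient of the new-task loss 0.5 * mean((X w - y)^2)."""
    return X.T @ (X @ w - y) / len(y)

def two_stage_finetune(w0, X, y, U, lam=1.0, lr=0.5, steps1=500, steps2=500):
    """Hypothetical two-stage schedule.

    Stage 1 (adaptation): plain gradient descent on the new task only.
    Stage 2 (regularization): add the penalty 0.5 * lam * ||U^T (w - w0)||^2,
    which discourages the total update from overlapping the protected
    subspace spanned by the orthonormal columns of U.
    """
    w = w0.copy()
    for _ in range(steps1):
        w -= lr * task_loss_grad(w, X, y)
    w_stage1 = w.copy()
    for _ in range(steps2):
        penalty_grad = lam * U @ (U.T @ (w - w0))
        w -= lr * (task_loss_grad(w, X, y) + penalty_grad)
    return w_stage1, w

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((50, d))
y = X @ rng.standard_normal(d)                     # synthetic new task
w0 = np.zeros(d)                                   # weights after earlier tasks (toy)
U, _ = np.linalg.qr(rng.standard_normal((d, 2)))   # hypothetical protected subspace
w1, w2 = two_stage_finetune(w0, X, y, U)
print(np.linalg.norm(U.T @ (w1 - w0)), np.linalg.norm(U.T @ (w2 - w0)))
```

The two printed norms show the trade-off the stage split is meant to control. Stage 1 buys plasticity by fitting the new task freely. Stage 2 then gives back some task fit in exchange for less overlap with the protected directions, and `lam` sets how strongly it does so.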
