TL;DR
M-ORE is a modality-decoupled online recursive editing method for multimodal large language models that improves lifelong adaptation by mitigating cross-modal conflict and interference.
Contribution
It introduces a unified proximal-projection formulation that admits a closed-form update and a Sherman-Morrison recursion, enabling efficient, modality-decoupled online model editing.
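The summary does not give M-ORE's proximal-projection update, so the sketch below is not the paper's rule. It uses a generic recursive least-squares (RLS) editor for a single linear layer, a standard technique swapped in to show how a closed-form edit can be carried forward with a Sherman-Morrison recursion. The sketch assumes each edit is a key-value pair (k, v) and that the editor keeps P = (lam*I + sum of k k^T)^-1. Under those assumptions, each edit refreshes P in O(d^2), so the per-edit cost does not grow with the number of past edits. The class name `RLSEditor` and the parameter `lam` are hypothetical.

```python
# Hypothetical sketch: recursive least-squares (RLS) editing of one linear layer.
# Not M-ORE's exact update; it only illustrates a closed-form edit maintained
# with the Sherman-Morrison identity:
#   (A + k k^T)^-1 = A^-1 - (A^-1 k)(A^-1 k)^T / (1 + k^T A^-1 k)   (A symmetric)


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


class RLSEditor:
    """Illustrative editor; W is d_out x d_in, P tracks (lam*I + sum k k^T)^-1."""

    def __init__(self, W, lam=1.0):
        self.W = [row[:] for row in W]
        d = len(W[0])
        # P starts as (lam * I)^-1
        self.P = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]

    def edit(self, k, v):
        """Apply one edit that pulls W k toward v, in O(d_out * d_in + d_in^2)."""
        Pk = matvec(self.P, k)                      # P k (P is symmetric)
        denom = 1.0 + dot(k, Pk)                    # 1 + k^T P k
        gain = [x / denom for x in Pk]              # g = P k / (1 + k^T P k)
        residual = [vi - wk for vi, wk in zip(v, matvec(self.W, k))]  # v - W k
        # Closed-form weight update: W <- W + (v - W k) g^T
        for i, r in enumerate(residual):
            for j, g in enumerate(gain):
                self.W[i][j] += r * g
        # Sherman-Morrison: P <- P - (P k)(P k)^T / (1 + k^T P k)
        d = len(self.P)
        for i in range(d):
            for j in range(d):
                self.P[i][j] -= Pk[i] * gain[j]


# Usage: three sequential edits on a 1 x 2 layer starting from zero weights.
editor = RLSEditor([[0.0, 0.0]], lam=1.0)
for key, value in [([1.0, 0.0], [1.0]), ([0.0, 1.0], [3.0]), ([1.0, 1.0], [2.0])]:
    editor.edit(key, value)

print(editor.W)  # [[0.5, 1.5]]
print(editor.P)  # [[0.375, -0.125], [-0.125, 0.375]]
```

After the three edits, P equals the direct inverse of I + sum of k k^T (here [[3, 1], [1, 3]]^-1), and W equals the batch regularized least-squares solution. This is why a recursion of this kind can replace re-solving from scratch at every edit.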
Findings
Consistently improves reliability, generality, and locality over strong baselines.
Achieves favorable quality-efficiency scaling.
Effective across multiple MLLM backbones and benchmarks.
Abstract
Online model editing for multimodal large language models (MLLMs) requires assimilating a stream of corrections under tight compute and memory budgets. Yet editors developed for text-only LLMs often degrade on MLLMs for two reasons. First, visually dominant activations skew the statistics that shape updates, causing cross-modal conflict. Second, sequential writes become entangled in a shared edit space, so interference between edits accumulates over long horizons (inter-edit interference). To address both problems, we propose M-ORE, a modality-decoupled online recursive editor for lifelong MLLM adaptation. M-ORE is derived from a unified proximal-projection formulation and admits a closed-form update with a Sherman-Morrison recursion, yielding constant per-edit overhead. It maintains module-wise locality statistics for the text stack and the visual projector to avoid visually dominated update shaping and performs continual updates in a…
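The abstract says M-ORE keeps module-wise locality statistics for the text stack and the visual projector so that visual activations do not dominate how updates are shaped. The toy sketch below shows why that separation can matter. It does not reproduce M-ORE's actual statistics. It assumes each module keeps an inverse statistic P_m, refreshed with Sherman-Morrison. For the comparison only, it also assumes text and visual keys share a 2-dimensional space and that a visual key has 100 times the norm of a text key. The names `ModuleStats`, `observe`, and `gain_norm` are hypothetical.

```python
# Hypothetical sketch: one inverse locality statistic per module, so that
# large-norm visual keys cannot shrink the update directions of the text stack.


def matvec(P, k):
    return [sum(p * x for p, x in zip(row, k)) for row in P]


def sherman_morrison(P, k):
    """Given P = A^-1 (A symmetric), return (A + k k^T)^-1 in O(d^2)."""
    Pk = matvec(P, k)
    denom = 1.0 + sum(a * b for a, b in zip(k, Pk))
    d = len(k)
    return [[P[i][j] - Pk[i] * Pk[j] / denom for j in range(d)] for i in range(d)]


class ModuleStats:
    """Keeps a separate P_m = (lam*I + sum of k k^T over module m's keys)^-1."""

    def __init__(self, dims, lam=1.0):
        # dims maps a module name to its key dimension; each P_m starts as (lam*I)^-1
        self.P = {
            m: [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
            for m, d in dims.items()
        }

    def observe(self, module, k):
        # Only the statistic of the module that produced k is updated.
        self.P[module] = sherman_morrison(self.P[module], k)

    def gain_norm(self, module, k):
        # Norm of P_m k: how strongly a new edit along k would move that module.
        return sum(x * x for x in matvec(self.P[module], k)) ** 0.5


text_key = [1.0, 0.0]
visual_key = [100.0, 0.0]  # toy assumption: visual activations with much larger norm

# One shared statistic: the visual key reshapes every later text update.
shared = ModuleStats({"shared": 2})
shared.observe("shared", visual_key)

# Decoupled statistics: the visual key only touches the projector's statistic.
decoupled = ModuleStats({"text": 2, "visual_projector": 2})
decoupled.observe("visual_projector", visual_key)

print(shared.gain_norm("shared", text_key))   # about 1e-4: text edit nearly frozen
print(decoupled.gain_norm("text", text_key))  # 1.0: text edit unaffected
```

In the shared case, the single high-norm visual key shrinks the text key's update direction by roughly four orders of magnitude. Keeping one statistic per module leaves the text stack's geometry untouched. This gives one concrete reading of "visually dominated update shaping," under the toy assumptions stated above.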
