Dual Latent Memory for Visual Multi-agent System

Xinlei Yu; Chengming Xu; Zhangquan Chen; Bo Yin; Cheng Yang; Yongbo He; Yihao Hu; Jiangning Zhang; Cheng Tan; Xiaobin Hu; Shuicheng Yan

arXiv:2602.00471 · cs.AI · February 3, 2026

TL;DR

This paper introduces L$^{2}$-VMAS, a dual latent memory framework for visual multi-agent systems that improves scalability and efficiency by decoupling perception from thinking and by replacing passive information transmission with proactive, on-demand memory access.

Contribution

It proposes a novel, model-agnostic dual latent memory architecture with entropy-driven triggering to enhance multi-agent collaboration and scalability.

Findings

01 Achieves 2.7-5.4% accuracy improvement

02 Reduces token usage by 21.3-44.8%

03 Effectively breaks the scaling wall in VMAS

Abstract

While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasing agent turns often degrades performance while exponentially inflating token costs. We attribute this failure to the information bottleneck inherent in text-centric communication, where converting perceptual and thinking trajectories into discrete natural language inevitably induces semantic loss. To this end, we propose L$^{2}$-VMAS, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories. Furthermore, we decouple the perception and thinking while dynamically synthesizing dual latent memories. Additionally, we introduce an entropy-driven proactive triggering that replaces passive information transmission with efficient, on-demand memory access. Extensive…
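The abstract does not spell out how the entropy-driven trigger is computed, so the following is only a generic illustration of the idea, not the authors' method. It assumes the trigger compares the Shannon entropy of an agent's next-token distribution against a fixed threshold and requests memory access only when the agent is uncertain. The function names (`softmax`, `shannon_entropy`, `should_access_memory`) and the threshold value are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_access_memory(logits, threshold=1.0):
    """Hypothetical trigger: request memory only when predictive entropy is high.

    The threshold of 1.0 nat is an arbitrary illustrative value, not one
    taken from the paper.
    """
    return shannon_entropy(softmax(logits)) > threshold


# A peaked distribution (confident agent) versus a flat one (uncertain agent).
confident_logits = [10.0, 0.0, 0.0, 0.0]
uncertain_logits = [1.0, 1.0, 1.0, 1.0]

print(should_access_memory(confident_logits))
print(should_access_memory(uncertain_logits))
```

Under these assumptions, a confident agent skips the memory read entirely, which is one way an on-demand scheme could avoid the cost of transmitting information on every turn.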

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Motion and Animation