Cross-Modal Memory Compression for Efficient Multi-Agent Debate

Jing Wu; Yue Sun; Tianpei Xie; Suiyao Chen; Jingyuan Bao; Yaopengxiao Xu; Gaoyuan Du; Inseok Heo; Alexander Gutfraind; Xin Wang

arXiv:2602.00454 · cs.AI · February 3, 2026

Open Access

TL;DR

This paper presents DebateOCR, a cross-modal memory compression method that replaces lengthy textual debate histories with compact image representations, cutting input tokens by more than 92% and lowering compute cost while maintaining reasoning quality.

Contribution

Introduction of DebateOCR, a novel cross-modal compression framework that replaces textual debate histories with image-based summaries to improve efficiency in multi-agent debate systems.

Findings

1. Reduces input tokens by over 92%
2. Lowers compute cost and speeds up inference
3. Supports recovery of omitted information through multi-agent diversity

Abstract

Multi-agent debate can improve reasoning quality and reduce hallucinations, but it incurs rapidly growing context as debate rounds and agent count increase. Retaining full textual histories leads to token usage that can exceed context limits and often requires repeated summarization, adding overhead and compounding information loss. We introduce DebateOCR, a cross-modal compression framework that replaces long textual debate traces with compact image representations, which are then consumed through a dedicated vision encoder to condition subsequent rounds. This design compresses histories that commonly span tens to hundreds of thousands of tokens, cutting input tokens by more than 92% and yielding substantially lower compute cost and faster inference across multiple benchmarks. We further provide a theoretical perspective showing that diversity across agents supports recovery of omitted…
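
To make the context-growth and token-reduction arithmetic concrete, here is a minimal Python sketch. It is not the paper's cost model: it assumes every agent emits a fixed number of tokens per response and, in each round, rereads all responses from earlier rounds. The function names and the example numbers are hypothetical; only the 92% threshold comes from the abstract.

```python
def text_history_tokens(num_agents, rounds, tokens_per_response):
    """Total input tokens read over a debate when full text history is kept.

    Assumption (illustrative only): in round r (1-indexed) each agent reads
    every response from rounds 1..r-1, i.e. num_agents * (r - 1) responses.
    """
    total = 0
    for r in range(1, rounds + 1):
        history = num_agents * (r - 1) * tokens_per_response
        total += num_agents * history  # every agent reads the same history
    return total


def reduction(text_tokens, compressed_tokens):
    """Fraction of input tokens saved by a compressed representation."""
    if text_tokens <= 0:
        raise ValueError("text_tokens must be positive")
    return 1.0 - compressed_tokens / text_tokens


def max_compressed_budget(text_tokens, target_reduction=0.92):
    """Largest compressed token count that still meets the target reduction."""
    return int(text_tokens * (1.0 - target_reduction))


if __name__ == "__main__":
    # Hypothetical debate: 3 agents, 4 rounds, 1000 tokens per response.
    text = text_history_tokens(num_agents=3, rounds=4, tokens_per_response=1000)
    budget = max_compressed_budget(text)
    print("text-history input tokens:", text)
    print("compressed budget for a 92% cut:", budget)
    print("achieved reduction:", round(reduction(text, budget), 4))
```

Under this assumed model the text-history cost grows quadratically in both the number of agents and the number of rounds, which is why a representation whose size stays small relative to the history can remove most of the input tokens.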

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Generative Adversarial Networks and Image Synthesis