MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning

Yaorui Shi; Shugui Liu; Yu Yang; Wenyu Mao; Yuxin Chen; Qi GU; Hui Su; Xunliang Cai; Xiang Wang; An Zhang

arXiv:2601.21468 · cs.AI · May 19, 2026

TL;DR

MemOCR is a multimodal memory system that uses visual layout to adaptively compress and prioritize information, enabling efficient long-horizon reasoning within limited context windows.

Contribution

The paper introduces a structured visual memory whose layout controls information density: crucial evidence is rendered prominently while auxiliary details are compressed aggressively, which improves reasoning efficiency under tight context budgets.

Findings

01 Outperforms text-based baselines on long-context QA benchmarks.

02 Achieves more effective context utilization under extreme memory budgets.

03 Uses reinforcement learning for robust memory compression across varying budgets.

Abstract

Long-horizon agentic reasoning necessitates effectively compressing growing interaction histories into a limited context window. Most existing memory systems serialize history as text, where token-level cost is uniform and scales linearly with length, often spending scarce budget on low-value details. To this end, we introduce MemOCR, a multimodal memory agent that improves long-horizon reasoning under tight context budgets by allocating memory space with adaptive information density through visual layout. Concretely, MemOCR maintains a structured rich-text memory (e.g., headings, highlights) and renders it into an image that the agent consults for memory access, visually prioritizing crucial evidence while aggressively compressing auxiliary details. To ensure robustness across varying memory budgets, we train MemOCR with reinforcement learning under budget-aware objectives that expose…
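
The contrast the abstract draws between uniform text cost and layout-dependent visual cost can be illustrated with a rough back-of-the-envelope sketch. Everything here is an assumption made for illustration and is not taken from the paper: the role names, the font sizes, the 28-pixel patch size, the four-characters-per-token heuristic, and the helper names `item_area`, `visual_token_estimate`, and `text_token_estimate` are all hypothetical. The sketch only estimates costs; it does not render an image or reproduce MemOCR's actual pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    role: str  # hypothetical labels: "heading", "highlight", or "detail"


# Assumed font sizes in pixels (illustrative only): important items are drawn
# large, auxiliary details are drawn small.
FONT_PX = {"heading": 24, "highlight": 16, "detail": 8}


def item_area(item: MemoryItem, char_aspect: float = 0.5) -> float:
    """Approximate pixel area of an item.

    Each character is modeled as a box that is font_px tall and
    char_aspect * font_px wide.
    """
    size = FONT_PX[item.role]
    return len(item.text) * size * (size * char_aspect)


def visual_token_estimate(items: list[MemoryItem], patch_px: int = 28) -> int:
    """Estimate visual tokens as total rendered area divided by patch area.

    patch_px is an assumed patch edge length, not a value from the paper.
    """
    total_area = sum(item_area(i) for i in items)
    return math.ceil(total_area / (patch_px * patch_px))


def text_token_estimate(items: list[MemoryItem], chars_per_token: int = 4) -> int:
    """Estimate text tokens with a uniform per-character cost (rough heuristic)."""
    total_chars = sum(len(i.text) for i in items)
    return math.ceil(total_chars / chars_per_token)


memory = [
    MemoryItem("Key fact", "heading"),
    MemoryItem("x" * 100, "detail"),
]
print(visual_token_estimate(memory), text_token_estimate(memory))
```

Under these assumptions a character in a detail line occupies one ninth of the area of a character in a heading, so shrinking low-value text frees budget for the evidence that matters. In the text estimate every character costs the same regardless of its importance.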

Code & Models

Repositories

meituan/MemOCR
github

Models

meituan/MemOCR-7B
Hugging Face model · 18 downloads · 7 likes

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Ferroelectric and Negative Capacitance Devices