AgentOCR: Reimagining Agent History via Optical Self-Compression

Lang Feng; Fuchao Yang; Feng Chen; Xin Cheng; Haiyang Xu; Zhenglin Wan; Ming Yan; Bo An

arXiv:2601.04786 · cs.LG · March 3, 2026

TL;DR

AgentOCR leverages visual tokens and self-compression to significantly reduce token usage in agentic systems, maintaining high performance while improving efficiency and scalability.

Contribution

This paper introduces a novel visual token-based history representation with segment optical caching and adaptive self-compression for scalable, efficient agent systems.

Findings

01. Preserves over 95% of text-based agent performance

02. Reduces token consumption by more than 50%

03. Achieves a 20x speedup in rendering

Abstract

Recent advances in large language models (LLMs) enable agentic systems trained with reinforcement learning (RL) over multi-turn interaction trajectories, but practical deployment is bottlenecked by rapidly growing textual histories that inflate token budgets and memory usage. We introduce AgentOCR, a framework that exploits the superior information density of visual tokens by representing the accumulated observation-action history as a compact rendered image. To make multi-turn rollouts scalable, AgentOCR proposes segment optical caching. By decomposing history into hashable segments and maintaining a visual cache, this mechanism eliminates redundant re-rendering. Beyond fixed rendering, AgentOCR introduces agentic self-compression, where the agent actively emits a compression rate and is trained with compression-aware reward to adaptively balance task success and token efficiency. We…
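
The segment optical caching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `SegmentRenderCache`, the helper `segment_key`, and the placeholder renderer `fake_render` are all hypothetical, and the choice of SHA-256 as the segment hash is an assumption. The sketch only shows the general pattern the abstract describes: split the history into hashable segments, key a cache on each segment's hash, and render a segment only the first time it is seen.

```python
import hashlib
from typing import Callable, Dict, List


def segment_key(segment: str) -> str:
    """Content hash used as the cache key for one history segment (assumed: SHA-256)."""
    return hashlib.sha256(segment.encode("utf-8")).hexdigest()


class SegmentRenderCache:
    """Hypothetical cache: each distinct segment is rendered at most once."""

    def __init__(self, render_fn: Callable[[str], bytes]) -> None:
        self._render_fn = render_fn
        self._cache: Dict[str, bytes] = {}
        self.render_calls = 0  # counts actual renderer invocations

    def render_history(self, segments: List[str]) -> List[bytes]:
        """Return one rendered tile per segment, reusing cached tiles."""
        tiles = []
        for segment in segments:
            key = segment_key(segment)
            if key not in self._cache:
                self._cache[key] = self._render_fn(segment)
                self.render_calls += 1
            tiles.append(self._cache[key])
        return tiles


def fake_render(segment: str) -> bytes:
    # Placeholder for a real text-to-image renderer; returns tagged bytes.
    return b"IMG:" + segment.encode("utf-8")


cache = SegmentRenderCache(fake_render)
history: List[str] = []
naive_calls = 0  # what re-rendering the full history every turn would cost

for turn in range(3):
    history.append(f"obs {turn}")
    history.append(f"act {turn}")
    cache.render_history(history)
    naive_calls += len(history)

print(cache.render_calls, naive_calls)
```

In this toy rollout the history grows by two segments per turn, so re-rendering everything each turn costs a number of renderer calls that grows quadratically with the number of turns, while the cached version calls the renderer once per new segment. How the cached tiles are composed into a single image, and how the compression rate affects rendering, is not covered by this sketch.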

Code & Models

Models

🤗 zhangzhifang/verl-agent (model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications