OCR-Memory: Optical Context Retrieval for Long-Horizon Agent Memory

Jinze Li; Yang Zhang; Xin Yang; Jiayi Qu; Jinfeng Xu; Shuo Yang; Junhua Ding; Edith Cheuk-Han Ngai

arXiv:2604.26622 · cs.CL · April 30, 2026

TL;DR

OCR-Memory introduces a vision-based memory system for long-horizon agents. It renders past trajectories into images tagged with visual anchors, so that extended experience can be retrieved with minimal prompt overhead.

Contribution

The paper proposes a memory framework that uses the visual modality as a high-density representation of agent experience, aiming to improve long-term retention and retrieval efficiency for autonomous agents.

Findings

01. Optical encoding increases effective memory capacity.

02. Retrieval via visual anchors reduces hallucination and preserves evidence.

03. Consistent performance gains on long-horizon benchmarks.

Abstract

Autonomous LLM agents increasingly operate in long-horizon, interactive settings where success depends on reusing experience accumulated over extended histories. However, existing agent memory systems are fundamentally constrained by text-context budgets: storing or revisiting raw trajectories is prohibitively token-expensive, while summarization and text-only retrieval trade token savings for information loss and fragmented evidence. To address this limitation, we propose Optical Context Retrieval Memory (OCR-Memory), a memory framework that leverages the visual modality as a high-density representation of agent experience, enabling retention of arbitrarily long histories with minimal prompt overhead at retrieval time. Specifically, OCR-Memory renders historical trajectories into images annotated with unique visual identifiers. OCR-Memory retrieves stored experience via a…
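
The indexing idea in the abstract (history split into units, each carrying a unique identifier that can later be used to pull back the original content) can be illustrated with a small text-only sketch. This is not the authors' implementation: it renders no images and calls no vision model, and the class name `AnchoredMemory`, the `M0000`-style ID format, and the `steps_per_page` parameter are all hypothetical. How OCR-Memory actually selects which identifiers to retrieve is not covered by the excerpt above, so the IDs are simply passed in by hand here.

```python
from dataclasses import dataclass, field


@dataclass
class AnchoredMemory:
    """Toy store: each trajectory segment is kept verbatim under a unique anchor ID.

    Assumption: a real system would render each segment as an image carrying
    the ID; this sketch keeps plain text to stay dependency-free.
    """
    segments: dict = field(default_factory=dict)

    def add_trajectory(self, steps, steps_per_page=2):
        """Group steps into pages and return the anchor IDs assigned to them."""
        ids = []
        for start in range(0, len(steps), steps_per_page):
            # IDs are sequential and never reused (hypothetical format).
            anchor_id = f"M{len(self.segments):04d}"
            self.segments[anchor_id] = "\n".join(steps[start:start + steps_per_page])
            ids.append(anchor_id)
        return ids

    def fetch(self, anchor_ids):
        """Return stored text verbatim for known IDs; unknown IDs are skipped."""
        return [self.segments[a] for a in anchor_ids if a in self.segments]


memory = AnchoredMemory()
ids = memory.add_trajectory(["open door", "pick up key", "unlock chest"])
# Only the requested segment enters the prompt; an unknown ID yields nothing.
evidence = memory.fetch([ids[1], "M9999"])
```

Because `fetch` returns the stored text unchanged and ignores identifiers it does not know, the caller only ever sees recorded content, which is one plausible reading of the "preserves evidence" finding listed above.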
