Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration

Daivik Patel; Shrenik Patel

arXiv:2511.12987 · cs.MA · March 4, 2026

TL;DR

ENGRAM-R introduces an inference-time memory layer for large reasoning models that reduces token usage and latency by reusing structured memory instead of recomputing derivations, while maintaining high accuracy on reasoning tasks.

Contribution

The paper presents ENGRAM-R, a memory-based inference method that improves the efficiency and accuracy of large reasoning models by combining typed retrieval, compact fact card representations, and explicit citation control.

Findings

1. Reduces input tokens by 85% and reasoning tokens by 75% on LoCoMo, compared with the full-context baseline.
2. Achieves similar efficiency with substantial accuracy gains on a multi-hop slice of LongMemEval.
3. Demonstrates memory's role in efficient, long-horizon reasoning.
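Reading the LoCoMo figures as relative reductions against the full-context baseline (the comparison the abstract names), the remaining token counts follow directly. The symbols $r$ and $T$ are introduced here for illustration only:

```latex
\begin{aligned}
r &= 1 - \frac{T_{\text{ENGRAM-R}}}{T_{\text{full}}}
  \quad\Longrightarrow\quad
  T_{\text{ENGRAM-R}} = (1 - r)\,T_{\text{full}} \\
T^{\text{in}}_{\text{ENGRAM-R}} &= (1 - 0.85)\,T^{\text{in}}_{\text{full}}
  = 0.15\,T^{\text{in}}_{\text{full}}
  \approx \tfrac{1}{6.7}\,T^{\text{in}}_{\text{full}} \\
T^{\text{reason}}_{\text{ENGRAM-R}} &= (1 - 0.75)\,T^{\text{reason}}_{\text{full}}
  = 0.25\,T^{\text{reason}}_{\text{full}}
  = \tfrac{1}{4}\,T^{\text{reason}}_{\text{full}}
\end{aligned}
```

For a hypothetical full-context prompt of 10,000 input tokens, this reading leaves about 1,500 input tokens; the 10,000 figure is illustrative, not a number reported by the paper.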

Abstract

Large reasoning models (LRMs) achieve strong accuracy through test-time scaling, generating longer chains of thought or sampling multiple solutions, but at steep costs in tokens and latency. We argue that memory is a core ingredient for efficient reasoning: when evidence already exists, models should think less by reusing structured memory instead of recomputing derivations. We present ENGRAM-R, an inference-time memory layer that integrates typed retrieval with compact fact card representations and explicit citation control. On the LoCoMo benchmark, ENGRAM-R reduces input tokens by 85% and reasoning tokens by 75% compared to full context while maintaining high accuracy. On a multi-hop slice of the LongMemEval benchmark, it achieves similar efficiency with substantial accuracy gains. These results show that memory is not only critical for long-horizon correctness but also a practical…
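The abstract names three components (typed retrieval, compact fact cards, and explicit citation control) but not how they are implemented. The following is a minimal sketch of how such a layer could fit together, not the authors' code. `FactCard`, `MemoryStore`, `build_prompt`, `check_citations`, and the type labels `"event"` and `"preference"` are all hypothetical, and simple word overlap stands in for whatever retriever ENGRAM-R actually uses.

```python
import re
from dataclasses import dataclass


# Hypothetical fact card: a compact, typed unit of memory with a stable ID
# that the model can cite instead of re-reading the raw conversation.
@dataclass(frozen=True)
class FactCard:
    card_id: str
    mem_type: str  # hypothetical type labels, e.g. "event", "preference"
    text: str


def _tokens(s: str) -> set[str]:
    # Lowercased word set; a stand-in for a real embedding or BM25 retriever.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


class MemoryStore:
    def __init__(self) -> None:
        self.cards: list[FactCard] = []

    def add(self, card: FactCard) -> None:
        self.cards.append(card)

    def retrieve(self, query: str, types: set[str], k: int = 3) -> list[FactCard]:
        # Typed retrieval (assumed form): keep only cards of the requested
        # types, then rank by word overlap with the query, dropping zero scores.
        q = _tokens(query)
        scored = [(len(q & _tokens(c.text)), c) for c in self.cards if c.mem_type in types]
        scored = [(s, c) for s, c in scored if s > 0]
        scored.sort(key=lambda sc: sc[0], reverse=True)  # stable for ties
        return [c for _, c in scored[:k]]


def build_prompt(question: str, cards: list[FactCard]) -> str:
    # Compact context: one line per card instead of the full transcript.
    lines = [f"[{c.card_id}] ({c.mem_type}) {c.text}" for c in cards]
    return (
        "Answer using only the facts below and cite card IDs like [F1].\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


def check_citations(answer: str, cards: list[FactCard]) -> bool:
    # Citation control (assumed form): the answer must cite at least one card,
    # and every cited ID must belong to the retrieved set.
    cited = set(re.findall(r"\[([A-Za-z0-9_-]+)\]", answer))
    allowed = {c.card_id for c in cards}
    return bool(cited) and cited <= allowed


store = MemoryStore()
store.add(FactCard("F1", "event", "Alice moved to Lisbon in May 2023"))
store.add(FactCard("F2", "preference", "Alice prefers tea over coffee"))
store.add(FactCard("F3", "event", "Alice started a pottery class in June 2023"))

cards = store.retrieve("When did Alice move to Lisbon?", types={"event"})
# -> [F1, F3]: F2 is filtered out by type; F1 shares more words with the query.
prompt = build_prompt("When did Alice move to Lisbon?", cards)
```

In this sketch, the prompt carries a few one-line cards rather than the whole dialogue history, which is the kind of input saving the abstract reports. The citation check gives a cheap way to reject answers that rely on evidence that was never retrieved.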

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks