Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
Peter Baile Chen, Yi Zhang, Dan Roth, Samuel Madden, Jacob Andreas, Michael Cafarella

TL;DR
This paper introduces log-augmented generation (LAG), a framework that reuses prior reasoning logs at test time to improve large language models' performance on new tasks, enhancing accuracy without sacrificing efficiency.
Contribution
LAG is a novel approach that directly reuses prior reasoning and computation logs for test-time reasoning, surpassing existing memory and caching methods in accuracy.
Findings
LAG significantly outperforms standard systems on reasoning-intensive datasets.
Reusing reasoning logs improves model accuracy without additional training.
The method maintains efficiency while enhancing performance.
Abstract
While humans naturally learn and adapt from past experiences, large language models (LLMs) and their agentic counterparts struggle to retain reasoning from previous tasks and apply it in future contexts. To address this limitation, we propose a novel framework, log-augmented generation (LAG), that directly reuses prior computation and reasoning from past logs at test time to enhance the model's ability to learn from previous tasks and perform better on new, unseen challenges, all while keeping the system efficient and scalable. Specifically, our system represents task logs using key-value (KV) caches, encoding the full reasoning context of prior tasks while storing KV caches for only a selected subset of tokens. When a new task arises, LAG retrieves the KV values from relevant logs to augment generation. Our approach differs from reflection-based memory mechanisms by directly reusing prior…
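The store-and-retrieve flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `LogEntry`, `toy_embed`, `encode_log`, and `retrieve_kv` are hypothetical names, the hashed bag-of-words embedding stands in for a real retriever, and the KV pairs are plain lists of floats rather than per-layer tensors produced by a transformer. It only shows the two ideas the abstract names: keep KV entries for a selected subset of tokens (here, the trailing ones), and fetch the KV entries of the most relevant logs when a new task arrives.

```python
import math
from dataclasses import dataclass

KV = tuple[list[float], list[float]]  # one (key, value) pair per stored token


@dataclass
class LogEntry:
    task: str
    embedding: list[float]  # used only to find relevant logs
    kv: list[KV]            # KV pairs kept for the selected subset of tokens


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hypothetical stand-in for a text encoder: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def encode_log(task: str, token_kv: list[KV], keep_last: int) -> LogEntry:
    """Assume token_kv was computed over the full log; store only the tail."""
    start = max(0, len(token_kv) - keep_last)
    return LogEntry(task, toy_embed(task), token_kv[start:])


def retrieve_kv(store: list[LogEntry], new_task: str, top_k: int = 1) -> list[KV]:
    """Return the stored KV pairs of the top_k logs most similar to new_task."""
    query = toy_embed(new_task)
    ranked = sorted(store, key=lambda e: cosine(query, e.embedding), reverse=True)
    retrieved: list[KV] = []
    for entry in ranked[:top_k]:
        retrieved.extend(entry.kv)
    return retrieved


store = [
    encode_log("capital of france", [([1.0], [2.0]), ([3.0], [4.0]), ([5.0], [6.0])], keep_last=2),
    encode_log("sort python list", [([7.0], [8.0])], keep_last=1),
]
prefix_kv = retrieve_kv(store, "capital of france", top_k=1)
```

In a real system, the retrieved entries would be supplied to the model as cached attention keys and values ahead of the new task's tokens; how that is done depends on the model and serving stack and is not specified here.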
Peer Reviews
Decision · ICLR 2026 Poster
* Using retrieval for reasoning, and in particular retrieving intermediate KV cache representations, is a very nice idea. I believe this is an important area of study because it helps decouple knowledge representations from reasoning representations.
* The paper's empirical successes, especially from using KV cache values rather than full-text representations of logs, motivate future exploration of a wide variety of topics in retrieval-augmented methods.
* **Scaling with Context Length**: The KV cache vs. text result is, I believe, the most interesting one, but it is also underexplored. It is of course related to context length limitations, but it is hard to know exactly how the model's context size changes this (e.g., Llama 8B's performance: https://arxiv.org/pdf/2504.06214v1).
* **Depth of contribution**: It's not clear to me how to evaluate the novelty of the contribution here. I understand the novelty of retrieving logs, but it is not a sea-change fr
1. **Clear and Motivated Problem Setting:** The paper identifies a practical and under-explored challenge, enabling LLMs to effectively reuse prior computation at test time, which mirrors a natural aspect of human reasoning. The distinction between reusing prior reasoning and simply increasing context length is well argued, particularly in the early discussion and Figure 1.
2. **Novel Use of KV Cache Representations:** Unlike existing KV cache approaches that target efficiency, LAG innovativel
1. **Related Work Positioning (Missing Direct Recent Papers):** The paper omits discussion of several key recent works that closely align with LAG's aims of scaling test-time computation and effective reuse for retrieval-augmented generation. Relevant missing works include:
   1. *Yue et al. (2025): Inference Scaling for Long-Context Retrieval Augmented Generation*.
   2. *Geiping et al. (2025): Scaling up Test-Time Compute with Latent Reasoning*.

   The absence of discussion or comparison w
**Novel Use of KV Cache for Reasoning:** The paper's primary contribution is the repurposing of KV caching from a tool for computational efficiency to a mechanism for improving reasoning and accuracy by reusing past computations. This presents a compelling and conceptually distinct alternative to reflection-based memory systems.
**Effective Encoding Mechanism:** The technical approach of encoding the full reasoning history while only storing the KV values for the last model response is a clever
**Static Log Store Evaluation:** The experiments are conducted using a static log store that is built offline. This is a simplified setting that sidesteps critical challenges of a true lifelong learning system, such as the computational cost of retrieval in a massive and ever-growing log store, and the long-term effects of noise accumulation.
**Insufficient Analysis of Error Propagation:** The log store is intentionally populated without filtering for correctness to simulate a realistic scenari
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Explainable Artificial Intelligence (XAI)
