Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery
Renuka Chintalapati, Sid Raskar, Anurag Acharya, Jared Willard, Patrick Emami, Sameera Horawalavithana

TL;DR
This paper systematically compares eight memory condensation strategies for coding agents in scientific discovery tasks, revealing that no single method significantly improves hypothesis quality but some reduce token costs.
Contribution
It provides the first comprehensive evaluation of memory condensation strategies across multiple scientific domains using GPT-4o, guiding strategy selection.
Findings
LLM-based condensers increase token costs by 24-94%.
Masking tool-call outputs achieves 8.6% net savings.
Optimal condenser varies by domain and task length.
Abstract
Coding agents accumulate extensive context during long-running tasks, yet fixed context windows force practitioners to choose between truncation and task failure. While numerous memory condensation strategies have been proposed, from simple sliding windows to LLM-generated summaries, no systematic comparison exists to guide strategy selection, especially in scientific discovery tasks. We evaluate eight memory condensation strategies using GPT-4o on sixty DiscoveryBench tasks spanning six scientific domains (480 total evaluations). We find that no condenser significantly alters hypothesis quality, while LLM-based condensers increase token costs by 24-94 percent, and masking tool-call outputs achieves an 8.6 percent net savings. We also observe that the optimal condenser for data-driven scientific discovery varies by scientific domain and task length.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
