Test-Time Learning with an Evolving Library
Weijia Xu, Alessandro Sordoni, Chandan Singh, Zelalem Gero, Michel Galley, Xingdi Yuan, Jianfeng Gao

TL;DR
EvoLib is a test-time learning framework for large language models that builds and refines a shared knowledge library without parameter updates, improving performance on mathematical reasoning, code generation, and multi-turn agentic tasks.
Contribution
The paper introduces a library-based approach to test-time learning that evolves knowledge abstractions without external supervision or parameter updates.
Findings
EvoLib improves performance on mathematical reasoning benchmarks.
EvoLib improves performance on code generation and multi-turn agentic tasks.
The framework outperforms existing test-time learning methods without ground-truth feedback.
Abstract
We introduce EvoLib, a test-time learning framework that enables large language models to accumulate, reuse, and evolve knowledge across problem instances without parameter updates or external supervision. Instead of adapting model parameters, our approach maintains a shared library of knowledge abstractions, including modular skills and reflective insights, automatically extracted from the model's own inference trajectories. To support continual improvement, we introduce a principled weighting and consolidation mechanism that jointly optimizes for immediate utility and long-term value. This allows simple, instance-specific abstractions to evolve into more general and reusable ones over time. Across challenging benchmarks in mathematical reasoning, code generation, and multi-turn agentic environments, EvoLib improves substantially over the top test-time scaling and learning methods…
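To make the library idea concrete, here is a minimal sketch of a weighted knowledge library. It is not the paper's method: the abstract does not specify the weighting or consolidation rules, so the class names, the `alpha` trade-off, the reuse bonus standing in for long-term value, and the top-k pruning are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Abstraction:
    """One library entry: a skill or insight stored as text (hypothetical)."""
    text: str
    uses: int = 0          # how many times the entry has been reused
    utility: float = 0.0   # running mean of self-assessed usefulness in [0, 1]


@dataclass
class Library:
    alpha: float = 0.5     # assumed trade-off between utility and reuse
    entries: list = field(default_factory=list)

    def add(self, text):
        entry = Abstraction(text)
        self.entries.append(entry)
        return entry

    def record_use(self, entry, score):
        # Update the running mean with a self-assessed score (no ground truth).
        entry.uses += 1
        entry.utility += (score - entry.utility) / entry.uses

    def weight(self, entry):
        # Immediate utility plus a saturating reuse bonus (proxy for long-term value).
        reuse = entry.uses / (1 + entry.uses)
        return self.alpha * entry.utility + (1 - self.alpha) * reuse

    def consolidate(self, keep):
        # Keep the highest-weighted entries; a fuller system might merge entries instead.
        self.entries = sorted(self.entries, key=self.weight, reverse=True)[:keep]


lib = Library()
general = lib.add("Check parity before starting a case analysis.")
specific = lib.add("Reuse the substitution from the previous problem.")
lib.record_use(general, 0.9)
lib.record_use(general, 0.7)
lib.record_use(specific, 0.4)
lib.consolidate(keep=1)
print([e.text for e in lib.entries])
```

In this toy run the entry that was reused more often and scored higher survives consolidation, which mirrors, in a very simplified form, the abstract's claim that general and reusable abstractions are favored over instance-specific ones.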
