TL;DR
This paper introduces Meta-TTL, a framework that uses bi-level optimization to learn the adaptation policies language agents apply during test-time learning, improving performance on Jericho and WebArena-Lite.
Contribution
Meta-TTL formulates adaptation policy learning as a bi-level optimization problem guided by evolutionary search, enabling learned policies to outperform hand-crafted ones.
Findings
Meta-TTL outperforms hand-crafted baselines on Jericho and WebArena-Lite.
Learned adaptation policies generalize to out-of-distribution tasks.
Meta-TTL improves agent performance across multiple backbones.
Abstract
Test-Time Learning (TTL) enables language agents to iteratively refine their performance through repeated interactions with the environment at inference time. At the core of TTL is an adaptation policy that updates the actor policy based on experience from previous episodes, thereby improving future behavior. Existing methods rely on fixed, hand-crafted adaptation policies rather than optimizing them for downstream improvement. We argue that optimal adaptation policies should be learned from task environments, not hand-engineered based on human intuition. To achieve this, we introduce Meta-TTL, a framework that formulates the discovery of effective adaptation policies as a bi-level optimization problem. Within this framework, the inner loop executes the standard TTL process, measuring how effectively a candidate adaptation policy helps an agent correct errors across sequential episodes.…
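The bi-level structure described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the scalar "actor", the step-size "adaptation policy", the reward, and every function name here (`run_ttl`, `fitness`, `evolve`) are hypothetical stand-ins chosen so the example runs on its own. The outer loop uses a simple mutate-and-select evolutionary search, which is an assumption based on the Contribution summary above.

```python
import random

def run_ttl(step, target, episodes=5):
    """Inner loop: one TTL run over sequential episodes.

    The actor is a scalar guess, and the adaptation policy is a single
    step size that updates the actor from the previous episode's error.
    Returns the total reward accumulated across episodes.
    """
    actor = 0.0
    total = 0.0
    for _ in range(episodes):
        error = target - actor
        total += -abs(error)      # reward: closer to the target is better
        actor += step * error     # adaptation policy updates the actor
    return total

def fitness(step, targets):
    """Average inner-loop return of one candidate adaptation policy."""
    return sum(run_ttl(step, t) for t in targets) / len(targets)

def evolve(targets, generations=20, population=8, sigma=0.2, seed=0):
    """Outer loop: mutate the best candidate and keep any improvement."""
    rng = random.Random(seed)
    best = 0.0                    # start from "no adaptation"
    best_fit = fitness(best, targets)
    for _ in range(generations):
        for _ in range(population):
            candidate = best + rng.gauss(0, sigma)
            cand_fit = fitness(candidate, targets)
            if cand_fit > best_fit:
                best, best_fit = candidate, cand_fit
    return best, best_fit

# Hypothetical training and held-out "tasks" (hidden targets).
train_targets = [1.0, -2.0, 3.5]
held_out_targets = [10.0, -7.0]

learned, learned_fit = evolve(train_targets)
print("learned step size:", learned)
print("train fitness:", learned_fit)
print("held-out fitness:", fitness(learned, held_out_targets))
```

In this toy setting the inner loop scores how well a candidate adaptation policy helps the actor correct its error across episodes, and the outer loop searches over candidates using that score. Evaluating the selected candidate on held-out targets mirrors, in a very loose sense, the out-of-distribution check listed under Findings.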
