Training a Generally Curious Agent
Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov

TL;DR
This paper introduces Paprika, a fine-tuning method that enables language models to develop general decision-making and exploration skills transferable to new tasks without further training.
Contribution
The paper presents Paprika, a novel fine-tuning approach that trains models on synthetic interaction data to enhance their ability to explore and adapt in unseen environments.
Findings
Models fine-tuned with Paprika transfer their decision-making skills to entirely unseen tasks without additional training.
Curriculum learning improves sample efficiency in training.
Paprika-trained models adapt to a new task in-context from environment feedback, without further gradient updates.
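How the curriculum picks training tasks is not described here. As a hedged illustration only, the sketch below assumes one common heuristic: score each task by how much its rewards vary across several sampled rollouts, so that tasks the model always solves or always fails score near zero, then sample tasks in proportion to that score. The function names, the variance score, and the `floor` parameter are all hypothetical and are not claimed to be Paprika's actual procedure.

```python
import random
from statistics import pvariance


def learning_potential(rewards: list[float]) -> float:
    # Hypothetical score (an assumption, not the paper's definition):
    # spread of rewards across K rollouts of the same task.
    # All-success or all-failure tasks score 0.
    return pvariance(rewards)


def curriculum_weights(
    task_rewards: dict[str, list[float]], floor: float = 1e-3
) -> dict[str, float]:
    # Turn scores into sampling probabilities; `floor` keeps every task reachable.
    scores = {t: learning_potential(r) + floor for t, r in task_rewards.items()}
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}


def sample_tasks(
    task_rewards: dict[str, list[float]], n: int, seed: int = 0
) -> list[str]:
    # Draw the next batch of training tasks, favoring high-potential ones.
    rng = random.Random(seed)
    weights = curriculum_weights(task_rewards)
    tasks = list(weights)
    return rng.choices(tasks, weights=[weights[t] for t in tasks], k=n)


task_rewards = {
    "easy": [1.0, 1.0, 1.0, 1.0],
    "hard": [0.0, 0.0, 0.0, 0.0],
    "medium": [1.0, 0.0, 1.0, 0.0],
}
print(curriculum_weights(task_rewards))
print(sample_tasks(task_rewards, n=10))
```

With these example rewards, the `medium` task, whose rollouts both succeed and fail, receives nearly all of the sampling mass, while `easy` and `hard` are kept at a small equal probability by `floor`.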
Abstract
Efficient exploration is essential for intelligent systems interacting with their environment, but existing language models often fall short in scenarios that require strategic information gathering. In this paper, we present Paprika, a fine-tuning approach that enables language models to develop general decision-making capabilities that are not confined to particular environments. By training on synthetic interaction data from different tasks that require diverse strategies, Paprika teaches models to explore and adapt their behavior on a new task based on environment feedback in-context without more gradient updates. Experimental results show that models fine-tuned with Paprika can effectively transfer their learned decision-making capabilities to entirely unseen tasks without additional training. Unlike traditional training, our approach's primary bottleneck lies in sampling useful…
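To make "adapting in-context without gradient updates" concrete, here is a toy sketch under stated assumptions. A hypothetical number-guessing environment returns directional feedback, and a hand-written rule stands in for the fine-tuned language model. The policy never changes any parameters: it improves its guesses only by reading the accumulated (guess, feedback) history, which plays the role of the model's context. Multi-turn trajectories like the returned `history` illustrate the kind of interaction data such a method would sample, though how Paprika generates and filters its data is not shown here.

```python
from typing import Callable

# (guess, feedback) pairs seen so far: the agent's "context".
History = list[tuple[int, str]]


def make_env(secret: int) -> Callable[[int], str]:
    # Hypothetical toy environment: guess a hidden integer, get directional feedback.
    def step(guess: int) -> str:
        if guess == secret:
            return "correct"
        return "higher" if secret > guess else "lower"

    return step


def policy(history: History, lo: int, hi: int) -> int:
    # Stand-in for a language model: it has no internal state beyond the history.
    # Earlier feedback narrows the feasible range; no parameters are updated.
    for guess, feedback in history:
        if feedback == "higher":
            lo = max(lo, guess + 1)
        elif feedback == "lower":
            hi = min(hi, guess - 1)
    return (lo + hi) // 2


def rollout(secret: int, lo: int = 1, hi: int = 100, max_turns: int = 10) -> History:
    # One multi-turn episode: act, observe feedback, append it to the context.
    env = make_env(secret)
    history: History = []
    for _ in range(max_turns):
        guess = policy(history, lo, hi)
        feedback = env(guess)
        history.append((guess, feedback))
        if feedback == "correct":
            break
    return history


print(rollout(37))  # [(50, 'lower'), (25, 'higher'), (37, 'correct')]
```

Because each guess depends on all previous feedback, the agent finds any secret in 1..100 within seven turns, purely by conditioning on its history.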
Taxonomy
Topics: Artificial Intelligence in Games · AI in Service Interactions
