In-Context Reinforcement Learning for Tool Use in Large Language Models
Yaoqi Ye, Yiran Zhao, Keyu Duan, Zeyu Zheng, Kenji Kawaguchi, Cihang Xie, Michael Qizhe Shieh

TL;DR
This paper introduces In-Context Reinforcement Learning (ICRL), an RL-only approach that lets large language models learn tool use without supervised fine-tuning, improving reasoning and factual retrieval in a data-efficient way.
Contribution
ICRL eliminates the need for supervised fine-tuning by using few-shot prompts during RL, gradually reducing in-context examples to enable zero-shot tool invocation in large language models.
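The idea of gradually reducing in-context examples can be illustrated with a minimal sketch. The summary does not state the schedule the paper actually uses, so the equal-length staged decay below, the names `num_demos` and `build_prompt`, and the `<search>` tag format in the demonstration are all assumptions made for illustration only.

```python
from typing import List, Tuple

# A demonstration is a (question, worked answer with tool call) pair.
Demo = Tuple[str, str]


def num_demos(step: int, total_steps: int, max_demos: int) -> int:
    """Return how many few-shot demos to show at a given RL step.

    Assumed schedule: training is split into max_demos + 1 equal stages,
    and one demo is dropped per stage, so the last stage is zero-shot.
    """
    if total_steps <= 0 or max_demos < 0 or step < 0:
        raise ValueError("total_steps must be > 0; step and max_demos must be >= 0")
    # Integer arithmetic avoids floating-point edge cases at stage boundaries.
    stage = min(step * (max_demos + 1) // total_steps, max_demos)
    return max_demos - stage


def build_prompt(question: str, demos: List[Demo], k: int) -> str:
    """Prepend the first k demos to the question; k = 0 gives a zero-shot prompt."""
    parts = [f"Question: {q}\n{a}" for q, a in demos[:k]]
    parts.append(f"Question: {question}\n")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Hypothetical demonstration; the tool-call syntax is not taken from the paper.
    demos = [("Who wrote Dune?", "<search>Dune author</search>\nAnswer: Frank Herbert")]
    for step in (0, 50, 99):
        k = num_demos(step, total_steps=100, max_demos=len(demos))
        print(step, k)
        print(build_prompt("What is the capital of Peru?", demos, k))
```

In an RL loop, the prompt for each rollout would be built with `k = num_demos(step, total_steps, max_demos)`, so early rollouts see full few-shot guidance and late rollouts are sampled zero-shot.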
Findings
ICRL achieves state-of-the-art results on reasoning benchmarks.
ICRL is more data-efficient than traditional fine-tuning methods.
Models trained with ICRL can call external tools independently in zero-shot settings.
Abstract
While large language models (LLMs) exhibit strong reasoning abilities, their performance on complex tasks is often constrained by the limitations of their internal knowledge. A compelling approach to overcome this challenge is to augment these models with external tools -- such as Python interpreters for mathematical computations or search engines for retrieving factual information. However, enabling models to use these tools effectively remains a significant challenge. Existing methods typically rely on cold-start pipelines that begin with supervised fine-tuning (SFT), followed by reinforcement learning (RL). These approaches often require substantial amounts of labeled data for SFT, which is expensive to annotate or synthesize. In this work, we propose In-Context Reinforcement Learning (ICRL), an RL-only framework that eliminates the need for SFT by leveraging few-shot prompting…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
