Can RL Improve Generalization of LLM Agents? An Empirical Study

Zhiheng Xi; Xin Guo; Jiaqi Liu; Jiazheng Zhang; Yutao Fan; Zhihao Zhang; Shichun Liu; Mingxu Chai; Xiaowei Shi; Yitao Zhai; Xunliang Cai; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2603.12011 · cs.AI · March 13, 2026

TL;DR

This paper systematically evaluates whether reinforcement fine-tuning (RFT) improves the generalization of LLM agents across environments and tasks. RFT generalizes well across task difficulty within an environment but transfers more weakly to unseen environments.

Contribution

It provides a systematic empirical analysis of how RFT generalizes within and across environments, identifies where cross-environment transfer weakens, and examines sequential and mixture training as ways to improve transfer and reduce forgetting.

Findings

01 RFT generalizes well within an environment across task difficulty.

02 Transfer to unseen environments is weaker and depends on the environment shift.

03 Sequential and mixture training improve transfer and reduce forgetting.

Abstract

Reinforcement fine-tuning (RFT) has shown promise for training LLM agents to perform multi-turn decision-making based on environment feedback. However, most existing evaluations remain largely in-domain: training and testing are conducted in the same environment or even on the same tasks. In real-world deployment, agents may operate in unseen environments with different background knowledge, observation spaces, and action interfaces. To characterize the generalization profile of RFT under such shifts, we conduct a systematic study along three axes: (1) within-environment generalization across task difficulty, (2) cross-environment transfer to unseen environments, and (3) sequential multi-environment training to quantify transfer and forgetting. Our results show that RFT generalizes well across task difficulty within an environment, but exhibits weaker transfer to unseen environments,…

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Topic Modeling