Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents

Zhihan Liu; Lin Guan; Yixin Nie; Kai Zhang; Zhuoqun Hao; Lin Chen; Asli Celikyilmaz; Zhaoran Wang; Na Zhang

arXiv:2601.18217 · cs.AI · January 27, 2026

TL;DR

This study investigates how properties of RL environments and modeling choices affect the out-of-domain generalization of LLM agents, and identifies state information richness and planning complexity as key factors in robustness.

Contribution

The paper identifies environment axes that influence cross-domain generalization and proposes a simple randomization method that improves robustness without altering the task.

Findings

01. State information richness correlates strongly with generalization.

02. Increasing state information alone improves cross-domain robustness.

03. Step-by-step thinking during RL preserves generalization.

Abstract

Generalist LLM agents are often post-trained on a narrow set of environments but deployed across far broader, unseen domains. In this work, we investigate the challenge of agentic post-training when the eventual test domains are unknown. Specifically, we analyze which properties of reinforcement learning (RL) environments and modeling choices have the greatest influence on out-of-domain performance. First, we identify two environment axes that strongly correlate with cross-domain generalization: (i) state information richness, i.e., the amount of information for the agent to process from the state, and (ii) planning complexity, estimated via goal reachability and trajectory length under a base policy. Notably, domain realism and text-level similarity are not the primary factors; for instance, the simple grid-world domain Sokoban leads to even stronger generalization in SciWorld than the…
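
The abstract says planning complexity is estimated via goal reachability and trajectory length under a base policy, but this excerpt does not give the exact estimator or say how the two quantities are combined. As a rough illustration only, the sketch below computes both quantities from a batch of base-policy rollouts. The `Rollout` record, its field names, and the function `planning_complexity_proxies` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Rollout:
    """One episode sampled from the base policy (hypothetical record type)."""
    reached_goal: bool  # whether the episode ended in the goal state
    num_steps: int      # number of environment steps taken in the episode


def planning_complexity_proxies(rollouts: Sequence[Rollout]) -> dict[str, float]:
    """Return two simple proxies; an assumed reading of the abstract, not the paper's method."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    # Goal reachability: fraction of base-policy episodes that reach the goal.
    reachability = sum(r.reached_goal for r in rollouts) / len(rollouts)
    # Trajectory length: average number of steps per episode.
    avg_length = mean(r.num_steps for r in rollouts)
    return {
        "goal_reachability": reachability,
        "mean_trajectory_length": avg_length,
    }


# Usage with made-up rollouts (illustrative numbers, not results from the paper).
stats = planning_complexity_proxies(
    [Rollout(True, 12), Rollout(False, 40), Rollout(True, 20), Rollout(False, 40)]
)
print(stats)
```

Reporting the two numbers separately keeps the sketch neutral about how they should be weighted against each other, which the excerpt leaves unspecified.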

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications