Computer Environments Elicit General Agentic Intelligence in LLMs

Daixuan Cheng; Shaohan Huang; Yuxian Gu; Huatong Song; Guoxin Chen; Li Dong; Wayne Xin Zhao; Ji-Rong Wen; Furu Wei

arXiv:2601.16206·cs.CL·April 9, 2026

1 Repo 7 Models 6 Datasets

TL;DR

This paper introduces LLM-in-Sandbox, a minimal code-sandbox environment that improves the general task performance and token efficiency of large language models without additional training, showing that the computer environment itself helps elicit agentic intelligence.

Contribution

The study systematically investigates how simple computer environments can elicit general capabilities in LLMs and develops training methods to harness these interactions.

Findings

01. LLM-in-Sandbox improves performance across multiple domains by up to 15.5%.

02. Models reduce token consumption by up to 8 times in the sandbox environment.

03. Training with LLM-in-Sandbox-RL enables weaker models to utilize environmental interactions.

Abstract

Agentic intelligence in large language models (LLMs) requires not only a model's intrinsic capabilities but also interactions with external environments. Equipping LLMs with computers is now a prevailing trend. However, the intrinsic value of the computer environment has not been systematically investigated, particularly its potential to elicit general capabilities. Here we introduce LLM-in-Sandbox, which virtualizes the computer as a code sandbox with only basic functionalities, and demonstrate that this minimal setting elicits computer-based meta-capabilities for general task solving: external resource access, file management, and code execution. Without additional training, strong models achieve substantial gains (up to 15.5%) across mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following, while reducing token consumption by up to 8 times.…
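To make the idea of a code sandbox with only basic functionalities concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the class name MiniSandbox, its methods, and the demo function are all hypothetical, chosen only to illustrate two of the three meta-capabilities named in the abstract (file management and code execution). External resource access is left out, and a temporary directory with a subprocess is assumed as a stand-in for real isolation, which it does not provide.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class MiniSandbox:
    """Hypothetical toy sandbox: a scratch directory plus a few basic tools.

    Illustration only. A temporary directory is NOT a security boundary.
    """

    def __init__(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.root = Path(self._tmp.name).resolve()

    def _resolve(self, rel_path):
        # Reject paths that would escape the scratch directory.
        path = (self.root / rel_path).resolve()
        if path != self.root and self.root not in path.parents:
            raise ValueError(f"path escapes sandbox: {rel_path}")
        return path

    def write_file(self, rel_path, text):
        """File management: create or overwrite a text file."""
        path = self._resolve(rel_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read_file(self, rel_path):
        """File management: read a text file back."""
        return self._resolve(rel_path).read_text(encoding="utf-8")

    def run_python(self, code, timeout=10):
        """Code execution: run a snippet with the scratch dir as cwd."""
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=self.root,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # Return combined output, as a model would observe it.
        return result.stdout + result.stderr

    def close(self):
        self._tmp.cleanup()


def demo():
    """Store data in a file, then compute over it with code."""
    box = MiniSandbox()
    try:
        box.write_file("data/numbers.txt", "3\n4\n5\n")
        out = box.run_python(
            "print(sum(int(x) for x in open('data/numbers.txt')))"
        )
        return out.strip()
    finally:
        box.close()
```

In this sketch, a model would call write_file, read_file, and run_python as tools over several turns and read back their output. Delegating computation and storage to the environment in this way, rather than carrying it in generated text, is one plausible reading of how a sandbox could lower token consumption.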

Code & Models

Repositories

llm-in-sandbox/llm-in-sandbox (GitHub)
