Computer Environments Elicit General Agentic Intelligence in LLMs
Daixuan Cheng, Shaohan Huang, Yuxian Gu, Huatong Song, Guoxin Chen, Li Dong, Wayne Xin Zhao, Ji-Rong Wen, Furu Wei

TL;DR
This paper introduces LLM-in-Sandbox, a minimal computer environment that enhances large language models' general capabilities and efficiency without additional training, highlighting the environment's role in eliciting agentic intelligence.
Contribution
The study systematically investigates how simple computer environments can elicit general capabilities in LLMs and develops training methods to harness these interactions.
Findings
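Without additional training, strong models using LLM-in-Sandbox gain up to 15.5% across mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following.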
Models reduce token consumption by up to 8 times in the sandbox environment.
Training with LLM-in-Sandbox-RL enables weaker models to utilize environmental interactions.
Abstract
Agentic intelligence in large language models (LLMs) requires not only model intrinsic capabilities but also interactions with external environments. Equipping LLMs with computers now represents a prevailing trend. However, the computer environment's intrinsic value has not been systematically investigated, particularly its potential to elicit general capabilities. Here we introduce LLM-in-Sandbox, which virtualizes the computer as a code sandbox with only basic functionalities, and demonstrate that this minimal setting elicits computer-based meta-capabilities for general task solving: external resource access, file management, and code execution. Without additional training, strong models achieve substantial gains (up to 15.5%) across mathematics, physics, chemistry, biomedicine, long-context understanding, and instruction following, while reducing token consumption by up to 8 times.…
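To make the three meta-capabilities named in the abstract more concrete, here is a minimal Python sketch of what a "computer as a code sandbox with only basic functionalities" could look like from the model's side. It is an illustration under stated assumptions, not the paper's implementation: the class name `MiniSandbox` and its methods are hypothetical, a temporary directory plus a subprocess stands in for a real isolated environment, and external resource access is left out entirely.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


class MiniSandbox:
    """Hypothetical stand-in for a code sandbox (NOT the paper's implementation).

    A temporary directory plays the role of the sandbox file system and a
    child Python process plays the role of code execution. This gives no
    real isolation; a real sandbox would typically use a container or VM.
    """

    def __init__(self):
        self._tmp = tempfile.TemporaryDirectory()
        self.root = Path(self._tmp.name).resolve()

    def _resolve(self, name):
        # Keep every path inside the sandbox root (requires Python 3.9+).
        path = (self.root / name).resolve()
        if not path.is_relative_to(self.root):
            raise ValueError(f"path escapes sandbox: {name}")
        return path

    def write_file(self, name, text):
        # File management: create or overwrite a file in the sandbox.
        path = self._resolve(name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")

    def read_file(self, name):
        # File management: read a file back as text.
        return self._resolve(name).read_text(encoding="utf-8")

    def run_python(self, code, timeout=10):
        # Code execution: run a snippet with the sandbox root as the
        # working directory and return combined stdout and stderr.
        result = subprocess.run(
            [sys.executable, "-c", code],
            cwd=self.root,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout + result.stderr

    def close(self):
        self._tmp.cleanup()


if __name__ == "__main__":
    box = MiniSandbox()
    box.write_file("data.txt", "3\n4\n5\n")
    # The model offloads arithmetic to code instead of reasoning in tokens.
    print(box.run_python("print(sum(int(x) for x in open('data.txt')))"))
    box.close()
```

In a sketch like this, the model would emit tool calls (write a file, run a snippet) and read back the returned text. That is one plausible way long inputs or heavy computation could be moved out of the token stream, though the mechanism behind the paper's reported token savings is not described in this summary.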
Code & Models
- 🤗 instruction-pretrain/instruction-synthesizer · model · 26 dl · ♡ 79
- 🤗 instruction-pretrain/finance-Llama3-8B · model · 1.8k dl · ♡ 75
- 🤗 instruction-pretrain/InstructLM-500M · model · 1.5k dl · ♡ 38
- 🤗 instruction-pretrain/InstructLM-1.3B · model · 28 dl · ♡ 43
- 🤗 instruction-pretrain/medicine-Llama3-8B · model · 58 dl · ♡ 38
- 🤗 daixuancheng/Qwen3-4B-Instruct-2507-LLM-in-Sandbox-RL · model · 42 dl · ♡ 1
- 🤗 Nitish-Garikoti/finance-Llama3-8B · model · 6 dl
- instruction-pretrain/ft-instruction-synthesizer-collection · dataset · 874 dl
- instruction-pretrain/medicine-instruction-augmented-corpora · dataset · 2.1k dl
- instruction-pretrain/general-instruction-augmented-corpora · dataset · 13k dl
- daixuancheng/llm-in-sandbox-bench · dataset · 620 dl
- daixuancheng/llm-in-sandbox-rl · dataset · 495 dl
- zhongweixie/A-Survey-on-AI-Agent-Harness · dataset · 28 dl
