Language-based Trial and Error Falls Behind in the Era of Experience

Haoyu Wang; Guozheng Ma; Shugang Cui; Yilun Kong; Haotian Luo; Li Shen; Mengya Gao; Yichao Wu; Xiaogang Wang; Dacheng Tao

arXiv:2601.21754 · cs.AI · February 3, 2026

TL;DR

This paper introduces SCOUT, a framework that uses lightweight probes to efficiently explore environments and fine-tune large language models, significantly improving their performance on unseen, nonlinguistic tasks while reducing computational costs.

Contribution

The paper presents SCOUT, a novel exploration framework that decouples exploration from exploitation, enabling large language models to better handle unseen tasks with less computational expense.

Findings

01. SCOUT improves LLM performance on unseen tasks.

02. SCOUT reduces GPU hours by about 60%.

03. Qwen2.5-3B-Instruct achieves a score of 0.86 with SCOUT.

Abstract

While Large Language Models (LLMs) excel in language-based agentic tasks, their applicability to unseen, nonlinguistic environments (e.g., symbolic or spatial tasks) remains limited. Previous work attributes this performance gap to the mismatch between the pretraining distribution and the testing distribution. In this work, we demonstrate that the primary bottleneck is the prohibitive cost of exploration: mastering these tasks requires extensive trial-and-error, which is computationally unsustainable for parameter-heavy LLMs operating in a high-dimensional semantic space. To address this, we propose SCOUT (Sub-Scale Collaboration On Unseen Tasks), a novel framework that decouples exploration from exploitation. We employ lightweight "scouts" (e.g., small MLPs) to probe environmental dynamics at a speed and scale far exceeding LLMs. The collected trajectories are utilized to bootstrap the LLM…
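To make the decoupling of exploration from exploitation concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. A cheap scout explores a hypothetical toy environment (a one-dimensional corridor), and its successful trajectory is then serialized into text pairs of the kind that could be used to fine-tune an LLM. The abstract mentions small MLPs as scouts; this sketch swaps in tabular Q-learning so it stays dependency-free. The environment, the function names (`step`, `train_scout`, `rollout`, `to_sft_examples`), the hyperparameters, and the prompt format are all illustrative assumptions.

```python
import random

# Hypothetical toy environment (not from the paper): a corridor of cells
# 0..N_CELLS-1. The agent starts at cell 0 and must reach the last cell.
N_CELLS = 6
ACTIONS = ("left", "right")
MAX_STEPS = 20


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == "left":
        nxt = max(0, state - 1)
    else:
        nxt = min(N_CELLS - 1, state + 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done


def train_scout(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Cheap exploration. Tabular Q-learning stands in for a small MLP scout."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Greedy choice, with ties broken at random.
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def rollout(q):
    """Follow the scout's greedy policy and record (state, action) pairs."""
    state, trajectory = 0, []
    for _ in range(MAX_STEPS):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        trajectory.append((state, action))
        state, _, done = step(state, action)
        if done:
            return trajectory, True
    return trajectory, False


def to_sft_examples(trajectory):
    """Serialize each step as a prompt/completion pair (assumed text format)."""
    return [
        {
            "prompt": f"You are at cell {s} of {N_CELLS}. "
                      f"The goal is cell {N_CELLS - 1}. Action?",
            "completion": a,
        }
        for s, a in trajectory
    ]


if __name__ == "__main__":
    q_table = train_scout()
    traj, solved = rollout(q_table)
    if solved:
        for example in to_sft_examples(traj):
            print(example)
```

The point of the sketch is the division of labor: all trial-and-error happens in the scout, which costs almost nothing per step, while the language model would only ever see the distilled trajectories.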

Code & Models

Models

Harryis/SCOUT_multitask (Hugging Face model · 3 downloads · 2 likes)

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling