WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement

Fangyuan Li; Pengfei Li; Shijie Wang; Junqi Gao; Jianxing Liu; Biqing Qi; Yuqiang Li

arXiv:2603.22352 · cs.LG · March 25, 2026

TL;DR

WIST is a web-grounded iterative self-play framework that improves domain-specific reasoning in language models by learning directly from the open web, with no curated domain corpus required.

Contribution

WIST introduces a web-grounded iterative self-play tree method that learns from the open web, enabling domain-targeted reasoning improvement without a curated data environment.

Findings

01

WIST improves model performance by up to 9.8 points across four backbones.

02

WIST outperforms endogenous self-evolution and corpus-grounded baselines.

03

WIST significantly enhances domain-specific reasoning, e.g., +14.79 in medicine.

Abstract

Recent progress in reinforcement learning with verifiable rewards (RLVR) offers a practical path to self-improvement of language models, but existing methods face a key trade-off: endogenous self-play can drift over iterations, while corpus-grounded approaches rely on curated data environments. We present WIST, a Web-grounded Iterative Self-play Tree framework for domain-targeted reasoning improvement that learns directly from the open web without requiring any pre-arranged domain corpus. WIST incrementally expands a domain tree for exploration, and retrieves and cleans path-consistent web corpus to construct a controllable training environment. It then performs Challenger-Solver self-play with verifiable rewards, and feeds learnability signals back to update node posteriors and guide subsequent exploration through an adaptive curriculum.…
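The feedback loop described in the abstract (learnability signals updating node posteriors, which then steer exploration of the domain tree) can be illustrated with a small sketch. The abstract does not specify the posterior family, the selection rule, or how learnability is computed, so everything below is an assumption made for illustration: a Beta posterior per node, Thompson sampling to pick a path, and a hypothetical learnability score `4 * p * (1 - p)` that peaks when the Solver's success rate `p` is 0.5. The names `Node`, `learnability`, `select_path`, and `run_round` are hypothetical and not taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A topic in the domain tree with an assumed Beta(alpha, beta) posterior."""
    name: str
    alpha: float = 1.0
    beta: float = 1.0
    children: list = field(default_factory=list)

    def sample(self, rng: random.Random) -> float:
        # Draw one plausible learnability value from the posterior.
        return rng.betavariate(self.alpha, self.beta)

    def update(self, signal: float) -> None:
        # Treat a signal in [0, 1] as a fractional success.
        self.alpha += signal
        self.beta += 1.0 - signal

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


def learnability(solve_rate: float) -> float:
    """Assumed score: 0 for questions always or never solved, 1 at a 50% solve rate."""
    return 4.0 * solve_rate * (1.0 - solve_rate)


def select_path(root: Node, rng: random.Random) -> list:
    """Walk from the root to a leaf, choosing each child by Thompson sampling."""
    path = [root]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: child.sample(rng))
        path.append(node)
    return path


def run_round(root: Node, solve_rate_of, rng: random.Random) -> list:
    """One curriculum step: pick a path, observe a solve rate, update posteriors."""
    path = select_path(root, rng)
    signal = learnability(solve_rate_of(path[-1].name))
    for node in path:
        node.update(signal)
    return path
```

In this sketch, `solve_rate_of` stands in for the whole retrieval and Challenger-Solver self-play step, which is reduced here to a callable returning the Solver's success rate on questions generated for a leaf topic. Under these assumptions, topics the Solver finds neither trivial nor impossible accumulate higher posterior means and are sampled more often in later rounds.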

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics