WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement
Fangyuan Li, Pengfei Li, Shijie Wang, Junqi Gao, Jianxing Liu, Biqing Qi, Yuqiang Li

TL;DR
WIST is a web-grounded iterative self-play framework that improves domain-specific reasoning in language models by learning directly from the open web instead of from curated datasets.
Contribution
WIST introduces a web-grounded iterative self-play tree method that learns directly from the open web, enabling domain-targeted reasoning improvement without curated data environments.
Findings
WIST improves model performance by up to 9.8 points across four backbones.
WIST outperforms endogenous self-evolution and corpus-grounded baselines.
WIST substantially improves domain-specific reasoning, e.g., +14.79 in medicine.
Abstract
Recent progress in reinforcement learning with verifiable rewards (RLVR) offers a practical path to self-improvement of language models, but existing methods face a key trade-off: endogenous self-play can drift over iterations, while corpus-grounded approaches rely on curated data environments. We present WIST, a Web-grounded Iterative Self-play Tree framework for domain-targeted reasoning improvement that learns directly from the open web without requiring any pre-arranged domain corpus. WIST incrementally expands a domain tree for exploration and retrieves and cleans path-consistent web corpora to construct a controllable training environment. It then performs Challenger–Solver self-play with verifiable rewards and feeds learnability signals back to update node posteriors and guide subsequent exploration through an adaptive curriculum.…
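The abstract describes a closed loop: pick a node of the domain tree, gather web data along its path, run Challenger–Solver self-play, then feed a learnability signal back into that node's posterior to steer the curriculum. The sketch below illustrates only the posterior-update and node-selection part, under assumptions that do not come from the paper: each node keeps a Beta posterior over "a question batch from here is learnable", learnability means the Solver's pass rate lands in an intermediate band (the 0.2 to 0.8 thresholds are illustrative), and the next node is chosen by Thompson sampling. All names (`DomainNode`, `learnability`, `select_node`, `run_demo`) are hypothetical, and the pass rates are made-up stubs standing in for real retrieval and self-play rounds.

```python
import random


class DomainNode:
    """Hypothetical domain-tree node, e.g. 'medicine' -> 'cardiology'."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # Assumed Beta(1, 1) prior over "a batch from this node is learnable".
        self.alpha = 1.0
        self.beta = 1.0

    def add_child(self, name):
        child = DomainNode(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Root-to-node topic path, e.g. for building path-consistent web queries."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]

    def posterior_mean(self):
        return self.alpha / (self.alpha + self.beta)


def learnability(pass_rate, low=0.2, high=0.8):
    """Assumed signal: a batch is learnable if the Solver is neither
    almost always right nor almost always wrong on it."""
    return low <= pass_rate <= high


def update_posterior(node, learnable):
    """Conjugate Beta-Bernoulli update with one binary observation."""
    if learnable:
        node.alpha += 1.0
    else:
        node.beta += 1.0


def frontier(node):
    """Leaves of the tree, i.e. the currently explorable subdomains."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in frontier(child)]


def select_node(root, rng):
    """Thompson sampling: draw once from each leaf's Beta posterior and
    explore the leaf with the largest draw."""
    return max(frontier(root), key=lambda n: rng.betavariate(n.alpha, n.beta))


def run_demo(pass_rates, rounds=30, seed=0):
    """Simulate the feedback loop with fixed, made-up Solver pass rates
    in place of real web retrieval and Challenger-Solver rounds."""
    rng = random.Random(seed)
    root = DomainNode("medicine")
    for topic in pass_rates:
        root.add_child(topic)
    for _ in range(rounds):
        node = select_node(root, rng)
        update_posterior(node, learnability(pass_rates[node.name]))
    return root


root = run_demo({"cardiology": 0.5, "pharmacology": 0.95, "anatomy": 0.05})
for child in root.children:
    print(child.path(), round(child.posterior_mean(), 2))
```

Thompson sampling is one plausible way to balance revisiting subdomains that keep producing learnable questions against probing uncertain ones; the paper may use a different posterior or acquisition rule. Tree expansion, web retrieval and cleaning, and the RL updates themselves are deliberately left out.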
