Weak-to-Strong Generalization with Failure Trajectories: A Tree-based Approach to Elicit Optimal Policy in Strong Models

Ruimeng Ye; Zihan Wang; Yang Xiao; Zinan Ling; Manling Li; Bo Hui

arXiv:2507.18858 · cs.LG · March 10, 2026

TL;DR

This paper introduces a tree-based method for weak-to-strong generalization in complex decision environments, leveraging failure trajectories and Monte Carlo Tree Search to enhance strong model capabilities.

Contribution

It extends W2SG to complex interactive tasks by incorporating failure experiences and hierarchical trajectory trees with MCTS for optimized policy learning.

Findings

01. Significant performance improvements in reasoning tasks

02. Effective learning from failure trajectories

03. Robust scalability across diverse domains

Abstract

Weak-to-Strong generalization (W2SG) is an emerging approach for eliciting the full capabilities of a strong model with supervision from a weak model. While existing W2SG studies focus on simple tasks like binary classification, we extend this paradigm to complex interactive decision-making environments. Specifically, we fine-tune a strong model with trajectories of intermediate actions generated by a weak model. Motivated by the human learning process, we propose to generalize not only knowledge from successes but also experience from failures, so that the strong model can learn from failed trajectories accumulated by weak models. To effectively and efficiently elicit the potential of strong agents, we further construct "trajectory trees," a hierarchical representation that organizes action trajectories generated by the weak model, coupled with Monte Carlo Tree Search (MCTS) to optimize the strong model. Through…
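The abstract does not spell out how a trajectory tree is built or searched, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each weak-model trajectory is a list of action strings with a scalar outcome reward (1.0 for success, 0.0 for failure), merges trajectories that share a prefix into one tree, and applies the standard UCT rule used in MCTS to pick a child. The names `Node`, `build_tree`, `uct_select`, and `preference_pairs`, the toy actions, and the idea of reading better/worse sibling pairs off the tree are all hypothetical illustrations.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One tree node; the path from the root spells out an action prefix."""
    action: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    visits: int = 0
    total_reward: float = 0.0

    @property
    def value(self) -> float:
        # Mean outcome reward of all trajectories passing through this node.
        return self.total_reward / self.visits if self.visits else 0.0


def build_tree(trajectories):
    """Merge (actions, reward) trajectories that share prefixes into one tree."""
    root = Node()
    for actions, reward in trajectories:
        node = root
        node.visits += 1
        node.total_reward += reward
        for a in actions:
            node = node.children.setdefault(a, Node(action=a))
            node.visits += 1
            node.total_reward += reward
    return root


def uct_select(node, c=1.4):
    """Standard UCT: mean value plus an exploration bonus for rarely visited children."""
    def score(child):
        return child.value + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def preference_pairs(node, prefix=()):
    """Yield (prefix, better_action, worse_action) wherever sibling values differ.

    Hypothetical use of the tree: failed branches supply the 'worse' side.
    """
    kids = sorted(node.children.values(), key=lambda n: n.value, reverse=True)
    if len(kids) >= 2 and kids[0].value > kids[-1].value:
        yield prefix, kids[0].action, kids[-1].action
    for k in kids:
        yield from preference_pairs(k, prefix + (k.action,))


# Toy weak-model trajectories (assumed data): one success, two failures.
trajectories = [
    (["search", "click_item", "buy"], 1.0),
    (["search", "click_item", "back"], 0.0),
    (["search", "next_page", "buy"], 0.0),
]
root = build_tree(trajectories)
pairs = list(preference_pairs(root))
```

In this toy example the two failed trajectories are not discarded: each one contributes the lower-valued sibling at the point where it diverges from a better branch, which is one way failure experience could carry training signal.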
