Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents
Huyu Wu, Jun Liu, Xiaochi Wei, Yan Gao, Yi Wu, and Yao Hu

TL;DR
This paper improves self-evolving search agents by using knowledge-graph paths as intermediate supervision for both question construction and reward shaping, which raises performance on multiple QA benchmarks.
Contribution
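It introduces a method that reuses knowledge-graph paths for question construction and reward shaping, addressing two bottlenecks in the Search Self-Play (SSP) framework: questions built from isolated answer entities without relational context, and a binary outcome reward that ignores partially on-track search trajectories.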
Findings
Improved average scores across seven QA benchmarks.
Notable gains on multi-hop QA tasks.
Effective use of knowledge-graph paths as lightweight supervision.
Abstract
Self-evolving search agents reduce reliance on human-written training questions by generating and solving their own search tasks. We build on Search Self-Play (SSP), a representative Proposer and Solver framework in which questions are generated and answered via multi-step search and reasoning. In practice, however, SSP faces two bottlenecks: the Proposer constructs questions from isolated answer entities without relational context, yielding many invalid or unverifiable questions in early self-play training, while the Solver receives only a binary outcome reward that discards useful signal from partially on-track search trajectories. We address both bottlenecks by reusing knowledge-graph paths as construction-derived intermediate supervision for both question construction and reward shaping. First, we ground question construction in LLM-guided knowledge-graph subgraphs, providing…
