SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs
Hyobin Park, Taeseop Kim, Dong-Geol Choi

TL;DR
SPARK introduces a self-play framework leveraging knowledge graphs from scientific literature to improve multi-hop relational reasoning in question answering tasks.
Contribution
It automatically constructs a knowledge graph from multi-document scientific texts and uses it for structured question generation and reward computation in self-play.
Findings
SPARK outperforms flat-corpus baselines on benchmarks.
Performance improves with increased hop count, indicating better multi-hop reasoning.
Structured KG grounding enhances relational reasoning beyond unstructured data.
Abstract
Self-play reinforcement learning has shown strong performance in domains with formally verifiable structure, such as mathematics and coding, where both problem generation and reward computation can be grounded in explicit rules. Extending this paradigm to scientific literature is more challenging: the relationships among multimodal elements within and across documents are rarely made explicit in text, which makes automatic generation of relational reasoning questions difficult and weakens the reliability of reward signals. We propose SPARK (Self-Play with Asymmetric Reward from Knowledge Graphs), a framework that automatically constructs a unified knowledge graph (KG) from multi-document scientific literature and uses it as the structural basis for self-play. KG paths over multimodal nodes serve as a source for generating relational reasoning questions, and structured facts stored in…
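The abstract describes KG paths as the source of relational questions and stored structured facts as the basis for reward computation. Below is a minimal Python sketch of that general idea, assuming the KG is a flat list of (head, relation, tail) triples. It walks a k-hop path, templates it into a question whose gold answer is the path's final node, and scores a prediction by exact match against that node. The triple format, the toy graph, the helper names (`build_adjacency`, `sample_path`, `path_to_question`, `reward`), the question template and the binary exact-match reward are all hypothetical illustrations. They are not SPARK's actual question generator, and they do not model its asymmetric reward.

```python
import random
from collections import defaultdict
from typing import Optional

# Hypothetical triple format: (head entity, relation, tail entity).
Triple = tuple[str, str, str]


def build_adjacency(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Index outgoing edges by head node: head -> [(relation, tail), ...]."""
    adj: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj


def sample_path(adj, start: str, hops: int, rng: random.Random) -> Optional[list[Triple]]:
    """Random walk of exactly `hops` edges that never revisits a node.

    Returns None if the walk dead-ends before reaching `hops` edges.
    """
    path: list[Triple] = []
    node, visited = start, {start}
    for _ in range(hops):
        # .get avoids inserting empty lists into the defaultdict
        options = [(r, t) for r, t in adj.get(node, []) if t not in visited]
        if not options:
            return None
        rel, tail = rng.choice(options)
        path.append((node, rel, tail))
        visited.add(tail)
        node = tail
    return path


def path_to_question(path: list[Triple]) -> tuple[str, str]:
    """Template a path into (question, gold answer); the answer is the last tail node."""
    relations = " -> ".join(rel for _, rel, _ in path)
    question = f"Starting from '{path[0][0]}', follow {relations}. Which node is reached?"
    return question, path[-1][2]


def reward(prediction: str, gold: str) -> float:
    """Hypothetical binary reward: exact match against the answer node stored in the KG."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0


# Toy graph mixing text entities with a table node (made-up example data).
TRIPLES: list[Triple] = [
    ("Paper A", "proposes", "Method X"),
    ("Method X", "evaluated_on", "Dataset D"),
    ("Dataset D", "reported_in", "Table 2"),
]

if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    path = sample_path(adj, "Paper A", hops=3, rng=random.Random(0))
    question, gold = path_to_question(path)
    print(question)
    print(reward("table 2", gold))  # 1.0
```

In this sketch the gold answer of every question is a node already stored in the graph. The reward is therefore a lookup rather than a judgment, which illustrates why grounding question generation in explicit structure can make reward signals easier to verify.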
