SPARK: Self-Play with Asymmetric Reward from Knowledge Graphs

Hyobin Park; Taeseop Kim; Dong-Geol Choi

arXiv:2605.05546 · cs.AI · May 8, 2026

TL;DR

SPARK introduces a self-play framework that leverages knowledge graphs built from scientific literature to improve multi-hop relational reasoning in question answering.

Contribution

SPARK automatically constructs a knowledge graph from multi-document scientific texts and uses it for structured question generation and reward computation in self-play.
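To make "structured question generation" concrete, here is a minimal sketch of the general idea of turning a KG path into a multi-hop question whose gold answer is the path's final entity. Everything here is an assumption for illustration: the toy triples, the helper names (`build_adjacency`, `sample_path`, `path_to_question`), and the template wording are hypothetical and are not taken from the SPARK paper, whose actual graph schema and question generator are not described in this summary.

```python
import random
from collections import defaultdict

# Hypothetical toy KG of (head, relation, tail) triples. A node such as
# "Table 2" stands in for a non-text (multimodal) element; names are illustrative.
TRIPLES = [
    ("Paper A", "proposes", "Method X"),
    ("Method X", "evaluated_on", "Dataset D"),
    ("Dataset D", "reported_in", "Table 2"),
    ("Paper B", "extends", "Method X"),
]

def build_adjacency(triples):
    """Index outgoing (relation, tail) edges for each head entity."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
    return adj

def sample_path(adj, start, hops, rng):
    """Random walk of exactly `hops` edges from `start`; None if it dead-ends."""
    path, node = [], start
    for _ in range(hops):
        edges = adj.get(node)  # .get avoids inserting keys into the defaultdict
        if not edges:
            return None
        rel, nxt = rng.choice(edges)
        path.append((node, rel, nxt))
        node = nxt
    return path

def path_to_question(path):
    """Template a question whose gold answer is the last entity on the path."""
    start = path[0][0]
    chain = " -> ".join(rel for _, rel, _ in path)
    question = f"Starting from '{start}', follow {chain}. Which entity do you reach?"
    return question, path[-1][2]

# Usage: a 3-hop question from "Paper A" (each node here has one outgoing edge,
# so the walk is deterministic).
rng = random.Random(0)
adj = build_adjacency(TRIPLES)
path = sample_path(adj, "Paper A", hops=3, rng=rng)
q, gold = path_to_question(path)
print(q)
print(gold)
```

Raising `hops` lengthens the relational chain the solver must follow, which is the knob the hop-count findings refer to; a real pipeline would also need to check that the answer is unique and avoid revisiting nodes.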

Findings

1. SPARK outperforms flat-corpus baselines on benchmarks.
2. Performance improves with increased hop count, indicating better multi-hop reasoning.
3. Structured KG grounding enhances relational reasoning beyond unstructured data.

Abstract

Self-play reinforcement learning has shown strong performance in domains with formally verifiable structure, such as mathematics and coding, where both problem generation and reward computation can be grounded in explicit rules. Extending this paradigm to scientific literature is more challenging: the relationships among multimodal elements within and across documents are rarely made explicit in text, which makes automatic generation of relational reasoning questions difficult and weakens the reliability of reward signals. We propose SPARK (Self-Play with Asymmetric Reward from Knowledge Graphs), a framework that automatically constructs a unified knowledge graph (KG) from multi-document scientific literature and uses it as the structural basis for self-play. KG paths over multimodal nodes serve as a source for generating relational reasoning questions, and structured facts stored in…
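The abstract contrasts rule-checkable rewards in math and code with the weaker reward signals available for scientific text, and the contribution statement says the KG is used for reward computation. A minimal sketch of what a KG-grounded, verifiable reward can look like follows, assuming the solver returns a single entity string and the gold answer is a KG entity with optional aliases. This is a plain binary exact-match check after normalization; it is not SPARK's asymmetric reward, whose details are cut off in the truncated abstract, and the function names and alias list are hypothetical.

```python
import re
import unicodedata

def normalize(text):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())

def kg_reward(answer, gold_entity, aliases=()):
    """Binary reward: 1.0 if the answer matches the KG entity or an alias, else 0.0."""
    targets = {normalize(gold_entity)}
    targets.update(normalize(a) for a in aliases)
    return 1.0 if normalize(answer) in targets else 0.0

print(kg_reward("table 2.", "Table 2"))                      # surface variation still matches
print(kg_reward("Table 3", "Table 2"))                       # wrong entity
print(kg_reward("Tab. 2", "Table 2", aliases=["Tab. 2"]))    # hypothetical alias
```

The point of grounding the check in a stored fact rather than in free-text judgment is that the reward becomes deterministic, as in the math and coding settings the abstract cites.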
