Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP
Jann Pfeifer, Debayan Banerjee, Ricardo Usbeck

TL;DR
This paper explores using reinforcement learning with outcome-based rewards to train a small language model for zero-shot Text-to-SPARQL generation in the scholarly domain, avoiding the need for gold query annotations.
Contribution
It introduces a GRPO-based reinforcement learning approach that improves zero-shot Text-to-SPARQL performance without requiring full supervision or gold query data.
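GRPO samples a group of completions for each prompt and scores each completion relative to the rest of its group, so no separate value model is trained. The following is a minimal sketch of that group-relative advantage, assuming one scalar reward per generated SPARQL query. The function name, the epsilon, and the use of the population standard deviation are illustrative choices, not details taken from this paper; implementations differ on the last two.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Advantage of each completion relative to its own sampling group (sketch)."""
    # Group statistics over all completions sampled for the same prompt.
    mean = statistics.fmean(rewards)
    # Population standard deviation (an assumption; some implementations use
    # the sample standard deviation instead).
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward;
    # such groups then yield zero advantage and no learning signal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four generated SPARQL queries for one DBLP question:
# two returned the correct answers (reward 1.0), two did not (reward 0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # [1.0, -1.0, -1.0, 1.0]
```

Because advantages are centered within each group, a question where every sampled query earns the same reward contributes nothing to the update. Outcome-based rewards therefore only help on questions where the model sometimes succeeds and sometimes fails.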
Findings
GRPO training significantly outperforms the unmodified zero-shot Qwen3-1.7B baseline.
Execution feedback is the main contributor to the performance gains.
Supervised DoRA finetuning still yields higher accuracy, but it requires gold queries.
Abstract
Knowledge graph question answering seeks to translate natural language questions into executable queries over knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcome-based rewards can train a small instruction-tuned language model to perform zero-shot Text-to-SPARQL generation in the scholarly domain. Group-Relative Policy Optimization (GRPO) is applied to the Qwen3-1.7B model on DBLP-QuAD, using prompts that combine natural language questions with symbolic hints about entities and relations. Training relies on execution feedback, structural constraints, and answer-level rewards, with an additional variant that incorporates gold-query-based shaping. The resulting models are compared to the unmodified zero-shot baseline and to a supervised DoRA-finetuned…
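The abstract names three reward signals used during training: execution feedback, structural constraints, and answer-level rewards. One way to combine them into a single scalar per generated query is sketched here. Everything concrete in it is a hypothetical stand-in rather than the paper's definition: the `execute` callable, the SELECT/ASK-plus-balanced-braces structure check, the F1 answer score, and the 0.3/0.2/0.5 weights. The gold-query-based shaping variant is not modeled.

```python
from typing import Callable, Optional

# Hypothetical executor interface: runs a SPARQL query against a DBLP endpoint
# and returns the set of answer strings, or None if parsing/execution failed.
Executor = Callable[[str], Optional[set[str]]]


def structure_reward(query: str) -> float:
    """Hypothetical structural constraint: a SELECT/ASK form and balanced braces."""
    upper = query.upper()
    has_form = "SELECT" in upper or "ASK" in upper
    balanced = "{" in query and query.count("{") == query.count("}")
    return 1.0 if has_form and balanced else 0.0


def answer_f1(predicted: set[str], gold: set[str]) -> float:
    """Answer-level reward: F1 overlap between returned and gold answer sets."""
    if not predicted and not gold:
        return 1.0
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def outcome_reward(
    query: str,
    gold_answers: set[str],
    execute: Executor,
    w_exec: float = 0.3,
    w_struct: float = 0.2,
    w_answer: float = 0.5,
) -> float:
    """Weighted sum of execution, structure, and answer rewards (weights are hypothetical)."""
    result = execute(query)
    r_exec = 0.0 if result is None else 1.0  # execution feedback
    r_struct = structure_reward(query)  # structural constraint
    r_answer = 0.0 if result is None else answer_f1(result, gold_answers)
    return w_exec * r_exec + w_struct * r_struct + w_answer * r_answer


# Usage with a toy in-memory "endpoint" standing in for a real SPARQL service:
# the query executes but returns only one of the two gold answers.
toy_results = {"SELECT ?p WHERE { ?p a <Publication> }": {"p1"}}
reward = outcome_reward("SELECT ?p WHERE { ?p a <Publication> }", {"p1", "p2"}, toy_results.get)
print(round(reward, 3))  # 0.833
```

Giving partial credit for execution and structure keeps the reward informative early in training, when few sampled queries return correct answers. Without it, most groups would receive identical zero rewards and therefore zero group-relative advantage.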
