KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA
Zhuo Chen, Fei Wang, Zixuan Li, Zhao Zhang, Weiwei Ding, Chuanguang Yang, Yongjun Xu, Xiaolong Jin, Jiafeng Guo

TL;DR
KnowCoder-A1 is an LLM designed for autonomous agentic reasoning in KBQA. It is trained with outcome-only supervision and curriculum reinforcement learning, and it achieves stronger zero-shot performance than prior methods while using only one-twelfth of their training data.
Contribution
It introduces outcome-only supervision and multi-stage curriculum RL to enhance agentic reasoning in LLMs for KBQA tasks, reducing reliance on process supervision.
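The two training ingredients named above can be illustrated with a short sketch. It is not the authors' implementation. It assumes that the outcome reward is answer-set F1, which is a common choice in KBQA but is not confirmed by this summary. It also assumes that "easy-to-hard" means sorting questions by a hypothetical per-question difficulty score and slicing them into stages. The names `Sample`, `outcome_reward`, and `curriculum_stages` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    question: str
    gold_answers: frozenset[str]
    difficulty: float  # hypothetical score, e.g. number of KB hops needed


def outcome_reward(predicted: set[str], gold: frozenset[str]) -> float:
    """Outcome-only reward (assumed here to be answer-set F1).

    Only the final answers are scored; the intermediate reasoning
    trajectory receives no step-level feedback."""
    overlap = len(predicted & gold)
    if overlap == 0:  # also covers an empty prediction
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def curriculum_stages(samples: list[Sample], num_stages: int) -> list[list[Sample]]:
    """Split samples into easy-to-hard stages by ascending difficulty."""
    if not samples or num_stages < 1:
        return []
    ordered = sorted(samples, key=lambda s: s.difficulty)
    size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


samples = [
    Sample("q1", frozenset({"a"}), difficulty=3.0),
    Sample("q2", frozenset({"b"}), difficulty=1.0),
    Sample("q3", frozenset({"c"}), difficulty=2.0),
]
for stage, batch in enumerate(curriculum_stages(samples, num_stages=2), start=1):
    print(stage, [s.question for s in batch])  # 1 ['q2', 'q3'] then 2 ['q1']
print(outcome_reward({"a", "x"}, frozenset({"a"})))  # about 0.667 (P=0.5, R=1.0)
```

The reward looks only at the final answer set. Any trajectory that reaches the gold answers is therefore rewarded equally, so the policy is free to explore its own reasoning path rather than imitate a synthesized one.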
Findings
Outperforms prior methods on three KBQA datasets.
Achieves up to 11.1% relative improvement on GrailQA zero-shot subset.
Uses only one-twelfth of the training data compared to baselines.
Abstract
Knowledge Base Question Answering (KBQA) aims to answer natural-language questions over a structured Knowledge Base (KB). Recent work improves KBQA by adopting an agentic reasoning paradigm, in which Large Language Models (LLMs) iteratively decompose a question, generate its corresponding logical queries, and interact with the KB to derive the answer. However, these methods typically fine-tune LLMs on reasoning trajectories synthesized via process supervision, which offers weak incentives for exploration and thus fails to strengthen the agentic reasoning ability. In this paper, we propose KnowCoder-A1, an LLM that can autonomously perform agentic reasoning on KBs to obtain answers. To incentivize autonomous exploration, KnowCoder-A1 trains the LLM under outcome-only supervision via a multi-stage curriculum reinforcement learning with an easy-to-hard curriculum. To establish foundational…
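The agentic reasoning loop described in the abstract (decompose the question, issue a query, observe the KB's response, repeat) can be sketched as follows. Everything here is hypothetical. The toy triple store `KB` is invented for illustration. The two tools `get_relations` and `get_objects` stand in for logical-query execution. `scripted_policy` replays a fixed plan at the point where the real system would call the LLM.

```python
from typing import Callable

# Hypothetical toy KB: (subject, relation, object) triples.
KB = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Interstellar", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
]

Action = tuple[str, str]  # (tool name, argument)
Step = tuple[Action, object]  # an action paired with the observation it produced
Policy = Callable[[str, list[Step]], Action]


def get_relations(entity: str) -> set[str]:
    """Hypothetical tool: list the outgoing relations of an entity."""
    return {r for s, r, _ in KB if s == entity}


def get_objects(entities: set[str], relation: str) -> set[str]:
    """Hypothetical tool: follow one relation from a set of entities."""
    return {o for s, r, o in KB if s in entities and r == relation}


def run_agent(question: str, policy: Policy, max_steps: int = 5) -> set[str]:
    """Alternate between policy actions and KB observations until the
    policy emits an 'answer' action or the step budget runs out."""
    history: list[Step] = []
    frontier: set[str] = set()  # entities reached so far
    for _ in range(max_steps):
        tool, arg = policy(question, history)
        if tool == "answer":
            return frontier
        if tool == "start":  # ground the topic entity
            frontier = {arg}
            observation = get_relations(arg)
        elif tool == "follow":  # execute one relation hop
            frontier = get_objects(frontier, arg)
            observation = frontier
        else:
            observation = f"unknown tool: {tool}"
        history.append(((tool, arg), observation))
    return frontier


def scripted_policy(question: str, history: list[Step]) -> Action:
    """Stand-in for the LLM: replays a fixed two-hop decomposition."""
    plan = [("start", "Inception"), ("follow", "directed_by"),
            ("follow", "born_in"), ("answer", "")]
    return plan[len(history)]


question = "Where was the director of Inception born?"
answers = run_agent(question, scripted_policy)
reward = float(answers == {"London"})  # outcome-only: score the final answer set
print(answers, reward)  # {'London'} 1.0
```

Under outcome-only supervision, the trajectory stored in `history` is never graded step by step. Only the returned answer set is compared with the gold answers, as in the last two lines.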
