KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA

Zhuo Chen; Fei Wang; Zixuan Li; Zhao Zhang; Weiwei Ding; Chuanguang Yang; Yongjun Xu; Xiaolong Jin; Jiafeng Guo

arXiv:2510.25101 · cs.AI · November 19, 2025

TL;DR

KnowCoder-A1 is an LLM that performs agentic reasoning over knowledge bases for KBQA. It is trained with outcome-only supervision and multi-stage curriculum reinforcement learning, and reports up to 11.1% relative improvement on the GrailQA zero-shot subset while using one-twelfth of the training data.

Contribution

The paper trains an LLM for KBQA with outcome-only supervision and a multi-stage, easy-to-hard curriculum of reinforcement learning, strengthening agentic reasoning while reducing reliance on process supervision.

Findings

1. Outperforms prior methods on three KBQA datasets.
2. Achieves up to 11.1% relative improvement on the zero-shot subset of GrailQA.
3. Uses only one-twelfth of the training data that the baselines use.

Abstract

Knowledge Base Question Answering (KBQA) aims to answer natural-language questions over a structured Knowledge Base (KB). Recent work improves KBQA by adopting an agentic reasoning paradigm, in which Large Language Models (LLMs) iteratively decompose a question, generate the corresponding logical queries, and interact with the KB to derive the answer. However, these methods typically fine-tune LLMs on reasoning trajectories synthesized via process supervision, which offers weak incentives for exploration and thus fails to strengthen the agentic reasoning ability. In this paper, we propose KnowCoder-A1, an LLM that can autonomously perform agentic reasoning on KBs to obtain answers. To incentivize autonomous exploration, KnowCoder-A1 trains the LLM under outcome-only supervision via multi-stage curriculum reinforcement learning that proceeds from easy to hard. To establish foundational…
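The outcome-only supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the reward's exact form, so the set-level F1 used here is an assumption, and the function name `outcome_reward` and the entity IDs in the usage note are hypothetical. The point is only that the reward depends on the final answer set and never inspects the intermediate queries or reasoning steps.

```python
from typing import Iterable


def outcome_reward(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """Score only the final answers (assumed here: set-level F1).

    The reasoning trajectory that produced `predicted` is deliberately
    not an argument: under outcome-only supervision it is never scored.
    """
    pred_set, gold_set = set(predicted), set(gold)
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        # Covers empty predictions, empty gold sets, and disjoint sets.
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

For example, `outcome_reward(["m.01", "m.02"], ["m.01"])` has precision 0.5 and recall 1.0, so the reward is 2/3 regardless of how many KB interactions the agent used to reach those answers.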
