Self-Correction Distillation for Structured Data Question Answering

Yushan Zhu; Wen Zhang; Long Jin; Mengshu Sun; Ling Zhong; Zhiqiang Liu; Juan Li; Lei Liang; Chong Long; Chao Deng; Junlan Feng

arXiv:2511.07998·cs.CL·November 18, 2025

TL;DR

This paper introduces a self-correction distillation (SCD) method that improves small-scale LLMs' ability to answer questions over structured data, approaching GPT-4 performance and outperforming other distillation methods.

Contribution

The paper proposes a novel self-correction distillation approach with an error prompt mechanism to improve small-scale LLMs' structured data QA capabilities.

Findings

01. SCD outperforms other distillation methods on 5 benchmarks.

02. SCD enables small LLMs to approach GPT-4 performance.

03. Large LLMs with EPM surpass state-of-the-art results.

Abstract

Structured data question answering (QA), including table QA, Knowledge Graph (KG) QA, and temporal KG QA, is a pivotal research area. Advances in large language models (LLMs) have driven significant progress in unified structural QA frameworks like TrustUQA. However, these frameworks face challenges when applied to small-scale LLMs, which are prone to errors when generating structured queries. To improve the structured data QA ability of small-scale LLMs, we propose a self-correction distillation (SCD) method. In SCD, an error prompt mechanism (EPM) is designed to detect errors and provide customized error messages during inference, and a two-stage distillation strategy is designed to transfer large-scale LLMs' query-generation and error-correction capabilities to small-scale LLMs. Experiments across 5 benchmarks with 3 structured data types demonstrate that our SCD…
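The abstract describes the EPM only at a high level: errors in a generated query are detected, and a customized error message is provided during inference. The sketch below is one possible reading of that loop, not the authors' implementation. The toy query format `get(<column>)`, the names `QueryError`, `toy_execute`, `toy_generate`, and `answer_with_error_prompts`, and the retry limit are all assumptions made for illustration; a real system would call an LLM where `toy_generate` is used.

```python
from typing import Callable, Dict, Optional, Tuple


class QueryError(Exception):
    """Raised by the toy executor when a structured query is invalid."""


def toy_execute(query: str, table: Dict[str, str]) -> str:
    """Run a query in a made-up format, get(<column>), against one table row."""
    if not (query.startswith("get(") and query.endswith(")")):
        raise QueryError(f"Syntax error: expected get(<column>), got {query!r}.")
    column = query[4:-1]
    if column not in table:
        # Customized message: name the bad column and list the valid ones.
        raise QueryError(
            f"Unknown column {column!r}; valid columns are {sorted(table)}."
        )
    return table[column]


def answer_with_error_prompts(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], str],
    max_retries: int = 2,
) -> Tuple[Optional[str], int]:
    """Generate a query, and on failure feed the error message back to the generator.

    Returns (answer, number_of_failed_attempts); answer is None if every attempt fails.
    """
    error_message: Optional[str] = None
    failed = 0
    for _ in range(max_retries + 1):
        query = generate(question, error_message)
        try:
            return execute(query), failed
        except QueryError as exc:
            failed += 1
            error_message = str(exc)  # becomes part of the next prompt
    return None, failed


def toy_generate(question: str, error_message: Optional[str]) -> str:
    """Stand-in for an LLM: wrong on the first try, corrected once it sees an error."""
    if error_message is None:
        return "get(capital_city)"
    return "get(capital)"


table = {"capital": "Paris", "country": "France"}
answer, failed_attempts = answer_with_error_prompts(
    "What is the capital?", toy_generate, lambda q: toy_execute(q, table)
)
```

In this toy run the first query names a column that does not exist, the executor reports the valid columns, and the second query succeeds. How the paper actually detects errors and words its messages is not stated in the abstract, so the details here should be treated as placeholders.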

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Machine Learning in Healthcare