Confidence Scoring for LLM-Generated SQL in Supply Chain Data Extraction

Jiekai Ma; Yikai Zhao

arXiv:2506.17203·stat.AP·June 23, 2025

TL;DR

This paper evaluates methods to estimate confidence in LLM-generated SQL queries for supply chain data, highlighting the limitations of self-reported confidence and the effectiveness of embedding-based similarity checks.

Contribution

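It introduces and compares three approaches to confidence scoring for LLM-generated SQL (translation-based consistency checks, embedding-based semantic similarity, and self-reported LLM confidence), highlighting the promise of embedding-based methods for assessing query accuracy.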

Findings

1. Embedding-based similarity effectively detects inaccurate SQL queries.
2. Self-reported confidence scores are often overconfident and unreliable.
3. Translation-based consistency checks show moderate effectiveness.
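The second finding, that self-reported scores are overconfident, can be made concrete by comparing stated confidence with observed accuracy. The sketch below shows one common way to do this and is not necessarily the paper's evaluation: it assumes each generated query has a self-reported confidence in [0, 1] and a boolean correctness label, then computes the mean confidence minus accuracy (positive means overconfident) and a binned expected calibration error. The function names are hypothetical, and the 10-bin default is an arbitrary illustrative choice.

```python
from statistics import mean


def _validate(confidences, correct):
    # Both inputs must describe the same non-empty set of queries.
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")


def overconfidence_gap(confidences, correct):
    """Mean stated confidence minus empirical accuracy; > 0 means overconfident."""
    _validate(confidences, correct)
    accuracy = mean(1.0 if ok else 0.0 for ok in correct)
    return mean(confidences) - accuracy


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted average of |mean confidence - accuracy| per bin."""
    _validate(confidences, correct)
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are [lo, hi); the last bin also includes a confidence of exactly 1.0.
        idx = [i for i, p in enumerate(confidences)
               if lo <= p < hi or (b == n_bins - 1 and p == 1.0)]
        if not idx:
            continue
        bin_conf = mean(confidences[i] for i in idx)
        bin_acc = mean(1.0 if correct[i] else 0.0 for i in idx)
        ece += len(idx) / n * abs(bin_conf - bin_acc)
    return ece
```

For instance, a model that reports 0.9 to 0.95 confidence on every query while only half of its queries are correct would show a gap and ECE of roughly 0.4; those values are illustrative, not data from the study.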

Abstract

Large Language Models (LLMs) have recently enabled natural language interfaces that translate user queries into executable SQL, offering a powerful way for non-technical stakeholders to access structured data. However, because LLMs do not natively express uncertainty, it is difficult to assess the reliability of the queries they generate. This paper presents a case study that evaluates multiple approaches to estimating confidence scores for LLM-generated SQL in supply chain data retrieval. We investigated three strategies: (1) translation-based consistency checks; (2) embedding-based semantic similarity between user questions and generated SQL; and (3) self-reported confidence scores produced directly by the LLM. Our findings reveal that LLMs are often overconfident in their own outputs, which limits the effectiveness of self-reported confidence. In contrast,…
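Strategy (2) scores a generated query by how close the SQL is to the user's question in an embedding space. The sketch below illustrates only that scoring step, under loud assumptions: `toy_embed` is a hypothetical bag-of-tokens stand-in for a real sentence-embedding model (the summary does not name the model used), cosine similarity is assumed as the comparison, and the 0.2 threshold in the hypothetical `flag_low_confidence` helper is an arbitrary illustrative value, not one reported by the authors.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a learned embedding model: lowercase token counts.

    Splitting on non-alphanumerics breaks SQL identifiers such as
    inventory_level into separate tokens so they can match question words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_confidence(question: str, sql: str) -> float:
    """Confidence score = similarity between the question and the generated SQL."""
    return cosine_similarity(toy_embed(question), toy_embed(sql))


def flag_low_confidence(question: str, sql: str, threshold: float = 0.2) -> bool:
    """Flag a query for review when its similarity falls below an assumed threshold."""
    return embedding_confidence(question, sql) < threshold
```

With a question such as "What is the inventory level for each warehouse?", a query selecting `warehouse_id, inventory_level` scores well above an unrelated query over a suppliers table, which shares no tokens and scores 0.0. Token overlap is only a crude proxy for semantic similarity; swapping in a learned embedding model keeps the same cosine-and-threshold structure, and the threshold would need tuning on labeled examples.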
