Data-aware candidate selection in NL2SQL translation via small separating instances

Stanislav Kikot; Alexander Shulgin; Yanwei Xu

arXiv:2605.12319·cs.DB·May 13, 2026

TL;DR

This paper introduces a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. On a subset of BIRD-DEV, it significantly outperforms three natural baselines when only two or three candidates are given and no consistency score is available.

Contribution

The authors present a candidate selection approach based on separating instances and provenance, implement it as a prototype, and show that it outperforms three natural baselines when only two or three candidates are given.

Findings

01

The method significantly outperforms three natural baselines when only two or three candidates are given.

02

The method outperforms the baselines even when no consistency score is available.

03

The evaluation is carried out on a subset of BIRD-DEV.

Abstract

We propose a data-aware candidate selection method for NL2SQL translation based on separating instances and provenance. We implement this approach and evaluate it against three natural baselines on a subset of BIRD-DEV. Experiments show that our method significantly outperforms the baselines when only two or three candidates are given and no consistency score is available. The code of our prototype is available at https://github.com/staskikotx/SISelection.
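To make the term "separating instance" concrete, here is a minimal Python sketch. It assumes the usual reading of the term: a small database on which two candidate SQL queries return different results. It is not taken from the authors' prototype; the `students` table, the two candidate queries, and the helpers `run_query` and `separates` are all hypothetical and exist only for illustration.

```python
import sqlite3


def run_query(rows, query):
    """Load rows into a fresh in-memory table and return the sorted query result."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical toy schema, not from the paper or BIRD-DEV.
        conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
        conn.executemany("INSERT INTO students VALUES (?, ?)", rows)
        # Sort so that the comparison ignores row order.
        return sorted(conn.execute(query).fetchall())
    finally:
        conn.close()


def separates(rows, q1, q2):
    """Return True if the two queries give different results on this instance."""
    return run_query(rows, q1) != run_query(rows, q2)


# Two hypothetical candidates for "students who scored at least 90".
CANDIDATE_A = "SELECT name FROM students WHERE grade > 90"
CANDIDATE_B = "SELECT name FROM students WHERE grade >= 90"

# A one-row instance that sits exactly on the boundary between the candidates.
boundary_instance = [("Ann", 90)]
# A one-row instance on which both candidates agree.
generic_instance = [("Bob", 95)]

if __name__ == "__main__":
    print(separates(boundary_instance, CANDIDATE_A, CANDIDATE_B))
    print(separates(generic_instance, CANDIDATE_A, CANDIDATE_B))
```

In this toy setting the one-row boundary instance exposes the difference between the candidates, while the generic instance does not. How the paper constructs such instances and combines them with provenance to pick a candidate is not described in the abstract and is not modelled here.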

Code & Models

Repositories

staskikotx/SISelection (GitHub)