Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text
Ali Al-Lawati, Jason Lucas, Prasenjit Mitra

TL;DR
This paper introduces a new benchmark dataset and a graph-aware in-context learning approach for translating SQL queries into natural language, improving explanation quality and dataset robustness for security and educational applications.
Contribution
The paper repurposes Text2SQL datasets for SQL2Text, using an iterative GPT-4o in-context learning prompt to generate additional utterances for each query, and applies graph-aware sample selection to improve few-shot performance.
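The summary does not spell out the iterative prompt itself, so the following is only one plausible reading: each round shows the model the SQL query and the utterances collected so far, and asks for a new paraphrase. The prompt wording, the function names `build_prompt` and `augment`, and the `fake_llm` stand-in are all hypothetical; in practice `generate` would wrap a call to a model such as GPT-4o.

```python
from typing import Callable, List


def build_prompt(sql: str, utterances: List[str]) -> str:
    """Assemble an in-context prompt from the query and the utterances so far.

    The wording is an assumption, not the prompt used in the paper.
    """
    lines = [
        "Describe the SQL query in one sentence.",
        "Give a paraphrase that differs from the existing descriptions.",
        f"SQL: {sql}",
        "Existing descriptions:",
    ]
    lines += [f"- {u}" for u in utterances]
    lines.append("New description:")
    return "\n".join(lines)


def augment(
    sql: str,
    seed: str,
    generate: Callable[[str], str],
    rounds: int = 3,
) -> List[str]:
    """Grow the utterance list by feeding earlier outputs back into the prompt."""
    utterances = [seed]
    for _ in range(rounds):
        candidate = generate(build_prompt(sql, utterances)).strip()
        seen = {u.lower() for u in utterances}
        # Keep only non-empty candidates that are not case-insensitive duplicates.
        if candidate and candidate.lower() not in seen:
            utterances.append(candidate)
    return utterances


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, used only to run the loop."""
    shown = prompt.count("\n- ")  # number of utterances listed in the prompt
    return f"Paraphrase {shown} of the query"


if __name__ == "__main__":
    for text in augment("SELECT name FROM users", "List user names", fake_llm, rounds=2):
        print(text)
```

Passing the model as a plain callable keeps the loop independent of any particular API client, so the same code runs with a stub for testing or a real model for data generation.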
Findings
Graph-aware sample selection improves BLEU scores by up to 39%.
Using SQL's inherent graph properties outperforms random sampling.
The approach is effective with smaller, efficient LLMs.
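To illustrate the idea behind graph-aware sample selection, the sketch below turns each SQL query into a small set of clause-to-token edges and picks the few-shot examples whose edge sets overlap most with the target query. This is a simplified stand-in: the summary does not describe the paper's actual graph construction or similarity measure, so the tokenizer, the `ID`/`NUM`/`STR` abstraction, the Jaccard score, and the names `sql_to_edges` and `select_examples` are all assumptions.

```python
import re
from typing import List, Set, Tuple

Edge = Tuple[str, str]

# Keywords that open a clause; each becomes a parent node in the toy graph.
CLAUSES = {"SELECT", "FROM", "WHERE", "GROUP", "HAVING", "ORDER", "LIMIT", "JOIN"}
KEYWORDS = CLAUSES | {
    "BY", "ON", "AND", "OR", "NOT", "IN", "AS", "LIKE", "DISTINCT",
    "COUNT", "SUM", "AVG", "MIN", "MAX", "ASC", "DESC",
}


def sql_to_edges(sql: str) -> Set[Edge]:
    """Map a query to (clause, child) edges, abstracting names and literals."""
    tokens = re.findall(r"'[^']*'|\w+|[<>=!]+|\*", sql)
    edges: Set[Edge] = set()
    clause = "ROOT"
    for tok in tokens:
        upper = tok.upper()
        if upper in CLAUSES:
            edges.add((clause, upper))  # link the previous clause to the new one
            clause = upper
            continue
        if upper in KEYWORDS:
            node = upper
        elif tok.startswith("'"):
            node = "STR"  # string literal
        elif tok.isdigit():
            node = "NUM"  # numeric literal
        elif re.fullmatch(r"\w+", tok):
            node = "ID"  # table, column, or alias name
        else:
            node = tok  # operator such as > or *
        edges.add((clause, node))
    return edges


def jaccard(a: Set[Edge], b: Set[Edge]) -> float:
    """Jaccard overlap of two edge sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_examples(
    query: str, pool: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str]]:
    """Return the k (sql, text) pairs from the pool most similar to the query."""
    target = sql_to_edges(query)
    ranked = sorted(
        pool, key=lambda pair: jaccard(target, sql_to_edges(pair[0])), reverse=True
    )
    return ranked[:k]


if __name__ == "__main__":
    pool = [
        ("SELECT COUNT(*) FROM orders GROUP BY customer_id", "Count orders per customer."),
        ("SELECT title FROM books WHERE price > 10", "List titles of books priced over 10."),
    ]
    print(select_examples("SELECT name FROM users WHERE age > 30", pool, k=1))
```

Abstracting identifiers and literals means two queries with the same clause structure but different table names score as identical, which is the property that distinguishes structural selection from random sampling or plain text similarity.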
Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks, including semantic parsing, which translates natural language into formal code representations. However, the reverse process, translating code into natural language, termed semantic captioning, has received less attention. This task is becoming increasingly important as LLMs are integrated into platforms for code generation, security analysis, and educational purposes. In this paper, we focus on the captioning of SQL queries (SQL2Text) to address the critical need for understanding and explaining SQL queries in an era where LLM-generated code poses potential security risks. We repurpose Text2SQL datasets for SQL2Text by introducing an iterative ICL prompt using GPT-4o to generate multiple additional utterances, which enhances the robustness of the datasets for the reverse task. We conduct our…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
