Semantic Captioning: Benchmark Dataset and Graph-Aware Few-Shot In-Context Learning for SQL2Text

Ali Al-Lawati; Jason Lucas; Prasenjit Mitra

arXiv:2501.03166·cs.CL·February 11, 2025

TL;DR

This paper introduces a new benchmark dataset and a graph-aware in-context learning approach for translating SQL queries into natural language, improving explanation quality and dataset robustness for security and educational applications.

Contribution

The paper repurposes Text2SQL datasets for SQL2Text, using an iterative GPT-4o-based in-context learning (ICL) prompt to generate additional utterances and graph-aware sample selection to improve few-shot performance.
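To make the idea of graph-aware sample selection concrete, here is a minimal sketch, not the authors' implementation. It assumes a deliberately crude graph view of a query (consecutive-token edges, with identifiers and numbers abstracted to `ID` and `NUM`) and Jaccard overlap as the similarity measure; the names `sql_to_edges`, `jaccard`, `select_examples`, and the tiny example pool are all hypothetical. The paper's actual graph construction and similarity measure may differ.

```python
from __future__ import annotations

import re

# Assumption: a small keyword list is enough for this illustration.
KEYWORDS = {
    "SELECT", "FROM", "WHERE", "GROUP", "ORDER", "BY", "HAVING", "JOIN",
    "ON", "AND", "OR", "LIMIT", "DESC", "ASC", "COUNT", "SUM", "AVG",
    "MIN", "MAX", "DISTINCT",
}

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[<>=!]+|\*")


def sql_to_edges(sql: str) -> set[tuple[str, str]]:
    """Turn a query into a set of edges between consecutive abstracted tokens."""
    nodes = []
    for tok in TOKEN_RE.findall(sql):
        upper = tok.upper()
        if upper in KEYWORDS:
            nodes.append(upper)      # keep structural keywords
        elif tok.isdigit():
            nodes.append("NUM")      # abstract numeric literals
        elif tok[0].isalpha() or tok[0] == "_":
            nodes.append("ID")       # abstract table/column names
        else:
            nodes.append(tok)        # keep operators and '*'
    return set(zip(nodes, nodes[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two edge sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_examples(query: str, pool: list[tuple[str, str]], k: int = 2):
    """Return the k (sql, caption) pairs whose structure is closest to query."""
    target = sql_to_edges(query)
    ranked = sorted(
        pool,
        key=lambda pair: jaccard(target, sql_to_edges(pair[0])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical demonstration pool of (SQL, caption) pairs.
POOL = [
    ("SELECT name FROM singer WHERE age > 30", "List singers older than 30."),
    ("SELECT COUNT(*) FROM concert GROUP BY year", "Count concerts per year."),
    ("SELECT title FROM song ORDER BY sales DESC LIMIT 1", "Find the best-selling song."),
]

if __name__ == "__main__":
    for sql, caption in select_examples(
        "SELECT city FROM airport WHERE elevation > 500", POOL
    ):
        print(sql, "->", caption)
```

The selected pairs would then be placed in the few-shot prompt ahead of the query to be captioned. Abstracting identifiers is a design choice in this sketch: it makes the ranking depend on query structure rather than on shared table or column names.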

Findings

01 Graph-aware sample selection improves BLEU scores by up to 39%.

02 Using SQL's inherent graph properties outperforms random sampling.

03 The approach is effective with smaller, efficient LLMs.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in various NLP tasks, including semantic parsing, which translates natural language into formal code representations. However, the reverse process, translating code into natural language, termed semantic captioning, has received less attention. This task is becoming increasingly important as LLMs are integrated into platforms for code generation, security analysis, and educational purposes. In this paper, we focus on the captioning of SQL queries (SQL2Text) to address the critical need for understanding and explaining SQL queries in an era where LLM-generated code poses potential security risks. We repurpose Text2SQL datasets for SQL2Text by introducing an iterative ICL prompt using GPT-4o to generate multiple additional utterances, which enhances the robustness of the datasets for the reverse task. We conduct our…

Code & Models

Repositories

aliwister/ast-icl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Focus