Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs

Jyotika Singh; Weiyi Sun; Amit Agarwal; Viji Krishnamurthy; Yassine Benajiba; Sujith Ravi; Dan Roth

arXiv:2510.23854·cs.CL·October 29, 2025

TL;DR

This paper introduces Combo-Eval, an evaluation framework for assessing LLM-generated natural language representations of tabular data, along with a new dataset, NLR-BIRD, to benchmark these representations.

Contribution

The paper presents a novel evaluation method that combines multiple existing approaches to improve fidelity and reduce LLM calls, and it introduces the first dedicated dataset for NLR benchmarking.

Findings

01

Combo-Eval reduces LLM calls by 25-61%.

02

Combo-Eval aligns well with human judgments.

03

NLR-BIRD enables effective benchmarking of NLR quality.
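
To make the call-reduction figure in finding 01 concrete, here is a minimal Python sketch of one way a combined evaluator could save LLM calls: a cheap string-overlap check settles the clear cases, and only the ambiguous ones go to an LLM judge. This is an illustrative assumption, not the paper's Combo-Eval algorithm, which this summary does not specify. The names `cheap_check`, `combined_eval`, `call_reduction`, the thresholds `low` and `high`, and the sample data are all hypothetical.

```python
from typing import Callable, List, Tuple

Sample = Tuple[List[str], str]  # (table cell values, generated NLR text)


def cheap_check(values: List[str], nlr: str) -> float:
    """Fraction of table cell values that appear verbatim in the NLR."""
    if not values:
        return 1.0
    text = nlr.lower()
    hits = sum(1 for v in values if v.lower() in text)
    return hits / len(values)


def combined_eval(
    samples: List[Sample],
    llm_judge: Callable[[List[str], str], bool],
    low: float = 0.2,   # hypothetical threshold: at or below this, reject
    high: float = 1.0,  # hypothetical threshold: at or above this, accept
) -> Tuple[List[bool], int]:
    """Return per-sample verdicts and the number of LLM judge calls made."""
    verdicts: List[bool] = []
    llm_calls = 0
    for values, nlr in samples:
        score = cheap_check(values, nlr)
        if score >= high:
            verdicts.append(True)       # clear pass, no LLM call needed
        elif score <= low:
            verdicts.append(False)      # clear fail, no LLM call needed
        else:
            llm_calls += 1              # ambiguous case: ask the judge
            verdicts.append(llm_judge(values, nlr))
    return verdicts, llm_calls


def call_reduction(baseline_calls: int, actual_calls: int) -> float:
    """Percentage reduction in LLM calls relative to judging every sample."""
    return 100.0 * (baseline_calls - actual_calls) / baseline_calls


# Made-up samples; the judge is a stand-in stub, not a real LLM call.
samples: List[Sample] = [
    (["Alice", "42"], "Alice has 42 orders."),
    (["Bob", "17"], "No results were found."),
    (["Carol", "8", "Paris"], "Carol lives in Paris."),
    (["Dan", "3", "Oslo"], "There are 3 rows."),
]
stub_judge = lambda values, nlr: cheap_check(values, nlr) >= 0.5

verdicts, calls = combined_eval(samples, stub_judge)
print(verdicts, calls, call_reduction(len(samples), calls))
```

With these four made-up samples, two are settled by the cheap check and two reach the stub judge, so the sketch reports a 50% reduction against an LLM-only baseline. The figure depends entirely on the illustrative data and thresholds and says nothing about the paper's measured 25-61% range.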

Abstract

In modern industry systems such as multi-turn chat agents, Text-to-SQL technology bridges natural language (NL) questions and database (DB) querying. Converting tabular DB results into NL representations (NLRs) enables chat-based interaction. NLR generation is currently handled mostly by large language models (LLMs), but information loss and errors in presenting tabular results in NL remain largely unexplored. This paper introduces Combo-Eval, a novel evaluation method for judging LLM-generated NLRs that combines the benefits of multiple existing methods, optimizing evaluation fidelity and reducing LLM calls by 25-61%. Accompanying our method is NLR-BIRD, the first dedicated dataset for NLR benchmarking. Through human evaluations, we demonstrate the superior alignment of Combo-Eval with human judgments, applicable across scenarios with and…
