RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models

Dario Satriani; Enzo Veltri; Donatello Santoro; Paolo Papotti

arXiv:2505.21409 · cs.CL · May 28, 2025

TL;DR

This paper introduces RelationalFactQA, a benchmark for evaluating large language models' ability to retrieve and generate structured, multi-record tabular factual data, revealing significant current limitations.

Contribution

We present a new benchmark, RelationalFactQA, designed to evaluate the structured fact retrieval capabilities of LLMs, highlighting their struggles with complex, multi-record outputs.

Findings

1. State-of-the-art LLMs achieve less than 25% accuracy on relational outputs.

2. Performance declines as output size and complexity increase.

3. Current models exhibit significant failure modes in structured factual generation.

Abstract

Factuality in Large Language Models (LLMs) is a persistent challenge. Current benchmarks often assess short factual answers, overlooking the critical ability to generate structured, multi-record tabular outputs from parametric knowledge. We demonstrate that this relational fact retrieval is substantially more difficult than isolated point-wise queries, even when individual facts are known to the model, exposing distinct failure modes sensitive to output dimensionality (e.g., number of attributes or records). To systematically evaluate this under-explored capability, we introduce RelationalFactQA, a new benchmark featuring diverse natural language questions (paired with SQL) and gold-standard tabular answers, specifically designed to assess knowledge retrieval in a structured format. RelationalFactQA enables analysis across varying query complexities, output sizes, and data…
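
To make the idea of comparing a generated table against a gold-standard tabular answer concrete, the following is a minimal sketch of row-level scoring. It is not the benchmark's actual metric: the function names `normalize` and `row_scores`, the lowercase-and-trim normalization, and the example tables are all assumptions introduced here for illustration.

```python
from typing import Dict, Iterable, Sequence


def normalize(value: object) -> str:
    """Lowercase and trim a cell so trivial formatting differences are ignored."""
    return str(value).strip().lower()


def row_scores(predicted: Iterable[Sequence[object]],
               gold: Iterable[Sequence[object]]) -> Dict[str, float]:
    """Set-based row precision, recall, and F1 between two tables.

    Assumption: a predicted row counts as correct only if every cell
    matches a gold row after normalization (hypothetical scoring rule).
    """
    pred_rows = {tuple(normalize(c) for c in row) for row in predicted}
    gold_rows = {tuple(normalize(c) for c in row) for row in gold}
    matched = len(pred_rows & gold_rows)
    precision = matched / len(pred_rows) if pred_rows else 0.0
    recall = matched / len(gold_rows) if gold_rows else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold answer and model output for a "country, capital" question.
gold = [("France", "Paris"), ("Italy", "Rome"), ("Spain", "Madrid")]
predicted = [("france", "Paris "), ("Italy", "Milan"), ("Spain", "Madrid")]

scores = row_scores(predicted, gold)
print(scores)
```

Under this rule a single wrong cell invalidates the whole row, which is one plausible reason scores could fall as the number of attributes or records grows. A cell-level variant would be more lenient.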

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques