Frame-Guided Synthetic Claim Generation for Automatic Fact-Checking Using High-Volume Tabular Data
Jacob Devasier, Akshith Putta, Qing Wang, Alankrit Moses, Chengkai Li

TL;DR
This paper introduces a large-scale, multilingual dataset and a novel frame-guided method for generating synthetic claims from high-volume structured data to improve automated fact-checking.
Contribution
It presents a new dataset with complex OECD tables and a frame-guided approach for realistic claim generation, addressing a gap in fact-checking research on large-scale structured data.
Findings
Knowledge-probing experiments show that LLMs have not memorized the underlying facts, so systems must perform genuine retrieval and reasoning.
The benchmark is highly challenging for current models.
Evidence retrieval is the main bottleneck in processing massive tables.
Abstract
Automated fact-checking benchmarks have largely ignored the challenge of verifying claims against real-world, high-volume structured data, instead focusing on small, curated tables. We introduce a new large-scale, multilingual dataset to address this critical gap. It contains 78,503 synthetic claims grounded in 434 complex OECD tables, which average over 500K rows each. We propose a novel, frame-guided methodology where algorithms programmatically select significant data points based on six semantic frames to generate realistic claims in English, Chinese, Spanish, and Hindi. Crucially, we demonstrate through knowledge-probing experiments that LLMs have not memorized these facts, forcing systems to perform genuine retrieval and reasoning rather than relying on parameterized knowledge. We provide a baseline SQL-generation system and show that our benchmark is highly challenging. Our…
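The abstract does not name the six semantic frames or show the baseline's SQL, so the following is only a minimal sketch of the general pattern under stated assumptions: select a salient data point from a table, render it as a claim through a frame template, and check the claim by executing a query against the table. Everything specific here is hypothetical: the two frames (`superlative_frame`, `comparison_frame`), the toy table `indicator(country, year, value)` with invented values, the `falsify` helper for producing refuted claims, and the hand-written SQL that stands in for the paper's generated queries.

```python
import sqlite3
from dataclasses import dataclass

# Toy stand-in for one OECD-style table; the values are invented for illustration.
ROWS = [
    ("Canada", 2020, 6.1), ("Canada", 2021, 7.4),
    ("France", 2020, 4.3), ("France", 2021, 5.2),
    ("Japan", 2020, 2.8), ("Japan", 2021, 3.0),
]


@dataclass
class Claim:
    text: str       # natural-language claim produced from a frame template
    sql: str        # query that retrieves the evidence for the claim
    params: tuple   # parameters bound into the query
    expected: str   # answer the query must return for the claim to hold


def superlative_frame(rows, year):
    """Hypothetical 'superlative' frame: the country with the largest value in a year."""
    country = max((r for r in rows if r[1] == year), key=lambda r: r[2])[0]
    return Claim(
        f"In {year}, {country} had the highest value among the listed countries.",
        "SELECT country FROM indicator WHERE year = ? ORDER BY value DESC LIMIT 1",
        (year,),
        country,
    )


def comparison_frame(rows, a, b, year):
    """Hypothetical 'comparison' frame: which of two countries is higher in a year."""
    values = {r[0]: r[2] for r in rows if r[1] == year}
    higher, lower = (a, b) if values[a] > values[b] else (b, a)
    return Claim(
        f"In {year}, {higher} recorded a higher value than {lower}.",
        "SELECT country FROM indicator WHERE year = ? AND country IN (?, ?) "
        "ORDER BY value DESC LIMIT 1",
        (year, a, b),
        higher,
    )


def falsify(claim, wrong_answer):
    """Hypothetical refuted variant: substitute a wrong entity into the claim text."""
    return Claim(claim.text.replace(claim.expected, wrong_answer, 1),
                 claim.sql, claim.params, wrong_answer)


def verify(conn, claim):
    """Run the evidence query and label the claim SUPPORTED or REFUTED."""
    row = conn.execute(claim.sql, claim.params).fetchone()
    return "SUPPORTED" if row and row[0] == claim.expected else "REFUTED"


# Load the toy table into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indicator (country TEXT, year INTEGER, value REAL)")
conn.executemany("INSERT INTO indicator VALUES (?, ?, ?)", ROWS)

claims = [
    superlative_frame(ROWS, 2021),
    comparison_frame(ROWS, "France", "Japan", 2020),
]
claims.append(falsify(claims[0], "Japan"))
for c in claims:
    print(verify(conn, c), "|", c.text)
```

On a six-row table the verification query is trivial. On tables averaging over 500K rows, the hard part is locating the right slice of data to query, which is consistent with the finding that evidence retrieval is the main bottleneck.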
Taxonomy
Topics: Data Quality and Management · Computational and Text Analysis Methods · Topic Modeling
