TANQ: An open domain dataset of table answered questions
Mubashara Akhtar, Chenxi Pang, Andreea Marzoca, Yasemin Altun, and Julian Martin Eisenschlos

TL;DR
TANQ is a new open-domain question answering dataset in which answering a question requires constructing a table from information spread across multiple sources. It provides source attribution for every table cell and benchmarks current language models on the task.
Contribution
The paper presents TANQ, the first open-domain question answering dataset whose answers require building tables from information across multiple sources. It also reports benchmark results and analyzes model capabilities and limitations.
Findings
The best-performing baseline, Gemini Flash, reaches an overall F1 score of 60.7, which is 12.3 points below human performance.
Models struggle with multi-hop reasoning, math, and unit conversions.
TANQ reveals significant challenges in complex table-based question answering.
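The F1 figures above compare a predicted answer table with a reference table. As a rough illustration only, the sketch below computes an exact-match F1 over table cells. Each cell is treated as a (row key, column, value) triple after light normalization. This simplified metric and the names `cell_f1` and `normalize` are assumptions made for illustration. They are not the evaluation metric defined in the paper.

```python
from typing import Iterable, Tuple

# Illustrative assumption: a cell is a (row_key, column_name, value) triple.
Cell = Tuple[str, str, str]


def normalize(cell: Cell) -> Cell:
    """Lowercase and strip each part so trivial formatting differences are ignored."""
    row, col, value = cell
    return (row.strip().lower(), col.strip().lower(), value.strip().lower())


def cell_f1(predicted: Iterable[Cell], gold: Iterable[Cell]) -> float:
    """Hypothetical exact-match F1 over the sets of normalized cells of two tables."""
    pred_set = {normalize(c) for c in predicted}
    gold_set = {normalize(c) for c in gold}
    if not pred_set and not gold_set:
        return 1.0  # two empty tables agree trivially
    matches = len(pred_set & gold_set)
    if matches == 0:
        return 0.0  # also covers an empty prediction or an empty reference
    precision = matches / len(pred_set)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [
        ("Paris", "country", "France"),
        ("Paris", "continent", "Europe"),
        ("Berlin", "country", "Germany"),
        ("Berlin", "continent", "Europe"),
    ]
    predicted = [
        ("paris", "country", "France "),  # matches after normalization
        ("Berlin", "country", "Germany"),  # exact match
        ("Berlin", "continent", "Asia"),   # wrong value
    ]
    # precision = 2/3, recall = 2/4, so F1 = 4/7
    print(round(cell_f1(predicted, gold), 3))  # 0.571
```

Because cells are scored as triples, a correct value placed under the wrong row or column counts as both a false positive and a false negative. Questions that involve arithmetic or unit conversions would call for fuzzy or numeric matching, which would relax the exact string comparison used here.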
Abstract
Language models, potentially augmented with tool usage such as retrieval, are becoming the go-to means of answering questions. Understanding and answering questions in real-world settings often requires retrieving information from different sources, processing and aggregating data to extract insights, and presenting complex findings in the form of structured artifacts such as novel tables, charts, or infographics. In this paper, we introduce TANQ, the first open domain question answering dataset where the answers require building tables from information across multiple sources. We release the full source attribution for every cell in the resulting table and benchmark state-of-the-art language models in open, oracle, and closed book setups. Our best-performing baseline, Gemini Flash, reaches an overall F1 score of 60.7, lagging behind human performance by 12.3 points. We analyse baselines'…
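The abstract says that every cell in an answer table carries its own source attribution and that a single table draws on several sources. A small data structure can make this concrete. The sketch below is a hypothetical representation, not the format of the released dataset. The classes `Source`, `AttributedCell`, and `AnswerTable` and their fields are assumptions. The sketch flags cells that lack evidence and lists the distinct documents a table combines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical representation for illustration; not TANQ's released schema.


@dataclass
class Source:
    """One piece of evidence for a cell value (assumed fields)."""
    document: str  # e.g. a page title or URL
    snippet: str   # text span that supports the value


@dataclass
class AttributedCell:
    value: str
    sources: List[Source] = field(default_factory=list)


@dataclass
class AnswerTable:
    columns: List[str]
    rows: List[Dict[str, AttributedCell]]

    def unsupported_cells(self) -> List[Tuple[int, str]]:
        """Return (row index, column) pairs whose cell is missing or has no source."""
        missing = []
        for i, row in enumerate(self.rows):
            for col in self.columns:
                cell = row.get(col)
                if cell is None or not cell.sources:
                    missing.append((i, col))
        return missing

    def documents_used(self) -> Set[str]:
        """Return the distinct source documents the table draws on."""
        return {
            src.document
            for row in self.rows
            for cell in row.values()
            for src in cell.sources
        }


if __name__ == "__main__":
    paris = Source("Paris", "Paris is the capital of France.")
    berlin = Source("Berlin", "Berlin is the capital of Germany.")
    table = AnswerTable(
        columns=["city", "country"],
        rows=[
            {"city": AttributedCell("Paris", [paris]),
             "country": AttributedCell("France", [paris])},
            {"city": AttributedCell("Berlin", [berlin]),
             "country": AttributedCell("Germany")},  # no source attached
        ],
    )
    print(table.unsupported_cells())       # [(1, 'country')]
    print(sorted(table.documents_used()))  # ['Berlin', 'Paris']
```

Attaching sources to individual cells rather than to the whole table lets a reader or an automatic checker verify each value on its own. If `documents_used` returns more than one document, the table is an example of the multi-source case the dataset targets.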
