FREB-TQA: A Fine-Grained Robustness Evaluation Benchmark for Table Question Answering

Wei Zhou; Mohsen Mesgar; Heike Adel; Annemarie Friedrich

arXiv:2404.18585 · cs.CL · April 30, 2024

TL;DR

This paper introduces a comprehensive benchmark for evaluating the robustness of Table Question Answering systems across structural, bias, and reasoning challenges, revealing current models' limitations.

Contribution

The paper formalizes three key robustness criteria for TQA and provides a new benchmark for evaluating and improving model resilience.

Findings

01 Current TQA models lack robustness across the three criteria.

02 None of the evaluated models consistently meets all robustness aspects.

03 The benchmark serves as a tool for the future development of more robust TQA systems.

Abstract

Table Question Answering (TQA) aims at composing an answer to a question based on tabular data. While prior research has shown that TQA models lack robustness, understanding the underlying cause and nature of this issue remains predominantly unclear, posing a significant obstacle to the development of robust TQA systems. In this paper, we formalize three major desiderata for a fine-grained evaluation of robustness of TQA systems. They should (i) answer questions regardless of alterations in table structure, (ii) base their responses on the content of relevant cells rather than on biases, and (iii) demonstrate robust numerical reasoning capabilities. To investigate these aspects, we create and publish a novel TQA evaluation benchmark in English. Our extensive experimental analysis reveals that none of the examined state-of-the-art TQA systems consistently excels in these three aspects.…
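
To make desideratum (i) concrete, the following is a minimal sketch of a structure-invariance check. It is not the benchmark's own code: the perturbations (row reversal, row shuffling, column shuffling), the function names, the toy table, and the two toy "systems" are all assumptions chosen for illustration. The idea is simply that a system's answer should not change when the table is reordered without changing its content.

```python
import random
from typing import Callable, List

Table = List[List[str]]                # first row is the header (assumed layout)
AnswerFn = Callable[[Table, str], str]  # hypothetical TQA system interface


def reverse_rows(table: Table, seed: int = 0) -> Table:
    """Reverse the body rows; the seed is unused but keeps a uniform signature."""
    return [table[0]] + table[1:][::-1]


def shuffle_rows(table: Table, seed: int = 0) -> Table:
    """Return a copy with the body rows in a seeded random order."""
    body = list(table[1:])
    random.Random(seed).shuffle(body)
    return [table[0]] + body


def shuffle_columns(table: Table, seed: int = 0) -> Table:
    """Return a copy with all columns (header included) in a seeded random order."""
    order = list(range(len(table[0])))
    random.Random(seed).shuffle(order)
    return [[row[i] for i in order] for row in table]


def is_structure_robust(answer_fn: AnswerFn, table: Table, question: str,
                        seeds=range(5)) -> bool:
    """True if the answer is identical on the original and every perturbed table."""
    reference = answer_fn(table, question)
    for seed in seeds:
        for perturb in (reverse_rows, shuffle_rows, shuffle_columns):
            if answer_fn(perturb(table, seed), question) != reference:
                return False
    return True


# Toy data and toy "systems", purely illustrative.
TABLE: Table = [
    ["city", "country", "population"],
    ["Paris", "France", "2100000"],
    ["Berlin", "Germany", "3600000"],
    ["Madrid", "Spain", "3300000"],
    ["Rome", "Italy", "2800000"],
]
QUESTION = "What is the population of Paris?"


def header_lookup(table: Table, question: str) -> str:
    """Locates the cell via header names and cell content (question parsing is faked)."""
    header = table[0]
    city, pop = header.index("city"), header.index("population")
    for row in table[1:]:
        if row[city] == "Paris":
            return row[pop]
    return ""


def positional_lookup(table: Table, question: str) -> str:
    """Relies on a fixed cell position, so it depends on the table layout."""
    return table[1][2]
```

Here `is_structure_robust(header_lookup, TABLE, QUESTION)` holds, while `positional_lookup` fails the check as soon as the rows are reversed, because it reads whatever happens to sit in the first body row. A real evaluation would replace the toy functions with calls to an actual TQA model and compare answers with the metric appropriate to the dataset.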

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques

Methods: Balanced Selection