TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools

Avi Caciularu; Alon Jacovi; Eyal Ben-David; Sasha Goldshtein; Tal Schuster; Jonathan Herzig; Gal Elidan; Amir Globerson

arXiv:2406.03618 · cs.CL · October 15, 2024

TL;DR

This paper introduces TACT, a challenging dataset for evaluating LLMs' complex reasoning and calculation abilities across texts, revealing current models' limitations and proposing a tool-based framework to improve performance.

Contribution

The paper presents TACT, a novel dataset for complex aggregative reasoning, and proposes a tool-based modeling framework to enhance LLMs' reasoning and computational skills.

Findings

01 LLMs perform poorly on TACT, with accuracy below 38%

02 Current models struggle with the table-generation, command-generation, and execution components

03 Tool-based prompting improves model performance over standard techniques
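
The three components named in the findings (table generation, command generation, and execution) can be pictured with a toy sketch. This is only an illustration under stated assumptions: the text, the table rows, the instruction, and the function names `generate_table`, `generate_command`, and `execute` are all hypothetical and are not taken from the paper or from the google/TACT dataset. In the setting the paper describes, a model would produce the table and the command; here both are hard-coded stand-ins.

```python
# Toy illustration of a table -> command -> execution pipeline.
# All data and names below are hypothetical and for illustration only.

text = (
    "The Hawks scored 3 goals in the first half and 1 in the second. "
    "The Owls scored 2 goals in the first half and 2 in the second."
)


def generate_table(source_text):
    """Stand-in for table generation: rows a model might extract from the text."""
    return [
        {"team": "Hawks", "half": 1, "goals": 3},
        {"team": "Hawks", "half": 2, "goals": 1},
        {"team": "Owls", "half": 1, "goals": 2},
        {"team": "Owls", "half": 2, "goals": 2},
    ]


def generate_command(instruction):
    """Stand-in for command generation: an aggregation to run over the rows."""
    return lambda rows: sum(r["goals"] for r in rows if r["team"] == "Hawks")


def execute(command, table):
    """Execution step: apply the generated command to the generated table."""
    return command(table)


instruction = "How many goals did the Hawks score in total?"
table = generate_table(text)
answer = execute(generate_command(instruction), table)
print(answer)  # 4
```

Separating the steps this way makes it possible to check each one on its own: a wrong final answer can come from a bad table, a bad command, or a bad execution.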

Abstract

Large Language Models (LLMs) often do not perform well on queries that require the aggregation of information across texts. To better evaluate this setting and facilitate modeling efforts, we introduce TACT - Text And Calculations through Tables, a dataset crafted to evaluate LLMs' reasoning and computational abilities using complex instructions. TACT contains challenging instructions that demand stitching information scattered across one or more texts, and performing complex integration on this information to generate the answer. We construct this dataset by leveraging an existing dataset of texts and their associated tables. For each such table, we formulate new queries and gather their respective answers. We demonstrate that all contemporary LLMs perform poorly on this dataset, achieving an accuracy below 38%. To pinpoint the difficulties and thoroughly dissect the problem, we…

Code & Models

Datasets

google/TACT
dataset· 14 dl

Videos

TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools · SlidesLive

Taxonomy

Topics: Semantic Web and Ontologies · Bayesian Modeling and Causal Inference · Logic, Reasoning, and Knowledge