StrucText-Eval: Evaluating Large Language Model's Reasoning Ability in Structure-Rich Text

Zhouhong Gu; Haoning Ye; Xingzhou Chen; Zeyang Zhou; Hongwei Feng; Yanghua Xiao

arXiv:2406.10621 · cs.CL · October 22, 2024

TL;DR

This paper introduces StrucText-Eval, a benchmark for evaluating large language models' reasoning abilities on structure-rich text, revealing current limitations in understanding complex structured data.

Contribution

It presents a novel automatic data generation method and a comprehensive benchmark supporting multiple languages and tasks to assess LLM reasoning on structured data.

Findings

01. Open-source LLMs achieve up to 74.9% accuracy on the standard tasks.

02. Their accuracy drops to 45.8% on the harder tasks.

03. Human participants reach 92.6% accuracy on complex structured data.

Abstract

The effective utilization of structured data, integral to corporate data strategies, has been challenged by the rise of large language models (LLMs) capable of processing unstructured information. This shift prompts the question: can LLMs interpret structured data directly in its unstructured form? We propose an automatic evaluation data generation method for assessing LLMs' reasoning capabilities on structure-rich text to explore this. Our approach supports 8 structured languages and 29 tasks, generating data with adjustable complexity through controllable nesting and structural width. We introduce StrucText-Eval, a benchmark containing 5,800 pre-generated and annotated samples designed to evaluate how well LLMs understand and reason through structured text. StrucText-Eval is divided into two suites: a regular Test suite (3,712 samples) and a Test-Hard suite (2,088 samples), the latter…
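The idea of "adjustable complexity through controllable nesting and structural width" can be illustrated with a minimal sketch. This is not the authors' generator: the function names (`make_tree`, `make_sample`), the key naming scheme, and the path-lookup task are all hypothetical, and JSON is chosen here only as one example of a structured language. The sketch builds a nested structure whose depth and width are parameters, serializes it to text, and records a question together with its ground-truth answer so that a model's reply can be scored automatically.

```python
import json
import random


def make_tree(depth, width, rng):
    """Build a nested dict with `depth` levels and `width` keys per level.

    Hypothetical helper: leaves are random integers in [0, 99].
    """
    if depth == 0:
        return rng.randint(0, 99)
    return {f"k{depth}_{i}": make_tree(depth - 1, width, rng) for i in range(width)}


def make_sample(depth, width, seed=0):
    """Generate one sample: serialized structure, question, and gold answer."""
    rng = random.Random(seed)  # seeded so samples are reproducible
    tree = make_tree(depth, width, rng)

    # Walk a random root-to-leaf path; the leaf value is the gold answer.
    path = []
    node = tree
    while isinstance(node, dict):
        key = rng.choice(sorted(node))
        path.append(key)
        node = node[key]

    return {
        "text": json.dumps(tree, indent=2),
        "question": "What value is stored at path " + " -> ".join(path) + "?",
        "answer": node,
        "path": path,
    }


if __name__ == "__main__":
    sample = make_sample(depth=3, width=2, seed=1)
    print(sample["text"])
    print(sample["question"])
    print("gold answer:", sample["answer"])
```

In this sketch, raising `depth` lengthens the path a model must follow and raising `width` adds distractor keys at every level, which is one plausible way a harder suite could be derived from the same generator.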

Code & Models

Repositories

mikegu721/structext-eval (Official)

Taxonomy

Topics: Topic Modeling