Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

Xiangru Tang; Yiming Zong; Jason Phang; Yilun Zhao; Wangchunshu Zhou; Arman Cohan; Mark Gerstein

arXiv:2309.08963 · cs.CL · April 8, 2024 · 5 cites

TL;DR

This paper evaluates the ability of Large Language Models to generate complex structured data, introduces a new benchmark and metrics for assessment, and proposes a fine-tuning method that improves performance on structured data tasks.

Contribution

The paper presents Struc-Bench, a comprehensive benchmark for structured data generation by LLMs, and introduces FormatCoT and new metrics to better evaluate and enhance LLM performance in this domain.

Findings

01

Structure-aware fine-tuning of LLaMA-7B yields substantial performance gains on structured data generation.

02

Struc-Bench spans three output formats: text tables, HTML, and LaTeX.

03

Two new metrics, P-Score (Prompting Score) and H-Score (Heuristical Score), are proposed to gauge LLM performance more accurately.

Abstract

Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning method, cognizant of data structures, to bolster their performance. We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs (GPT-NeoX-20B, GPT-3.5, GPT-4, and Vicuna), which spans text tables, HTML, and LaTeX formats. Our proposed FormatCoT aids in crafting format-specific instructions from the intended outputs to populate this benchmark. Addressing the gap in task-centered evaluation, we propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score), to more accurately gauge LLM performance. Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains,…

Code & Models

Repositories

gersteinlab/struc-bench (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer