Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs
Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin

TL;DR
This paper investigates how different prompt formats and real-world noise operations affect large language models' ability to understand and process tabular data, revealing the impact of data messiness on model performance.
Contribution
It introduces 8 noise operations inspired by real-world data issues and evaluates their effects on LLMs across various prompt formats for tabular understanding tasks.
Findings
Noise operations significantly impact LLM performance
Prompt format influences task accuracy
Real-world data messiness affects model robustness
Abstract
Large language models (LLMs) are increasingly applied to tabular tasks using in-context learning. The prompt representation for a table may play a role in the LLM's ability to process the table. Inspired by prior work, we generate a collection of self-supervised structural tasks (e.g., navigate to a cell and row; transpose the table) and evaluate the performance differences when using 8 formats. In contrast to past work, we introduce 8 noise operations inspired by real-world messy data and adversarial inputs, and show that such operations can impact LLM performance across formats for different structural understanding tasks.
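The setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two serializers (CSV and JSON records), the `shuffle_columns` noise operation, and all function names are hypothetical and are not taken from the paper, which does not list its 8 formats or 8 noise operations here. The sketch only shows why such tasks are self-supervised: the expected answer is computed from the table itself, so no manual labels are needed.

```python
import csv
import io
import json
import random


def to_csv(header, rows):
    """Serialize a table as CSV text (one possible prompt format)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_json_records(header, rows):
    """Serialize a table as a JSON list of row objects (another possible format)."""
    return json.dumps([dict(zip(header, row)) for row in rows])


def shuffle_columns(header, rows, seed=0):
    """Hypothetical noise operation: permute the column order.

    The table's content is unchanged; only its layout differs.
    """
    rng = random.Random(seed)
    order = list(range(len(header)))
    rng.shuffle(order)
    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows


def cell_lookup_task(header, rows, row_idx, col_name):
    """Build a 'navigate to a cell' question and its ground-truth answer."""
    answer = rows[row_idx][header.index(col_name)]
    question = f"What is the value of column '{col_name}' in row {row_idx}?"
    return question, answer


def transpose(header, rows):
    """Ground truth for a 'transpose the table' task: one output row per column."""
    return [list(col) for col in zip(header, *rows)]


if __name__ == "__main__":
    header = ["id", "name"]
    rows = [[1, "a"], [2, "b"]]

    noisy_header, noisy_rows = shuffle_columns(header, rows, seed=1)
    question, answer = cell_lookup_task(noisy_header, noisy_rows, 1, "name")

    # The prompt would combine one serialization of the (noisy) table with the question.
    prompt = to_json_records(noisy_header, noisy_rows) + "\n" + question
    print(prompt)
    print("expected answer:", answer)
```

Comparing a model's answers on the clean and noisy versions of the same table, across each serialization, gives the kind of per-format, per-noise comparison the abstract describes.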
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
