Robustness is Important: Limitations of LLMs for Data Fitting
Hejia Liu, Mochen Yang, Gediminas Adomavicius

TL;DR
This paper shows that large language models are highly sensitive to task-irrelevant changes in data representation, such as variable renaming. These changes can substantially alter prediction accuracy, which raises concerns about the robustness of LLMs for data fitting.
Contribution
The study uncovers the vulnerability of LLMs to task-irrelevant variations and analyzes attention patterns, highlighting limitations even in specialized models like TabPFN.
Findings
Task-irrelevant changes such as renaming variables can change the size of prediction error by as much as 82% in certain settings.
Both in-context learning and fine-tuning exhibit sensitivity to data representation.
State-of-the-art tabular models like TabPFN are also affected by irrelevant variations.
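The kind of sensitivity check described in these findings can be sketched as follows. This is a minimal illustration, not the authors' protocol: the "name = value" prompt layout, the helper names, and the relative-spread metric are all assumptions made here, and `predict` is a hypothetical callable standing in for whatever model is being probed.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def rename_columns(rows: Sequence[Row], mapping: Dict[str, str]) -> List[Row]:
    """Return copies of the rows with column names replaced; values are untouched."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def serialize(row: Row, target: str) -> str:
    """Render a row's features as 'name = value' text, leaving out the target."""
    return ", ".join(f"{k} = {v}" for k, v in row.items() if k != target)


def build_prompt(train: Sequence[Row], query: Row, target: str) -> str:
    """Assumed in-context layout: labelled examples, then one unlabelled query."""
    lines = [f"{serialize(r, target)}, {target} = {r[target]}" for r in train]
    lines.append(f"{serialize(query, target)}, {target} = ")
    return "\n".join(lines)


def mean_abs_error(
    predict: Callable[[str], float],  # hypothetical: maps a prompt to a number
    train: Sequence[Row],
    test: Sequence[Row],
    target: str,
) -> float:
    """Mean absolute error of `predict` over the test rows."""
    errors = [abs(predict(build_prompt(train, q, target)) - q[target]) for q in test]
    return sum(errors) / len(errors)


def error_spread(maes: Sequence[float]) -> float:
    """Relative gap between worst and best error across naming variants.

    This definition is an assumption; the paper may measure variation differently.
    """
    best, worst = min(maes), max(maes)
    if best == 0:
        return 0.0 if worst == 0 else float("inf")
    return (worst - best) / best
```

A predictor that is robust in the sense discussed here would give an `error_spread` close to zero when the same rows are evaluated under several `rename_columns` mappings, since only the names differ between variants.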
Abstract
Large Language Models (LLMs) are being applied in a wide array of settings, well beyond the typical language-oriented use cases. In particular, LLMs are increasingly used as a plug-and-play method for fitting data and generating predictions. Prior work has shown that LLMs, via in-context learning or supervised fine-tuning, can perform competitively with many tabular supervised learning techniques in terms of predictive performance. However, we identify a critical vulnerability of using LLMs for data fitting -- making changes to data representation that are completely irrelevant to the underlying learning task can drastically alter LLMs' predictions on the same data. For example, simply changing variable names can sway the size of prediction error by as much as 82% in certain settings. Such prediction sensitivity with respect to task-irrelevant variations manifests under both in-context…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Text Readability and Simplification
