TABLET: Learning From Instructions For Tabular Data
Dylan Slack, Sameer Singh

TL;DR
This paper introduces TABLET, a benchmark with diverse tabular datasets and instructions, to evaluate how effectively large language models leverage instructions for tabular prediction, revealing both improvements and limitations.
Contribution
The paper presents TABLET, a comprehensive benchmark for assessing instruction-based learning in tabular data, and analyzes LLM performance and instruction faithfulness in this context.
Findings
In-context instructions raise zero-shot F1 on TABLET by 44% on average for Flan-T5 11b and by 13% for ChatGPT.
LLMs often ignore instructions and fail on specific instances.
Instructions improve but do not fully solve tabular prediction challenges.
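To make the zero-shot setup in the findings above concrete, here is a minimal sketch of how an instruction and one table row might be combined into a prompt, and how F1 could be computed from the resulting predictions. It is an illustration under stated assumptions, not the TABLET codebase: the function names (`serialize_row`, `build_prompt`, `binary_f1`), the "feature: value" serialization, the prompt wording, and the example row are all hypothetical, and the F1 shown is plain binary F1 for one chosen positive class, which may differ from the averaging used in the paper.

```python
from typing import Dict, List, Sequence


def serialize_row(row: Dict[str, object]) -> str:
    """Render one table row as 'feature: value' lines (assumed format)."""
    return "\n".join(f"{name}: {value}" for name, value in row.items())


def build_prompt(instruction: str, row: Dict[str, object], labels: List[str]) -> str:
    """Place the task instruction before the serialized row (zero-shot: no examples)."""
    return (
        f"{instruction}\n\n"
        f"{serialize_row(row)}\n\n"
        f"Answer with one of: {', '.join(labels)}.\n"
        "Answer:"
    )


def binary_f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """Binary F1 for the class `positive`; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical instruction and row, for illustration only.
prompt = build_prompt(
    "Predict whether the patient has the condition based on the features.",
    {"age": 63, "chest_pain": "yes"},
    ["yes", "no"],
)

# Hypothetical model outputs compared against gold labels.
score = binary_f1(["yes", "no", "yes", "no"], ["yes", "yes", "no", "no"], positive="yes")
```

In practice the prompt string would be sent to the language model and its answer mapped back to one of the labels before scoring. Comparing scores with and without the instruction text is one way to measure how much the instruction helps.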
Abstract
Acquiring high-quality data is often a significant challenge in training machine learning (ML) models for tabular prediction, particularly in privacy-sensitive and costly domains like medicine and finance. Providing natural language instructions to large language models (LLMs) offers an alternative solution. However, it is unclear how effectively instructions leverage the knowledge in LLMs for solving tabular prediction problems. To address this gap, we introduce TABLET, a benchmark of 20 diverse tabular datasets annotated with instructions that vary in their phrasing, granularity, and technicality. Additionally, TABLET includes the instructions' logic and structured modifications to the instructions. We find in-context instructions increase zero-shot F1 performance for Flan-T5 11b by 44% on average and 13% for ChatGPT on TABLET. Also, we explore the limitations of using LLMs for…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Adversarial Robustness in Machine Learning
Methods: Flan-T5
