LLM Meeting Decision Trees on Tabular Data
Hangting Ye, Jinmeng Li, He Zhao, Dandan Guo, Yi Chang

TL;DR
This paper introduces DeLTa, a method that integrates LLMs with decision trees for tabular data. It avoids both data serialization and LLM fine-tuning, and the authors report state-of-the-art results on multiple tabular benchmarks.
Contribution
DeLTa leverages LLM reasoning to improve decision tree rules without data serialization or fine-tuning, enhancing tabular data prediction.
Findings
Achieves state-of-the-art performance on multiple tabular benchmarks.
Effectively improves decision tree predictions through LLM-derived rule calibration.
Avoids data serialization and fine-tuning, reducing privacy risks and scalability issues.
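To make the contrast between row serialization and rule-level input concrete, here is a minimal sketch. It is not DeLTa's implementation: the `Node` class, the `tree_to_rules` and `serialize_row` helpers, the toy tree, the feature names, and the prompt wording are all hypothetical and chosen only for illustration. It shows that a decision tree can be turned into textual rules that contain split features and thresholds but no individual records, whereas serialization exposes each row's values as text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary decision tree node (illustration only)."""
    feature: Optional[str] = None      # split feature (internal nodes)
    threshold: Optional[float] = None  # split threshold (internal nodes)
    left: Optional["Node"] = None      # branch where feature <= threshold
    right: Optional["Node"] = None     # branch where feature > threshold
    label: Optional[str] = None        # prediction (set only on leaves)


def tree_to_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF/THEN rule string."""
    if node.label is not None:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN predict {node.label}"]
    left = tree_to_rules(
        node.left, conditions + (f"{node.feature} <= {node.threshold}",)
    )
    right = tree_to_rules(
        node.right, conditions + (f"{node.feature} > {node.threshold}",)
    )
    return left + right


def serialize_row(row):
    """Row-level serialization, shown only for contrast: it exposes raw values."""
    return ", ".join(f"{k} is {v}" for k, v in row.items()) + "."


# Toy tree with made-up features and thresholds.
toy_tree = Node(
    "age", 50.0,
    left=Node(label="low_risk"),
    right=Node(
        "blood_pressure", 140.0,
        left=Node(label="low_risk"),
        right=Node(label="high_risk"),
    ),
)

rules = tree_to_rules(toy_tree)

# Assumed prompt wording; the rules carry no individual records.
prompt = "Review and refine these decision rules:\n" + "\n".join(rules)
print(prompt)
```

The prompt built from `rules` grows with the number of leaves rather than the number of rows, which is one way to read the scalability claim above; how DeLTa actually queries the LLM and applies the refined rules is described in the paper, not here.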
Abstract
Tabular data play a vital role in diverse real-world fields such as healthcare and finance. Following the recent success of Large Language Models (LLMs), early work has begun extending LLMs to tabular data. Most of these LLM-based methods first serialize tabular data into natural language descriptions, and then either tune LLMs or run inference directly on the serialized data. However, these methods suffer from two inherent issues: (i) data perspective: existing serialization methods lack universal applicability for structured tabular data and may pose privacy risks through direct textual exposure; and (ii) model perspective: LLM fine-tuning methods struggle with tabular data, and the scalability of in-context learning is bottlenecked by input length constraints (making it suitable mainly for few-shot learning). This work explores a novel direction of…
