FORESTLLM: Large Language Models Make Random Forest Great on Few-shot Tabular Learning

Zhihan Yang; Jiaqi Wei; Xiang Zhang; Haoyu Dong; Yiwen Wang; Xiaoke Guo; Pengkun Zhang; Yiwei Xu; Chenyu You

arXiv:2601.11311 · cs.LG · January 19, 2026

TL;DR

FORESTLLM combines decision forests with large language models to improve few-shot tabular learning. The LLM is used only during training, as an offline model designer that encodes its knowledge into interpretable trees; the authors report state-of-the-art results.

Contribution

The paper introduces a framework that uses an LLM only during training to build robust, interpretable decision forests for few-shot tabular tasks, with no LLM inference required at test time.

Findings

01

Achieves state-of-the-art performance on few-shot classification and regression benchmarks.

02

Introduces a semantic splitting criterion for more robust tree structures.

03

Develops a one-time in-context inference mechanism for stabilizing leaf nodes.

Abstract

Tabular data underpins high-stakes, critical decision-making in domains such as finance, healthcare, and scientific discovery. Yet learning effectively from tabular data in few-shot settings, where labeled examples are scarce, remains a fundamental challenge. Traditional tree-based methods often falter in these regimes because they rely on statistical purity metrics, which become unstable and prone to overfitting with limited supervision. At the same time, direct applications of large language models (LLMs) often overlook the inherent structure of tabular data, leading to suboptimal performance. To overcome these limitations, we propose FORESTLLM, a novel framework that unifies the structural inductive biases of decision forests with the semantic reasoning capabilities of LLMs. Crucially, FORESTLLM leverages the LLM only during training, treating it as an offline model designer that encodes rich, contextual…
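The abstract's point about purity metrics becoming unstable with few labels can be illustrated with a small simulation. This is a minimal sketch and not code from the paper: it uses Gini impurity as one common purity metric, and the helper names (`gini`, `best_split_gain`, `mean_gain_on_noise`) and the sample sizes are assumptions chosen for illustration. The feature is pure noise, so the true gain of any split is zero, yet the best apparent gain is much larger when only a handful of labeled rows are available.

```python
import random


def gini(pos, n):
    """Gini impurity of a binary node with `pos` positives out of `n` rows."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def best_split_gain(xs, ys):
    """Largest Gini reduction over all thresholds on a single feature."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total_pos = sum(ys)
    parent = gini(total_pos, n)
    best = 0.0
    left_pos = 0
    for i in range(1, n):
        # Rows 0..i-1 go left, rows i..n-1 go right.
        left_pos += pairs[i - 1][1]
        left_n, right_n = i, n - i
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(total_pos - left_pos, right_n)) / n
        best = max(best, parent - child)
    return best


def mean_gain_on_noise(n_samples, n_trials=200, seed=0):
    """Average best gain when the feature is independent of the label."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_samples)]
        ys = [rng.randint(0, 1) for _ in range(n_samples)]
        total += best_split_gain(xs, ys)
    return total / n_trials


# Hypothetical sizes: 8 rows stands in for a few-shot regime.
few = mean_gain_on_noise(8)
many = mean_gain_on_noise(512)
print(f"mean spurious gain, 8 rows:   {few:.3f}")
print(f"mean spurious gain, 512 rows: {many:.3f}")
```

A greedy tree learner would happily split on such a noise feature in the 8-row case, which is the overfitting behavior the abstract attributes to purity-driven splitting under limited supervision.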

Taxonomy

Topics: Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning