Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection

Mengqi Wang (1), Jianwei Wang (1), Qing Liu (2), Xiwei Xu (2), Zhenchang Xing (2), Liming Zhu (2), and Wenjie Zhang (1) ((1) UNSW Sydney, (2) Data61, CSIRO)

arXiv:2512.07246 · cs.CL · December 9, 2025

Open Access

TL;DR

This paper uses an LLM to induce decision trees for error detection in tabular data (TreeED) and ensembles multiple such trees for consensus detection (ForestED). The approach is reported to improve both accuracy and interpretability over existing methods that have the LLM label cells directly.

Contribution

It proposes an LLM-as-an-inducer framework: rather than labeling cells directly, the LLM induces a decision tree for error detection, and multiple such trees are ensembled to make detection more explainable and more robust.

Findings

01. Achieves a 16.1% higher F1-score than baseline methods.
02. Provides interpretable decision paths for error detection.
03. Demonstrates improved robustness against prompt sensitivity.

Abstract

Error detection (ED), which aims to identify incorrect or inconsistent cell values in tabular data, is important for ensuring data quality. Recent state-of-the-art ED methods leverage the pre-trained knowledge and semantic capability embedded in large language models (LLMs) to directly label whether a cell is erroneous. However, this LLM-as-a-labeler pipeline (1) relies on a black-box, implicit decision process and thus fails to explain its detection results, and (2) is highly sensitive to prompts, yielding inconsistent outputs due to inherent model stochasticity, and therefore lacks robustness. To address these limitations, we propose an LLM-as-an-inducer framework that uses an LLM to induce a decision tree for ED (termed TreeED) and further ensembles multiple such trees for consensus detection (termed ForestED), thereby improving explainability and robustness.…
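The two ideas in the abstract, a decision tree whose path explains each verdict and an ensemble of trees that votes, can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the abstract does not specify the tree format, how the LLM produces the trees, or the consensus rule. The trees are hand-written stand-ins for LLM-induced ones, the names `Node`, `Leaf`, `detect`, and `forest_detect` are hypothetical, and a strict majority vote is assumed as the consensus rule.

```python
from dataclasses import dataclass
from typing import Callable, Union

Row = dict[str, str]  # one table row: column name -> cell value


@dataclass
class Leaf:
    is_error: bool  # verdict reached at the end of a decision path


@dataclass
class Node:
    description: str            # human-readable rule, used for explanations
    test: Callable[[Row], bool]  # the rule itself
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def detect(tree: Tree, row: Row) -> tuple[bool, list[str]]:
    """Walk one tree; return the verdict and the decision path taken."""
    path = []
    while isinstance(tree, Node):
        outcome = tree.test(row)
        path.append(f"{tree.description} -> {outcome}")
        tree = tree.if_true if outcome else tree.if_false
    return tree.is_error, path


def forest_detect(trees: list[Tree], row: Row) -> bool:
    """Assumed consensus rule: flag an error if a strict majority of trees do."""
    votes = [detect(t, row)[0] for t in trees]
    return sum(votes) > len(votes) / 2


# Hand-written stand-ins for trees an LLM might induce for an "age" column.
tree_a = Node(
    "age is all digits", lambda r: r["age"].isdigit(),
    Node("age <= 120", lambda r: int(r["age"]) <= 120, Leaf(False), Leaf(True)),
    Leaf(True),
)
tree_b = Node(
    "age is non-empty", lambda r: r["age"] != "",
    Node("age has at most 2 characters", lambda r: len(r["age"]) <= 2,
         Leaf(False), Leaf(True)),
    Leaf(True),
)
tree_c = Node("age is all digits", lambda r: r["age"].isdigit(),
              Leaf(False), Leaf(True))

trees = [tree_a, tree_b, tree_c]
verdict, path = detect(tree_a, {"age": "430"})
```

In this sketch, `path` is the explanation for a single tree's verdict, and `forest_detect(trees, row)` gives the ensemble decision. A weaker tree such as `tree_c` can be outvoted by the others, which is the intuition behind using consensus to reduce sensitivity to any single induced tree.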

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Cell Image Analysis Techniques