Active Learning for Decision Trees with Provable Guarantees

Arshia Soltani Moakhar; Tanapoom Laoaron; Faraz Ghahremani; Kiarash Banihashem; MohammadTaghi Hajiaghayi

arXiv:2601.20775·cs.LG·February 20, 2026


Open Access·3 Reviews

TL;DR

This paper provides a theoretical analysis of active learning for decision trees, introducing a new algorithm with provable guarantees that achieves near-optimal label complexity under specific assumptions.

Contribution

It offers the first analysis of the disagreement coefficient for decision trees and presents a new active learning algorithm with multiplicative error guarantees.

Findings

01. Disagreement coefficient analysis for decision trees under certain assumptions.

02. A new active learning algorithm achieving polylogarithmic label complexity.

03. Lower bounds showing near-optimal dependence on error tolerance.
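
For readers unfamiliar with the quantity named in the first finding, the standard definition of the disagreement coefficient (due to Hanneke) can be written as below. The notation ($\mathcal{H}$ for the hypothesis class, $h^*$ for the target classifier, $\mathcal{D}_X$ for the marginal distribution over inputs, $B$, $\mathrm{DIS}$) is common textbook notation and is an assumption here, not taken from this page; the paper may use a slightly different variant.

$$B(h^*, r) = \{\, h \in \mathcal{H} : \Pr_{x \sim \mathcal{D}_X}[h(x) \neq h^*(x)] \le r \,\}$$

$$\mathrm{DIS}(V) = \{\, x : \exists\, h, h' \in V \text{ with } h(x) \neq h'(x) \,\}$$

$$\theta = \sup_{r > 0} \frac{\Pr_{x \sim \mathcal{D}_X}[\, x \in \mathrm{DIS}(B(h^*, r)) \,]}{r}$$

A small $\theta$ means that the region where near-optimal hypotheses can disagree shrinks in proportion to $r$, which is what allows disagreement-based methods to skip most label queries.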

Abstract

This paper advances the theoretical understanding of active learning label complexity for decision trees as binary classifiers. We make two main contributions. First, we provide the first analysis of the disagreement coefficient for decision trees, a key parameter governing active learning label complexity. Our analysis holds under two natural assumptions required for achieving polylogarithmic label complexity: (i) each root-to-leaf path queries distinct feature dimensions, and (ii) the input data has a regular, grid-like structure. We show these assumptions are essential, as relaxing them leads to polynomial label complexity. Second, we present the first general active learning algorithm for binary classification that achieves a multiplicative error guarantee, producing a $(1 + \epsilon)$-approximate classifier. By combining these results, we design an active learning algorithm for…
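
To illustrate why the disagreement region governs label complexity, here is a minimal sketch of classic disagreement-based active learning (CAL-style) in the simplest possible setting: a one-dimensional threshold, i.e. a depth-1 decision tree, on a grid with noiseless labels. This is not the paper's algorithm; the function name `cal_threshold`, the integer grid, and the target `TRUE_T` are all hypothetical choices made for the example.

```python
import random

def cal_threshold(points, oracle, seed=0):
    """CAL-style learner for 1D thresholds h_t(x) = 1 if x >= t, else 0.

    Illustrative sketch only (realizable, noiseless labels assumed).
    Returns (lo, hi, queries), where every threshold t with lo < t <= hi
    is consistent with all labels seen or implied so far.
    """
    rng = random.Random(seed)
    order = list(points)
    rng.shuffle(order)  # process the pool in random order

    lo = float("-inf")  # largest point known to be labeled 0
    hi = float("inf")   # smallest point known to be labeled 1
    queries = 0
    for x in order:
        if lo < x < hi:
            # x lies in the disagreement region: consistent thresholds
            # still disagree on its label, so we must pay for a query.
            queries += 1
            if oracle(x) == 1:
                hi = x
            else:
                lo = x
        # Otherwise every consistent threshold agrees on x, so its
        # label is implied and no query is spent.
    return lo, hi, queries

# Hypothetical setup: a grid of N integer points and a hidden threshold.
N = 10_000
TRUE_T = 3_700
lo, hi, queries = cal_threshold(range(N), lambda x: int(x >= TRUE_T))
print(f"version space: ({lo}, {hi}], labels queried: {queries} of {N}")
```

In this toy setting the disagreement region shrinks quickly, so only a small fraction of the pool is ever queried. The paper's contribution, per the abstract, is to bound the analogous quantity for decision trees under its two structural assumptions.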

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 4·Confidence 4

Strengths

- The problem addressed is inherently interesting: active learning for non-linear, structured models like decision trees is a challenging and important direction.
- The choice to use multiplicative (rather than additive) bounds is conceptually appealing, as it aligns with realizable-case analyses and could, in principle, yield sharper guarantees.

Weaknesses

- **Limited theoretical novelty and impact** The theoretical contributions are incremental. The derived bounds on the disagreement coefficient $\theta = O((\ln n)^d)$ are relatively weak and pessimistic: $(\ln n)^d$ can grow very quickly even for moderately large depth, making the guarantee practically meaningless in realistic settings. Moreover, no *lower bounds* are provided, leaving it unclear whether the obtained rates are at all tight or informative.
- **Strong and unrealistic assumption
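
The reviewer's point about the growth of $(\ln n)^d$ can be made concrete with a short calculation. The sketch below treats $n$ and $d$ simply as the two numbers in the quoted bound and ignores the constant hidden in the $O(\cdot)$; the helper names `polylog_bound` and `first_d_reaching_n` are hypothetical, and the loop assumes $n \ge 3$ so that $\ln n > 1$.

```python
import math

def polylog_bound(n: int, d: int) -> float:
    """Value of (ln n)^d, ignoring the constant hidden in O(.)."""
    return math.log(n) ** d

def first_d_reaching_n(n: int) -> int:
    """Smallest d with (ln n)^d >= n, i.e. where the bound stops
    being smaller than n itself. Assumes n >= 3 so that ln n > 1."""
    d = 1
    while polylog_bound(n, d) < n:
        d += 1
    return d

for n in (10**3, 10**6, 10**9):
    d_star = first_d_reaching_n(n)
    print(f"n = {n}: (ln n)^d reaches n at d = {d_star}")
```

For $n = 10^6$, for instance, $(\ln n)^d$ is still below $n$ at $d = 5$ but exceeds it at $d = 6$, which is the kind of behaviour the reviewer describes.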

Reviewer 02·Rating 8·Confidence 3

Strengths

- The results are interesting and make progress on important problems in learning theory.
- The techniques are interesting and non-trivial.
- Overall I found the paper well-written.

Weaknesses

- The bounds are a bit unsatisfactory.
- The paper might be a bit hard to follow for some members of the ICLR community who are less on the theory side.

Reviewer 03·Rating 4·Confidence 3

Strengths

Decision trees are an important and broadly studied class, and the theory of active learning for decision trees is not yet well understood. This paper makes progress in this direction by considering the disagreement coefficient framework proposed by Hanneke. For structured decision trees and structured distributions (the uniform distribution over grid points), this paper gives an upper bound on the disagreement coefficient, which in turn gives an upper bound on the label complexity of the problem. Further

Weaknesses

Although technically solid, I am not sure if the contribution is significant enough.

1. For the disagreement coefficient of the decision trees, this paper places structural assumptions on both the type of decision trees and the datasets. This looks a bit too strong. In particular, if the marginal distribution is structured, then the disagreement coefficient does not characterize the min-max label complexity for the problem (for example, a halfspace has disagreement coefficient $\sqrt{d}$ under

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)