Witty: An Efficient Solver for Computing Minimum-Size Decision Trees

Luca Pascal Staus; Christian Komusiewicz; Frank Sommer; Manuel Sorge

arXiv:2412.11954·cs.DS·December 17, 2024

Open Access

TL;DR

This paper empirically evaluates the witness-tree paradigm for computing minimum-size decision trees (MSDT), reporting large speedups over existing solvers and an improved worst-case running-time bound for the problem.

Contribution

The paper implements the witness-tree paradigm for MSDT and augments it with heuristic improvements, achieving substantial empirical speedups and an improved worst-case running-time bound.

Findings

01. Achieved a mean 324-fold speedup over the naive implementation

02. Outperformed state-of-the-art solvers with a mean 32-fold speedup

03. Provided an improved worst-case running-time bound for MSDT

Abstract

Decision trees are a classic model for summarizing and classifying data. To enhance interpretability and generalization properties, it has been proposed to favor small decision trees. Accordingly, in the minimum-size decision tree training problem (MSDT), the input is a set of training examples in $\mathbb{R}^{d}$ with class labels and we aim to find a decision tree that classifies all training examples correctly and has a minimum number of nodes. MSDT is NP-hard and therefore presumably not solvable in polynomial time. Nevertheless, Komusiewicz et al. [ICML '23] developed a promising algorithmic paradigm called witness trees which solves MSDT efficiently if the solution tree is small. In this work, we test this paradigm empirically. We provide an implementation, augment it with extensive heuristic improvements, and scrutinize it on standard benchmark instances. The augmentations achieve…
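To make the MSDT problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the witness-tree algorithm and not the Witty implementation; it only illustrates what "a smallest tree that classifies every training example correctly" means. All names (`candidate_cuts`, `fits`, `minimum_size_tree`, `classify`) are hypothetical, and two things are assumptions made for this illustration: inner nodes test `x[dim] <= threshold`, and tree size is counted as the number of inner nodes.

```python
def candidate_cuts(examples):
    """All (dimension, threshold) pairs that split the examples into two non-empty parts."""
    dims = len(examples[0][0])
    cuts = []
    for dim in range(dims):
        values = sorted({x[dim] for x, _ in examples})
        # Every value except the largest is a useful threshold for x[dim] <= t.
        cuts.extend((dim, t) for t in values[:-1])
    return cuts


def fits(examples, size):
    """Return a tree with at most `size` inner nodes that classifies all
    examples correctly, or None if no such tree exists."""
    labels = {y for _, y in examples}
    if len(labels) <= 1:
        return ("leaf", next(iter(labels), None))
    if size == 0:
        return None
    for dim, thr in candidate_cuts(examples):
        left = [(x, y) for x, y in examples if x[dim] <= thr]
        right = [(x, y) for x, y in examples if x[dim] > thr]
        # Distribute the remaining size - 1 inner nodes over both subtrees.
        for left_size in range(size):
            left_tree = fits(left, left_size)
            if left_tree is None:
                continue
            right_tree = fits(right, size - 1 - left_size)
            if right_tree is not None:
                return ("node", dim, thr, left_tree, right_tree)
    return None


def minimum_size_tree(examples):
    """Try sizes 0, 1, 2, ... and return (size, tree) for the first size that works."""
    # n distinct examples never need more than n - 1 inner nodes.
    for size in range(max(len(examples), 1)):
        tree = fits(examples, size)
        if tree is not None:
            return size, tree
    raise ValueError("identical examples carry different labels")


def classify(tree, x):
    """Follow the threshold tests from the root down to a leaf label."""
    while tree[0] == "node":
        _, dim, thr, left, right = tree
        tree = left if x[dim] <= thr else right
    return tree[1]


# Usage: an XOR-style instance in R^2 with two classes.
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
size, tree = minimum_size_tree(xor_examples)
print(size, tree)
```

The search is exponential in the tree size and is only usable on toy inputs. That cost is the motivation for specialized paradigms such as witness trees, which, per the abstract, solve MSDT efficiently when the solution tree is small.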

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications · Data Quality and Management