Column generation based math-heuristic for classification trees
Murat Firat, Guillaume Crognier, Adriana F. Gabor, C.A.J. Hurkens, and Yingqian Zhang

TL;DR
This paper introduces a column generation-based heuristic for constructing univariate binary decision trees. It handles large datasets efficiently and is competitive with existing methods such as CART.
Contribution
It presents a new ILP formulation for decision trees and a column generation heuristic that improves scalability and performance on large datasets.
Findings
The approach is competitive with state-of-the-art ILP algorithms.
It can handle datasets with tens of thousands of data points.
The method produces solutions comparable to CART for large datasets.
Abstract
This paper explores the use of Column Generation (CG) techniques in constructing univariate binary decision trees for classification tasks. We propose a novel Integer Linear Programming (ILP) formulation, based on root-to-leaf paths in decision trees. The model is solved via a Column Generation-based heuristic. To speed up the heuristic, we restrict the instance data by considering only a subset of decision splits, sampled from the solutions of the well-known CART algorithm. Extensive numerical experiments show that our approach is competitive with state-of-the-art ILP-based algorithms. In particular, the proposed approach is capable of handling big data sets with tens of thousands of data rows. Moreover, for large data sets, it finds solutions competitive with CART.
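The root-to-leaf path view mentioned in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy rows, the encoding of a path as (feature index, threshold, direction) triples, and the helper names `rows_on_path` and `path_value`. It only shows how a single path can be scored by the rows it classifies correctly; it does not reproduce the paper's ILP, its pricing problem, or its CART-based split sampling.

```python
from collections import Counter

# Hypothetical toy data: each row is (feature vector, class label).
ROWS = [
    ((1.0, 2.0), "b"),
    ((2.0, 1.0), "a"),
    ((3.0, 5.0), "a"),
    ((6.0, 2.0), "b"),
    ((7.0, 6.0), "a"),
    ((8.0, 4.0), "a"),
]


def rows_on_path(rows, path):
    """Return the rows that satisfy every split on a root-to-leaf path.

    A path is a list of (feature_index, threshold, go_left) triples.
    go_left=True means the row must satisfy x[feature_index] <= threshold,
    go_left=False means it must satisfy x[feature_index] > threshold.
    """
    selected = []
    for x, y in rows:
        if all((x[f] <= t) == go_left for f, t, go_left in path):
            selected.append((x, y))
    return selected


def path_value(rows, path):
    """Return (majority class at the leaf, number of rows it classifies correctly)."""
    labels = Counter(y for _, y in rows_on_path(rows, path))
    if not labels:
        return None, 0  # no row reaches this leaf
    return labels.most_common(1)[0]


# Three paths that together form one depth-2 tree (assumed splits).
left = [(0, 5.0, True)]
right_left = [(0, 5.0, False), (1, 3.0, True)]
right_right = [(0, 5.0, False), (1, 3.0, False)]

for name, path in [("left", left), ("right_left", right_left), ("right_right", right_right)]:
    print(name, path_value(ROWS, path))
```

In a path-based formulation, quantities like the per-path count of correctly classified rows would play the role of column coefficients, and a selected set of paths must agree on the split used at each shared internal node. How the paper enforces that consistency is not stated in the abstract, so it is left out of this sketch.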
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Metaheuristic Optimization Algorithms Research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
