Mixed-Integer Linear Optimization for Cardinality-Constrained Random Forests

Jan Pablo Burgard; Maria Eduarda Pinheiro; Martin Schmidt

arXiv:2405.09832·math.OC·January 24, 2025·Optim. Lett.

TL;DR

This paper introduces a mixed-integer linear optimization model for semi-supervised random forests. Compared with traditional random forests, it improves accuracy on biased samples and the Matthews correlation coefficient when labeled data are limited.

Contribution

It develops a novel optimization-based approach for semi-supervised random forests, incorporating labeled and unlabeled data and class size information.

Findings

01 Improved accuracy over traditional random forests in biased samples

02 Better Matthews correlation coefficient with limited labeled data

03 Effective preprocessing and branching techniques for large problems
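
The findings report both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of why the two can differ on unbalanced data, the MCC can be computed from the four confusion-matrix counts of a binary classifier. The function name and the convention of returning 0.0 when the denominator vanishes are assumptions made for this illustration; they are not taken from the paper.

```python
import math

def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when the denominator is zero (an assumed convention here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of correctly classified points."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical counts: a classifier that labels every point positive
# on a sample with 90 positives and 10 negatives.
acc = accuracy_from_counts(tp=90, tn=0, fp=10, fn=0)
mcc = mcc_from_counts(tp=90, tn=0, fp=10, fn=0)
```

In this made-up case the accuracy is high although the classifier never identifies a negative point, while the MCC is zero, which is one reason the MCC is often reported alongside accuracy for unbalanced classes.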

Abstract

Random forests are among the most famous algorithms for solving classification problems, in particular for large-scale data sets. Considering a set of labeled points and several decision trees, the method takes the majority vote to classify a new given point. In some scenarios, however, labels are only accessible for a proper subset of the given points. Moreover, this subset can be non-representative, e.g., due to collection bias. Semi-supervised learning considers the setting of labeled and unlabeled data and often improves the reliability of the results. In addition, it can be possible to obtain additional information about class sizes from undisclosed sources. We propose a mixed-integer linear optimization model for computing a semi-supervised random forest that covers the setting of labeled and unlabeled data points as well as the overall number of points in each class for a binary…
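
The majority vote described in the abstract can be sketched as follows. This is only an illustration of the voting step for binary labels in {-1, +1}; the three decision stumps, their thresholds, and the sample points are hypothetical, and the sketch does not reproduce the mixed-integer model proposed in the paper.

```python
from collections import Counter

def majority_vote(trees, x):
    """Return the label predicted by most trees for the point x."""
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical decision stumps standing in for trained trees.
# An odd number of trees avoids ties for binary labels.
trees = [
    lambda x: 1 if x[0] > 0.5 else -1,
    lambda x: 1 if x[1] > 0.2 else -1,
    lambda x: 1 if x[0] + x[1] > 1.0 else -1,
]

# Hypothetical unlabeled points.
unlabeled = [(0.7, 0.1), (0.9, 0.3), (0.2, 0.1)]
predictions = [majority_vote(trees, x) for x in unlabeled]

# Number of points the forest assigns to the positive class. Class-size
# information of the kind mentioned in the abstract could be compared
# against such a count; how the paper's model uses it is not shown here.
n_positive = sum(1 for p in predictions if p == 1)
```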

Taxonomy

Topics: Data Management and Algorithms · Bayesian Modeling and Causal Inference