From Sequential Nodes to GPU Batches: Parallel Branch and Bound for Optimal $k$-Sparse GLMs
Jiachang Liu, Andrea Lodi

TL;DR
This paper introduces a modular CPU-GPU framework for branch and bound on discrete, nonlinear optimization problems such as cardinality-constrained generalized linear models. It reports speedups of one to two orders of magnitude and makes it practical to collect the entire Rashomon set for statistical analysis.
Contribution
It presents a CPU-GPU framework that processes multiple branch and bound (BnB) nodes in batches on the GPU, rather than one node at a time as in traditional sequential BnB.
Findings
Achieves one to two orders of magnitude speedups.
Reaches zero optimality gap on challenging instances.
Enables collection of the entire Rashomon set for statistical analysis.
Abstract
GPUs have significantly accelerated first-order methods for large-scale optimization, especially in continuous optimization. However, this success has not transferred cleanly to problems with discrete variables, combinatorial structure, and nonlinear objectives, such as certifying optimal solutions for cardinality-constrained generalized linear models. Major challenges include the sequential processing of heterogeneous nodes in branch and bound (BnB) and frequent data movement between the CPU and GPU. We propose a simple, generic, and modular CPU--GPU framework that processes multiple BnB nodes in batches on GPUs. The framework is built around a small set of GPU-efficient routines and uses padding together with lightweight custom kernels to handle irregular node data structures. Experiments show one to two orders of magnitude speedups and zero optimality gap on challenging instances.…
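The padding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses NumPy on the CPU in place of GPU kernels, and the function names `pad_nodes` and `batched_predictions`, along with the choice to represent each node by a list of free feature indices, are assumptions made only for illustration. The point is that nodes with different numbers of features can be packed into one rectangular array plus a mask, so a single batched operation replaces a per-node loop.

```python
import numpy as np

def pad_nodes(supports, pad_value=0):
    """Pack variable-length index lists into a (batch, width) array plus a mask.

    `supports` is a hypothetical per-node list of feature indices.
    Padded slots hold `pad_value` and are marked False in the mask.
    """
    width = max(len(s) for s in supports)
    idx = np.full((len(supports), width), pad_value, dtype=np.int64)
    mask = np.zeros((len(supports), width), dtype=bool)
    for i, s in enumerate(supports):
        idx[i, :len(s)] = s
        mask[i, :len(s)] = True
    return idx, mask

def batched_predictions(X, idx, mask, coefs):
    """Compute X[:, S_b] @ beta_b for every node b in one batched contraction."""
    Xb = X[:, idx]                       # shape (n, batch, width)
    safe = np.where(mask, coefs, 0.0)    # zero out coefficients in padded slots
    return np.einsum("nbw,bw->bn", Xb, safe)  # shape (batch, n)

# Three nodes with 2, 3, and 1 free features respectively.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 6))
supports = [[0, 2], [1, 3, 4], [5]]
idx, mask = pad_nodes(supports)
coefs = rng.normal(size=idx.shape)
preds = batched_predictions(X, idx, mask, coefs)
```

Masking the padded coefficients to zero means the padded columns contribute nothing, so each row of `preds` matches what a per-node computation would give. On a GPU the same layout would let one kernel launch cover the whole batch, which is the kind of regularity the abstract attributes to padding.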
