Optimal Data Reduction under Information-Theoretic Criteria
Taotao He, Jun Luo, Junkai Zhao

TL;DR
This paper presents an optimization-based approach to data reduction through feature selection and instance selection. It formulates both problems as mixed-integer linear programs, solves them to global optimality, and reports better performance than existing methods.
Contribution
The paper develops polyhedral relaxations that yield exact mixed-integer linear programming (MILP) formulations for data reduction under information-theoretic criteria, handling the nonconvexities of these problems and improving solution quality.
Findings
Efficiently solves data reduction problems to global optimality.
Outperforms existing benchmark approaches in numerical experiments.
Applicable to real-world and synthetic datasets.
Abstract
Selecting an optimal subset of features or instances under an information-theoretic criterion has become an effective preprocessing strategy for reducing data complexity while preserving essential information. This study investigates two representative problems within this paradigm: feature selection based on the maximum-relevance minimum-redundancy criterion, and instance selection grounded in the Kullback-Leibler divergence. To address the intrinsic nonconvexities of these problems, we develop polyhedral relaxations that yield exact mixed-integer linear programming formulations, thereby enabling globally optimal data reduction. By leveraging modern optimization techniques, we further design efficient algorithmic implementations capable of solving practically sized instances. Extensive numerical experiments on both real-world and synthetic datasets demonstrate that our method…
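To make the maximum-relevance minimum-redundancy criterion concrete, the sketch below scores feature subsets with the classical difference form (mean mutual information with the label minus mean pairwise mutual information among the selected features) and finds the best subset of a fixed size by brute-force enumeration. This is an illustration only and not the paper's MILP formulation: the exact objective variant used by the authors is an assumption here, and the function names (mutual_information, mrmr_score, best_subset_by_enumeration) are hypothetical. Enumeration is viable only for a handful of features, which is the gap an exact optimization formulation is meant to close.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # Sum over observed pairs: p(x, y) * log(p(x, y) / (p(x) * p(y))).
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def mrmr_score(subset, relevance, redundancy):
    """Assumed mRMR score: mean relevance minus mean pairwise redundancy."""
    k = len(subset)
    rel = sum(relevance[i] for i in subset) / k
    red = sum(redundancy[i][j] for i in subset for j in subset) / (k * k)
    return rel - red


def best_subset_by_enumeration(features, labels, k):
    """Score every size-k subset and return the best one (tiny problems only)."""
    m = len(features)
    relevance = [mutual_information(f, labels) for f in features]
    redundancy = [
        [mutual_information(features[i], features[j]) for j in range(m)]
        for i in range(m)
    ]
    return max(
        combinations(range(m), k),
        key=lambda s: mrmr_score(s, relevance, redundancy),
    )


# Toy data: f1 duplicates f0, f2 is independent of f0, and the label uses f0 and f2.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 1, 0, 0, 1, 1]
labels = [a + b for a, b in zip(f0, f2)]
best = best_subset_by_enumeration([f0, f1, f2], labels, k=2)
```

On this toy data the redundant pair (f0, f1) scores lower than any pair containing f2, so the selected subset always includes feature index 2. The number of candidate subsets grows combinatorially with the number of features, so this search serves only as a reference point for very small instances.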
