Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations
Dimitris Bertsimas, Caio de Prospero Iglesias, Nicholas A. G. Johnson

TL;DR
This paper introduces a sparse multiple kernel learning method that imposes an explicit cardinality constraint on the kernel weights, solves the resulting non-convex problem with an alternating best response algorithm, and outperforms existing methods in prediction accuracy while certifying that its solutions are near-optimal.
Contribution
It formulates SMKL with explicit cardinality constraints, develops an alternating best response algorithm, and provides semidefinite relaxations for solution certification and warm-starting.
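As a rough illustration of what an explicitly cardinality-constrained formulation can look like, the block below writes a standard MKL minimax objective with a sparsity constraint and an l2 penalty added. This is a sketch under assumptions, not the paper's exact statement: the symbols q (number of candidate kernels), K_i (kernel matrices), Y (diagonal matrix of labels), C (SVM box constraint), lambda (l2 penalty weight), and k (sparsity budget) are introduced here for illustration, and the paper's scaling and sign conventions may differ.

```latex
\min_{\beta \in \mathbb{R}^q} \; \max_{\alpha \in \mathcal{A}} \;
  \mathbf{1}^\top \alpha
  - \frac{1}{2} \sum_{i=1}^{q} \beta_i \, \alpha^\top Y K_i Y \alpha
  + \lambda \lVert \beta \rVert_2^2
\quad \text{s.t.} \quad
  \mathbf{1}^\top \beta = 1, \;\; \beta \ge 0, \;\; \lVert \beta \rVert_0 \le k,
\qquad
\mathcal{A} = \{ \alpha \in \mathbb{R}^n : 0 \le \alpha \le C, \; y^\top \alpha = 0 \}.
```

In this form, fixing beta leaves a standard kernel SVM dual in alpha, and fixing alpha leaves a quadratic problem in beta over a sparse simplex, which is the structure an alternating scheme can exploit.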
Findings
Outperforms state-of-the-art MKL methods in accuracy by 3.34% on average.
Improves accuracy by 4.05% when warm-started.
Provides certificates of near-optimality for solutions.
Abstract
We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing l1-regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and adding an l2 penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the alpha subproblem is a standard kernel SVM dual solved via LIBSVM, while the beta subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed-integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our…
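To make the beta subproblem mentioned in the abstract more concrete, here is a minimal pure-Python sketch of the general Greedy Selector and Simplex Projector idea: keep the k largest entries of a vector, then project those entries onto the probability simplex. The function names `project_simplex` and `gssp` are hypothetical, and how the input vector `v` is built from the current alpha depends on the paper's objective, which is not reproduced here. The paper's implementation may differ.

```python
def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}.

    Uses the standard sort-and-threshold method.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        # The condition holds for a prefix of the sorted entries;
        # the last index where it holds fixes the threshold.
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def gssp(v, k):
    """Project v onto the set of simplex vectors with at most k nonzeros.

    Greedy step: select the indices of the k largest entries of v.
    Projection step: project the selected entries onto the simplex.
    (Hypothetical helper; names and interface are assumptions.)
    """
    top = sorted(range(len(v)), key=lambda i: v[i], reverse=True)[:k]
    projected = project_simplex([v[i] for i in top])
    beta = [0.0] * len(v)
    for i, p in zip(top, projected):
        beta[i] = p
    return beta


if __name__ == "__main__":
    # Illustrative scores for four candidate kernels, keeping at most two.
    print(gssp([0.5, 0.1, 0.9, 0.3], 2))
```

In an alternating scheme, a step like this would be interleaved with a kernel SVM solve for alpha on the combined kernel, repeating until the weights stop changing.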
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
