Greedy Convex Ensemble
Tan Nguyen, Nan Ye, Peter L. Bartlett

TL;DR
This paper investigates a greedy approach to learning convex combinations of basis models, providing theoretical insights on capacity and generalization, and demonstrating empirical effectiveness comparable to boosting and random forests.
Contribution
The paper analyzes linear versus convex combinations of basis models theoretically, and proposes a greedy algorithm for learning convex ensembles that it validates empirically.
Findings
Greedy convex ensembles perform competitively with boosting and random forests.
Convex hulls have bounded capacity, reducing overfitting risk compared to linear hulls.
The greedy algorithm adapts well to problem complexity and requires minimal hyper-parameter tuning.
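To make the linear-versus-convex distinction in the findings concrete, the two hulls of a basis class can be written using their standard definitions. The symbols below ($\mathcal{G}$ for the basis class, $g_i$ for basis models, $\alpha_i$ for coefficients, $n$ for the number of models combined) are notation introduced here for illustration and are not taken from the paper.

```latex
\[
\operatorname{lin}(\mathcal{G})
  = \Bigl\{ \sum_{i=1}^{n} \alpha_i g_i \;:\;
      n \in \mathbb{N},\ g_i \in \mathcal{G},\ \alpha_i \in \mathbb{R} \Bigr\},
\qquad
\operatorname{co}(\mathcal{G})
  = \Bigl\{ \sum_{i=1}^{n} \alpha_i g_i \;:\;
      n \in \mathbb{N},\ g_i \in \mathcal{G},\ \alpha_i \ge 0,\
      \sum_{i=1}^{n} \alpha_i = 1 \Bigr\}.
\]
```

Because the coefficients in the convex hull are nonnegative and sum to one, every $f \in \operatorname{co}(\mathcal{G})$ satisfies $|f(x)| \le \sup_{g \in \mathcal{G}} |g(x)|$, whereas the linear hull places no such limit on the coefficients.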
Abstract
We consider learning a convex combination of basis models, and present some new theoretical and empirical results that demonstrate the effectiveness of a greedy approach. Theoretically, we first consider whether we can use linear, instead of convex, combinations, and obtain generalization results similar to existing ones for learning from a convex hull. We obtain a negative result that even the linear hull of very simple basis functions can have unbounded capacity, and is thus prone to overfitting; on the other hand, convex hulls are still rich but have bounded capacities. Secondly, we obtain a generalization bound for a general class of Lipschitz loss functions. Empirically, we first discuss how a convex combination can be greedily learned with early stopping, and how a convex combination can be non-greedily learned when the number of basis models is known a priori. Our experiments…
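The greedy learning with early stopping mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a squared loss, a finite pool of decision stumps as the basis class, a Frank-Wolfe-style step size of 2/(t+1), and a patience-based stopping rule on a validation set. The helper names (`make_stump_pool`, `stump_outputs`, `greedy_convex_ensemble`) and all parameters are hypothetical.

```python
import numpy as np


def make_stump_pool(X, n_thresholds=9, scale=1.0):
    """Hypothetical basis class: stumps x -> +/- scale * sign(x[j] - theta)."""
    pool = []
    for j in range(X.shape[1]):
        thetas = np.quantile(X[:, j], np.linspace(0.1, 0.9, n_thresholds))
        for theta in thetas:
            for sign in (1.0, -1.0):
                pool.append((j, float(theta), sign * scale))
    return pool


def stump_outputs(pool, X):
    """Return an (n_samples, n_stumps) matrix with one column per stump."""
    cols = [a * np.where(X[:, j] > theta, 1.0, -1.0) for j, theta, a in pool]
    return np.column_stack(cols)


def greedy_convex_ensemble(P_tr, y_tr, P_va, y_va, max_iter=200, patience=20):
    """Greedily build convex weights over the columns of P_tr (assumed scheme)."""
    w = np.zeros(P_tr.shape[1])
    best_w, best_val, stale = w.copy(), np.inf, 0
    for t in range(1, max_iter + 1):
        # Gradient of the squared loss w.r.t. predictions, up to a constant factor.
        residual = P_tr @ w - y_tr
        # Greedy step: pick the basis model best aligned with the negative gradient.
        k = int(np.argmin(residual @ P_tr))
        # Assumed step size; t = 1 gives alpha = 1, so the weights sum to one.
        alpha = 2.0 / (t + 1)
        w = (1.0 - alpha) * w
        w[k] += alpha
        # Early stopping: keep the weights with the lowest validation error.
        val = float(np.mean((P_va @ w - y_va) ** 2))
        if val < best_val:
            best_w, best_val, stale = w.copy(), val, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_w, best_val
```

Each update rescales the existing weights by (1 - alpha) and adds alpha to one basis model, so the weights stay nonnegative and sum to one. The ensemble's predictions therefore never exceed the stump scale in magnitude, which is the bounded-output property that a linear combination does not guarantee.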
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
