Boosting with the Logistic Loss is Consistent
Matus Telgarsky

TL;DR
This paper establishes optimization guarantees, generalization bounds, and statistical consistency for AdaBoost variants that replace the exponential loss with the logistic loss and similar convex losses, without relying on explicit regularization or constraints.
Contribution
It analyzes AdaBoost variants that use the logistic loss, giving optimization, generalization, and consistency results for both separable and nonseparable data.
Findings
In the separable case, a distribution-dependent relaxed weak learning rate yields fast convergence with high probability over the sample.
The convex surrogate risk exhibits distribution-dependent curvature in nonseparable cases.
The algorithm's output maintains small norm with high probability in nonseparable scenarios.
Abstract
This manuscript provides optimization guarantees, generalization bounds, and statistical consistency results for AdaBoost variants which replace the exponential loss with the logistic and similar losses (specifically, twice differentiable convex losses which are Lipschitz and tend to zero on one side). The heart of the analysis is to show that, in lieu of explicit regularization and constraints, the structure of the problem is fairly rigidly controlled by the source distribution itself. The first control of this type is in the separable case, where a distribution-dependent relaxed weak learning rate induces speedy convergence with high probability over any sample. Otherwise, in the nonseparable case, the convex surrogate risk itself exhibits distribution-dependent levels of curvature, and consequently the algorithm's output has small norm with high probability.
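As a rough illustration of the kind of procedure the abstract describes, the following is a minimal sketch of boosting as greedy coordinate descent on the empirical logistic risk. It is not taken from the paper: the weak learner class (threshold stumps on one-dimensional inputs), the function names, and the fixed step size are all assumptions made for this example. The step size 4 * correlation is chosen only because the logistic loss has second derivative at most 1/4, which guarantees that each round does not increase the empirical risk; the paper's own step-size rule may differ.

```python
import math


def logistic_loss(z):
    """Numerically stable ln(1 + exp(-z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def neg_loss_derivative(z):
    """-l'(z) = 1 / (1 + exp(z)), computed without overflow."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def build_stumps(xs):
    """Hypothetical weak learners: threshold stumps (both signs) on 1-D inputs.

    Each stump is stored as its list of predictions in {-1, +1} on the sample.
    """
    stumps = []
    for t in sorted(set(xs)):
        column = [1.0 if x >= t else -1.0 for x in xs]
        stumps.append(column)
        stumps.append([-v for v in column])
    return stumps


def empirical_risk(margins):
    """Average logistic loss over the sample margins."""
    return sum(logistic_loss(m) for m in margins) / len(margins)


def boost_logistic(xs, ys, rounds=30):
    """Greedy coordinate descent on the empirical logistic risk.

    Returns the stump weights and the risk after each round
    (the first entry is the risk of the zero predictor, ln 2).
    """
    stumps = build_stumps(xs)
    n = len(xs)
    weights = [0.0] * len(stumps)
    margins = [0.0] * n
    history = [empirical_risk(margins)]
    for _ in range(rounds):
        # Example weights: magnitude of the loss derivative at each margin.
        q = [neg_loss_derivative(m) for m in margins]
        # Negative partial derivative of the risk along each stump.
        corrs = [sum(q[i] * ys[i] * h[i] for i in range(n)) / n for h in stumps]
        j = max(range(len(stumps)), key=lambda k: corrs[k])
        # Assumed fixed step: coordinate curvature is at most 1/4.
        step = 4.0 * corrs[j]
        weights[j] += step
        margins = [margins[i] + step * ys[i] * stumps[j][i] for i in range(n)]
        history.append(empirical_risk(margins))
    return weights, history


if __name__ == "__main__":
    xs = [0.1, 0.4, 0.5, 0.9, 1.3, 1.7]
    ys = [-1.0, -1.0, 1.0, -1.0, 1.0, 1.0]
    weights, history = boost_logistic(xs, ys, rounds=30)
    print("initial risk:", history[0])
    print("final risk:", history[-1])
```

Note that no regularization term or norm constraint appears anywhere in the loop, which matches the setting of the abstract: the analysis concerns what the source distribution alone implies about the iterates of such an unregularized procedure.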
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
