Robust Linear Classification from Limited Training Data
Deepayan Chakrabarti

TL;DR
The paper introduces RoLin, a novel algorithm for linear classification with limited data that outperforms traditional methods like dimensionality reduction and regularization, especially with small sample sizes.
Contribution
RoLin is a new algorithm that combines reliable information from the top principal components with robust optimization over the remaining subspaces. It requires no user choice of norm, prior, or distance metric, and it improves classification performance when training data are limited.
Findings
RoLin achieves 14-40% lower test loss than dimensionality reduction.
RoLin is up to 3x better than L1 regularization for logistic loss.
RoLin achieves better results with fewer samples, sometimes outperforming regularization with 100x more data.
Abstract
We consider the problem of linear classification under general loss functions in the limited-data setting. Overfitting is a common problem here. The standard approaches to prevent overfitting are dimensionality reduction and regularization. But dimensionality reduction loses information, while regularization requires the user to choose a norm, or a prior, or a distance metric. We propose an algorithm called RoLin that needs no user choice and applies to a large class of loss functions. RoLin combines reliable information from the top principal components with a robust optimization to extract any useful information from unreliable subspaces. It also includes a new robust cross-validation that is better than existing cross-validation methods in the limited-data setting. Experiments on real-world datasets and three standard loss functions show that RoLin broadly outperforms both…
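The abstract describes RoLin's mechanism only at a high level. The sketch below illustrates the split it mentions between "reliable" and "unreliable" subspaces. It is a minimal sketch under stated assumptions, not the paper's algorithm. It decomposes centered features into the span of the top-k principal components and that span's orthogonal complement. The helper name `split_principal_subspaces`, the choice `k=5`, and the random data are all hypothetical. Plain dimensionality reduction keeps only `X_top` and discards `X_rest`. According to the abstract, RoLin instead applies a robust optimization to extract useful information from the unreliable part. This sketch does not attempt to reproduce that step.

```python
import numpy as np

def split_principal_subspaces(X, k):
    """Split centered features into a top-k principal part and its residual.

    Hypothetical helper for illustration only (not from the paper).
    X has shape (n_samples, n_features); k is the number of top principal
    components treated as the "reliable" subspace.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_top = Vt[:k].T              # (n_features, k) orthonormal basis of the top-k subspace
    P_top = V_top @ V_top.T       # orthogonal projector onto that subspace
    X_top = Xc @ P_top            # part that PCA truncation would keep
    X_rest = Xc - X_top           # part that PCA truncation would discard
    return X_top, X_rest, V_top

# Assumed toy data: fewer samples than features, as in the limited-data setting.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
X_top, X_rest, V_top = split_principal_subspaces(X, k=5)
```

The two parts are orthogonal and sum to the centered data. A linear score `x @ w` therefore always decomposes as `x_top @ w + x_rest @ w`. Truncating to the top components simply drops the second term, which is the information loss the abstract attributes to dimensionality reduction.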
