TL;DR
This paper applies Orthogonal Matching Pursuit (OMP) and a new overlapping Group OMP (GOMP) variant to text classification, showing that both can produce sparse, effective models in high-dimensional settings.
Contribution
It applies OMP, a greedy variable selection algorithm, to text classification and extends standard group OMP to overlapping GOMP, which handles overlapping groups of features. Both are presented as regularizers that yield sparser models than group-lasso regularizers.
Findings
OMP and overlapping GOMP produce highly sparse models.
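Classic regularizers are sparse but less accurate, and group-lasso regularizers are more accurate but less sparse; OMP and overlapping GOMP are reported to be both effective and very sparse.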
Algorithms are validated through empirical analysis.
Abstract
In text classification, the problem of overfitting arises due to the high dimensionality, making regularization essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. On the contrary, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, called Orthogonal Matching Pursuit, for the text classification task. We also extend standard group OMP by introducing overlapping Group OMP to handle overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models. Code and data are available online: https://github.com/y3nk0/OMP-for-Text-Classification .
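To make the greedy selection idea concrete, here is a minimal sketch of plain OMP in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: it uses a squared loss on a dense matrix (the paper's loss and data handling may differ), and the function name `omp` and the parameter `k` (the maximum number of features to select) are hypothetical. The overlapping GOMP variant, which selects groups of features rather than single features, is not shown.

```python
import numpy as np

def omp(X, y, k):
    """Greedy OMP sketch: pick up to k columns of X to fit y (squared loss)."""
    n_features = X.shape[1]
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    selected = []                       # indices of chosen features, in order
    used = np.zeros(n_features, dtype=bool)
    w = np.zeros(0)
    for _ in range(k):
        # Score each feature by its absolute correlation with the residual.
        scores = np.abs(X.T @ residual)
        scores[used] = -np.inf          # never pick the same feature twice
        best = int(np.argmax(scores))
        selected.append(best)
        used[best] = True
        # Refit on all selected features (the "orthogonal" step).
        w, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        residual = y - X[:, selected] @ w
        if np.linalg.norm(residual) < 1e-10:
            break                       # target is already fit exactly
    coef = np.zeros(n_features)
    coef[selected] = w
    return coef, selected

# Toy example: 5 "documents", 5 "terms", only terms 1 and 3 matter.
X = np.eye(5)
y = np.array([0.0, 2.0, 0.0, -3.0, 0.0])
coef, selected = omp(X, y, k=2)
print(selected)  # [3, 1]
```

The selection budget `k` plays the role of the regularization strength: a smaller `k` gives a sparser model, which is the property the abstract emphasizes.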
