Orthogonal Matching Pursuit for Text Classification

Konstantinos Skianis; Nikolaos Tziortziotis; Michalis Vazirgiannis

arXiv:1807.04715 · cs.LG · October 10, 2018

TL;DR

This paper applies Orthogonal Matching Pursuit (OMP) and an overlapping extension of Group OMP (GOMP) to text classification, showing that both produce sparse, accurate models in high-dimensional settings.
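
For reference, here is a minimal sketch of textbook OMP for least-squares regression, in Python with NumPy. It is not the authors' code: the function `omp` and its parameters `n_nonzero` and `tol` are hypothetical, and it assumes the feature columns of `X` are on comparable scales (OMP is usually run on normalized columns). Each iteration picks the feature most correlated with the current residual, then refits on all selected features, so at most `n_nonzero` weights end up nonzero.

```python
import numpy as np


def omp(X, y, n_nonzero, tol=1e-10):
    """Textbook Orthogonal Matching Pursuit for least squares.

    Hypothetical sketch, not the authors' implementation.
    Returns the sparse coefficient vector and the selection order.
    """
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    support = []                      # indices of selected features, in order
    coef = np.zeros(n_features)
    residual = y.copy()
    for _ in range(min(n_nonzero, n_features)):
        if np.linalg.norm(residual) < tol:
            break                     # y is already explained exactly
        # Greedy step: score every feature by |correlation| with the residual.
        scores = np.abs(X.T @ residual)
        if support:
            scores[support] = -np.inf  # never re-select a feature
        support.append(int(np.argmax(scores)))
        # Orthogonal step: least-squares refit on all selected features,
        # so the new residual is orthogonal to every selected column.
        sub = X[:, support]
        sol, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef, support
```

Refitting on the whole support, rather than only updating the newest weight, is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every selected column, so a feature is never picked twice.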

Contribution

It applies OMP to text classification and extends Group OMP to overlapping groups of features (overlapping GOMP), highlighting the effectiveness and sparsity of both methods compared with traditional regularizers.
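
The following sketch illustrates one plausible reading of Group OMP extended to overlapping groups; it is an assumption for illustration, not the paper's exact algorithm. The names `overlapping_group_omp`, `groups` and `n_groups` are hypothetical. Each step scores every unselected group by the Euclidean norm of its columns' correlations with the residual, adds the best group's features to the active set (a feature shared by several groups is kept once), and refits least squares on the whole active set.

```python
import numpy as np


def overlapping_group_omp(X, y, groups, n_groups):
    """Group-OMP sketch that tolerates overlapping groups.

    Hypothetical illustration, not the paper's algorithm.
    groups: list of lists of column indices; a column may appear in
    several groups. Returns coefficients and the selected group ids.
    """
    y = np.asarray(y, dtype=float)
    active = set()                    # union of columns from chosen groups
    chosen = []
    coef = np.zeros(X.shape[1])
    residual = y.copy()
    for _ in range(n_groups):
        best, best_score = None, -np.inf
        for g, cols in enumerate(groups):
            if g in chosen:
                continue
            # Joint correlation of the group's columns with the residual.
            score = np.linalg.norm(X[:, cols].T @ residual)
            if score > best_score:
                best, best_score = g, score
        if best is None:
            break                     # every group already selected
        chosen.append(best)
        active.update(groups[best])   # shared columns are kept only once
        cols = sorted(active)
        # Orthogonal step: refit on the whole active set.
        sol, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        residual = y - X[:, cols] @ sol
        coef[:] = 0.0
        coef[cols] = sol
    return coef, chosen
```

Some group-OMP formulations normalize the score by group size or orthonormalize each group's columns first; this sketch omits that step, so larger groups are slightly favored.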

Findings

01

OMP and overlapping GOMP produce highly sparse models.

02

Both methods outperform traditional regularizers in accuracy.

03

Both algorithms are validated through empirical analysis.

Abstract

In text classification, the high dimensionality of the feature space makes models prone to overfitting, so regularization is essential. Although classic regularizers provide sparsity, they fail to return highly accurate models. In contrast, state-of-the-art group-lasso regularizers provide better results at the expense of low sparsity. In this paper, we apply a greedy variable selection algorithm, Orthogonal Matching Pursuit (OMP), to the text classification task. We also extend standard Group OMP (GOMP) by introducing overlapping GOMP, which handles overlapping groups of features. Empirical analysis verifies that both OMP and overlapping GOMP constitute powerful regularizers, able to produce effective and very sparse models. Code and data are available online: https://github.com/y3nk0/OMP-for-Text-Classification
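
OMP is normally stated for squared loss. A common way to adapt it to classification, assumed here and not taken from the paper or its repository, is to score each feature by the magnitude of the logistic-loss gradient at the current sparse model, then refit a logistic regression restricted to the selected features. The helpers `logistic_omp`, `fit_logistic` and `logistic_grad`, the learning rate, the iteration count, the {0, 1} labels and the missing intercept are all simplifying assumptions.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in np.exp for very confident predictions.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def logistic_grad(X, y, w):
    """Gradient of the mean logistic loss (labels in {0, 1}, no intercept)."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def fit_logistic(X, y, iters=500, lr=0.5):
    """Unregularized logistic regression by plain gradient descent (hypothetical helper)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * logistic_grad(X, y, w)
    return w


def logistic_omp(X, y, n_nonzero):
    """Greedy OMP-style feature selection under the logistic loss.

    Hypothetical sketch, not the authors' implementation.
    """
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    support = []
    w = np.zeros(n_features)
    for _ in range(min(n_nonzero, n_features)):
        # Score features by the loss gradient at the current sparse model.
        scores = np.abs(logistic_grad(X, y, w))
        if support:
            scores[support] = -np.inf  # never re-select a feature
        support.append(int(np.argmax(scores)))
        # Refit on the whole support (loss-agnostic "orthogonal" step).
        w = np.zeros(n_features)
        w[support] = fit_logistic(X[:, support], y)
    return w, support
```

For squared loss (1/2n)||y - Xw||², the gradient is -Xᵀr/n with r = y - Xw, so this score reduces to the residual-correlation rule of standard OMP.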

Code & Models

Repositories

y3nk0/OMP-for-Text-Classification (official)
