Aggressive Sampling for Multi-class to Binary Reduction with Applications to Text Classification

Bikash Joshi; Massih-Reza Amini; Ioannis Partalas; Franck Iutzeler; Yury Maximov

arXiv:1701.06511 · stat.ML · September 15, 2021 · 2 cites

TL;DR

This paper introduces a double sampling strategy applied on top of a multi-class to binary reduction, improving training and prediction efficiency in text classification tasks with a very large number of classes.

Contribution

It proposes a double sampling method that preserves the consistency of empirical risk minimization over the reduced data while lowering the computational cost of large multi-class problems.

Findings

1. Significant reduction in training and prediction time.
2. Lower memory consumption compared to existing methods.
3. Competitive predictive performance on large datasets.

Abstract

We address the problem of multi-class classification in the case where the number of classes is very large. We propose a double sampling strategy on top of a multi-class to binary reduction strategy, which transforms the original multi-class problem into a binary classification problem over pairs of examples. The aim of the sampling strategy is to overcome the curse of long-tailed class distributions exhibited in the majority of large-scale multi-class classification problems and to reduce the number of pairs of examples in the expanded data. We show that this strategy does not alter the consistency of the empirical risk minimization principle defined over the double sample reduction. Experiments are carried out on DMOZ and Wikipedia collections with 10,000 to 100,000 classes, where we show the efficiency of the proposed approach in terms of training and prediction time, memory consumption,…
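
The two sampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `double_sample`, the parameters `kappa` and `cap`, and the keep-probability rule `min(1, cap / class_size)` are all assumptions made for this example, since the abstract does not specify the sampling distributions.

```python
import random
from collections import Counter


def double_sample(examples, num_classes, kappa=2, cap=2, seed=0):
    """Hypothetical sketch of a double sampling step.

    examples: list of (x, y) pairs with y in range(num_classes).
    Returns a list of (x, y, y_adv) triplets, where y_adv != y.
    """
    rng = random.Random(seed)
    class_sizes = Counter(y for _, y in examples)
    triplets = []
    for x, y in examples:
        # Step 1 (assumed rule): thin out large classes so that the
        # long tail of small classes is not drowned out.
        keep_prob = min(1.0, cap / class_sizes[y])
        if rng.random() >= keep_prob:
            continue
        # Step 2: instead of pairing x with every other class, draw only
        # kappa "adversarial" classes uniformly from the remaining ones.
        others = [k for k in range(num_classes) if k != y]
        for y_adv in rng.sample(others, min(kappa, len(others))):
            triplets.append((x, y, y_adv))
    return triplets


# Toy long-tailed data: one large class and two singleton classes.
data = [("doc%d" % i, 0) for i in range(100)] + [("rare_a", 1), ("rare_b", 2)]
sampled = double_sample(data, num_classes=3, kappa=2, cap=2, seed=0)
```

Under these assumptions, each triplet `(x, y, y_adv)` would become one instance of the binary problem over pairs (the true class of `x` against a sampled competing class), so the expanded data grows with `kappa` rather than with the total number of classes.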

Code & Models

Repositories

bikash617/Aggressive-Sampling-for-Multi-class-to-BinaryReduction (official)

Taxonomy

Topics: Text and Document Classification Technologies · Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning