Fast Nearest-Neighbor Classification using RNN in Domains with Large Number of Classes

Gautam Singh; Gargi Dasgupta; Yu Deng

arXiv:1712.03941·cs.IR·December 12, 2017

TL;DR

This paper introduces a hybrid cascaded approach combining a fast RNN and a slow nearest-neighbor method to efficiently classify text into thousands of classes with limited training data, improving speed and accuracy.

Contribution

The paper proposes a novel cascaded system that significantly reduces query time while enhancing accuracy in large-scale text classification tasks with few training samples per class.

Findings

01 The query time of the slow nearest-neighbor system is reduced to one-sixth of its original value.

02 The cascade is faster than an LSH-based baseline.

03 The paper provides a lower bound on the accuracy of the cascaded model.

Abstract

In text classification scenarios where the number of classes is large (in the tens of thousands) and the training samples for each class are few and often verbose, nearest-neighbor methods are effective but very slow, because they compute a similarity score against the training samples of every class. Machine learning models, on the other hand, are fast at runtime, but they cannot be trained adequately with so few training samples per class. In this paper, we propose a hybrid approach that cascades 1) a fast but less accurate recurrent neural network (RNN) model and 2) a slow but more accurate nearest-neighbor model using a bag of syntactic features. Using the cascaded approach, our experiments, performed on a data set from IT support services where customer complaint text needs to be classified to return the top-$N$ possible error codes, show that the query time of the slow system is…
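The cascade described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: the fast stage is assumed to pass a shortlist of `k` candidate classes to the slow stage, the fast stage is replaced by a cheap per-class centroid comparison rather than an RNN, and plain word counts stand in for the bag of syntactic features. All function names, the toy error codes, and the sample texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def bow(text):
    """Bag-of-words vector (assumed stand-in for the paper's syntactic features)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def fast_shortlist(query_vec, centroids, k):
    """Fast stage: one comparison per class. NOT an RNN, only a cheap placeholder."""
    ranked = sorted(centroids, key=lambda c: cosine(query_vec, centroids[c]),
                    reverse=True)
    return ranked[:k]

def slow_rank(query_vec, train, allowed, n):
    """Slow stage: nearest neighbor over every sample of the allowed classes."""
    best = defaultdict(float)
    for label, text in train:
        if label in allowed:
            best[label] = max(best[label], cosine(query_vec, bow(text)))
    return sorted(best, key=best.get, reverse=True)[:n]

def cascade(query, train, k, n):
    """Shortlist k classes with the fast stage, then return the top-n from the slow stage."""
    # Centroids are rebuilt per query here for brevity; a real system would cache them.
    centroids = defaultdict(Counter)
    for label, text in train:
        centroids[label].update(bow(text))
    q = bow(query)
    shortlist = set(fast_shortlist(q, centroids, k))
    return slow_rank(q, train, shortlist, n)

# Hypothetical toy data: (error code, complaint text).
TRAIN = [
    ("E100", "disk full cannot write file"),
    ("E100", "no space left on disk"),
    ("E200", "network timeout connecting to server"),
    ("E200", "connection to server timed out"),
    ("E300", "password expired login failed"),
]
QUERY = "cannot write file because disk is full"

print(cascade(QUERY, TRAIN, k=2, n=1))
```

In this sketch the slow stage only scores samples from the `k` shortlisted classes, which is where the query-time saving would come from. The choice of `k` trades speed against the chance that the fast stage drops the correct class.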

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Text and Document Classification Technologies · Machine Learning and Data Classification