# Smaller Text Classifiers with Discriminative Cluster Embeddings

**Authors:** Mingda Chen, Kevin Gimpel

arXiv: 1906.09532 · 2019-06-25

## TL;DR

The paper shrinks deployed text classifiers by learning a hard (discrete) word clustering end-to-end with the Gumbel-Softmax distribution. Variants that selectively assign additional parameters to some words improve accuracy while remaining parameter-efficient.

## Contribution

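The paper presents an end-to-end approach that learns a hard word clustering jointly with the task loss, using the Gumbel-Softmax distribution to handle the discrete cluster assignments. It also proposes variations that selectively assign additional parameters to words.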
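
As a rough illustration of the mechanism described above, here is a minimal pure-Python sketch of drawing a hard cluster assignment with Gumbel-Softmax and looking up a shared cluster embedding. It is not the authors' implementation. The per-word logits over clusters, the function names (`gumbel_softmax_sample`, `embed_word`, `deployed_cluster_ids`), and the toy values are all assumptions made for illustration.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Return (soft, hard): a relaxed sample and its one-hot argmax."""
    noisy = []
    for logit in logits:
        # Gumbel(0, 1) noise is -log(-log(U)) with U ~ Uniform(0, 1).
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        noisy.append((logit - math.log(-math.log(u))) / temperature)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    soft = [e / total for e in exps]
    best = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == best else 0.0 for i in range(len(soft))]
    return soft, hard


def embed_word(word_logits, cluster_embeddings, temperature, rng):
    """Forward pass only: pick one cluster and return its embedding.

    In an autodiff framework, a straight-through estimator would use
    `hard` in the forward pass and the gradient of `soft` in the
    backward pass; that part is not shown here.
    """
    _, hard = gumbel_softmax_sample(word_logits, temperature, rng)
    cluster = hard.index(1.0)
    return cluster_embeddings[cluster], cluster


def deployed_cluster_ids(all_logits):
    """After training (assumed): keep only the argmax cluster id per word."""
    return [max(range(len(row)), key=row.__getitem__) for row in all_logits]


# Hypothetical toy setup: 3 words, 3 clusters, 2-dimensional embeddings.
toy_logits = [[4.0, 0.0, 0.0], [3.5, 0.1, 0.0], [0.0, 0.2, 5.0]]
toy_clusters = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
rng = random.Random(0)
vector, cluster = embed_word(toy_logits[0], toy_clusters, 0.5, rng)
ids = deployed_cluster_ids(toy_logits)  # [0, 0, 2]
```

Under these assumptions, the deployed model would only need the small cluster embedding table plus one cluster id per word, since the logits are discarded once each word's assignment is fixed.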

## Key findings

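- Deployed model size is reduced while accuracy is maintained.
- A hard word clustering can be learned end-to-end by minimizing the task loss.
- Selectively assigning additional parameters to words further improves accuracy while remaining parameter-efficient.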
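
To make the size argument concrete, the following back-of-the-envelope sketch compares a standard embedding table with a cluster-based one. The vocabulary size, embedding dimension, cluster count, and storage format are hypothetical figures chosen for illustration, not numbers from the paper, and the storage layout (one packed cluster id per word) is an assumption.

```python
def standard_embedding_bytes(vocab_size, dim, bytes_per_float=4):
    """One dense vector per word."""
    return vocab_size * dim * bytes_per_float


def cluster_embedding_bytes(vocab_size, dim, num_clusters, bytes_per_float=4):
    """One dense vector per cluster plus a bit-packed cluster id per word."""
    table = num_clusters * dim * bytes_per_float
    # Bits needed to index num_clusters distinct ids (at least 1).
    bits_per_id = max(1, (num_clusters - 1).bit_length())
    ids = (vocab_size * bits_per_id + 7) // 8  # round up to whole bytes
    return table + ids


# Hypothetical configuration: 100,000 words, 300 dimensions, 64 clusters.
full = standard_embedding_bytes(100_000, 300)          # 120,000,000 bytes
clustered = cluster_embedding_bytes(100_000, 300, 64)  # 151,800 bytes
```

With these assumed figures the embedding storage drops from 120 MB to roughly 0.15 MB. Variants that give selected words their own parameters would add to the second total.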

## Abstract

Word embedding parameters often dominate overall model sizes in neural methods for natural language processing. We reduce deployed model sizes of text classifiers by learning a hard word clustering in an end-to-end manner. We use the Gumbel-Softmax distribution to maximize over the latent clustering while minimizing the task loss. We propose variations that selectively assign additional parameters to words, which further improves accuracy while still remaining parameter-efficient.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.09532/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1906.09532/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1906.09532/full.md

---
Source: https://tomesphere.com/paper/1906.09532