Embedding Compression for Text Classification Using Dictionary Screening

Jing Zhou; Xinru Jing; Muyu Liu; Hansheng Wang

arXiv:2211.12715 · cs.CL · November 24, 2022

TL;DR

This paper introduces a dictionary screening technique for compressing the embeddings of text classification models, substantially reducing model size while keeping prediction accuracy competitive.

Contribution

It presents a novel importance assessment method for keywords, enabling effective dictionary screening and embedding compression in text classification models.
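This summary does not spell out how keyword importance is computed. As a rough illustration only, the Python sketch below scores each keyword by occlusion: it deletes every occurrence of the keyword from each sample and measures how far the benchmark model's predicted class probabilities move (L1 distance, averaged over samples). The occlusion measure, the `keyword_importance` helper, and the toy two-class `toy_model` standing in for the trained recurrent network are all hypothetical and are not the authors' method.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a model maps a token sequence to class probabilities.
Model = Callable[[Sequence[str]], List[float]]


def keyword_importance(model: Model, texts: List[List[str]]) -> Dict[str, float]:
    """Occlusion-style importance: average L1 shift in predicted class
    probabilities when every occurrence of a keyword is deleted.
    Illustrative stand-in only, not the paper's exact measure."""
    vocab = sorted({tok for text in texts for tok in text})
    baseline = [model(text) for text in texts]  # predictions with the full dictionary
    scores: Dict[str, float] = {}
    for word in vocab:
        total = 0.0
        for text, p_full in zip(texts, baseline):
            occluded = [tok for tok in text if tok != word]  # remove the keyword
            p_occluded = model(occluded)
            total += sum(abs(a - b) for a, b in zip(p_full, p_occluded))
        scores[word] = total / len(texts)
    return scores


def toy_model(tokens: Sequence[str]) -> List[float]:
    """Hypothetical two-class scorer standing in for the trained RNN:
    'great' pushes toward class 1, 'awful' toward class 0."""
    p1 = 0.5 + 0.2 * tokens.count("great") - 0.2 * tokens.count("awful")
    p1 = min(max(p1, 0.0), 1.0)  # clamp to a valid probability
    return [1.0 - p1, p1]


texts = [["the", "movie", "was", "great"], ["the", "plot", "was", "awful"]]
scores = keyword_importance(toy_model, texts)
print({w: round(s, 3) for w, s in scores.items()})
# -> {'awful': 0.2, 'great': 0.2, 'movie': 0.0, 'plot': 0.0, 'the': 0.0, 'was': 0.0}
```

Samples that do not contain a keyword contribute zero, so a practical version would only re-run the model on the samples where that keyword actually occurs.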

Findings

1. Significant reduction in dictionary size and text sequence length.
2. Maintains competitive prediction accuracy after compression.
3. Demonstrated effectiveness through extensive numerical studies.
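Why a smaller dictionary shrinks the model: an embedding layer stores one d-dimensional vector per dictionary entry, so it holds V × d weights and its size falls linearly with the dictionary size V. The Python arithmetic below uses hypothetical numbers (a 50,000-word dictionary screened down to 5,000 words, 300-dimensional embeddings); they are illustrative and are not results reported in the paper.

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """Number of trainable weights in an embedding lookup table (V x d)."""
    return vocab_size * embed_dim


full = embedding_params(50_000, 300)     # hypothetical full dictionary
screened = embedding_params(5_000, 300)  # hypothetical screened dictionary
reduction = 1 - screened / full          # fraction of embedding weights removed
print(full, screened, f"{reduction:.0%}")  # 15000000 1500000 90%
```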

Abstract

In this paper, we propose a dictionary screening method for embedding compression in text classification tasks. The key purpose of this method is to evaluate the importance of each keyword in the dictionary. To this end, we first train a pre-specified recurrent neural network-based model using a full dictionary. This leads to a benchmark model, which we then use to obtain the predicted class probabilities for each sample in a dataset. Next, to evaluate the impact of each keyword in affecting the predicted class probabilities, we develop a novel method for assessing the importance of each keyword in a dictionary. Consequently, each keyword can be screened, and only the most important keywords are retained. With these screened keywords, a new dictionary with a considerably reduced size can be constructed. Accordingly, the original text sequence can be substantially compressed. The…
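Once every keyword has an importance score, the screening and compression steps described in the abstract reduce to ranking, truncating, and re-indexing. The Python sketch below assumes hypothetical helper names (`screen_dictionary`, `compress_sequence`), hypothetical importance scores, a fixed budget `k` of retained keywords, id 0 reserved for padding, and that tokens outside the screened dictionary are simply dropped rather than mapped to an unknown-word id; the paper's exact choices may differ.

```python
from typing import Dict, List


def screen_dictionary(scores: Dict[str, float], k: int) -> Dict[str, int]:
    """Keep the k highest-scoring keywords and assign them new contiguous ids.

    Id 0 is left free for padding (an assumption, not from the paper)."""
    kept = sorted(scores, key=lambda w: (-scores[w], w))[:k]  # ties broken alphabetically
    return {word: idx + 1 for idx, word in enumerate(kept)}


def compress_sequence(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Drop tokens outside the screened dictionary and map the rest to ids."""
    return [vocab[tok] for tok in tokens if tok in vocab]


# Hypothetical importance scores for a tiny dictionary.
scores = {"great": 0.9, "awful": 0.8, "plot": 0.1, "the": 0.0, "was": 0.0}
vocab = screen_dictionary(scores, k=3)
print(vocab)                                                      # {'great': 1, 'awful': 2, 'plot': 3}
print(compress_sequence(["the", "plot", "was", "great"], vocab))  # [3, 1]
```

Dropping out-of-dictionary tokens is what shortens the sequences; mapping them to a shared unknown-word id instead would still shrink the dictionary but leave sequence length unchanged.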

Taxonomy

Topics: Text and Document Classification Technologies · Speech Recognition and Synthesis · Algorithms and Data Compression