Semantic Text Compression for Classification
Emrecan Kutay, Aylin Yener

TL;DR
This paper introduces semantic compression techniques for text, built on sentence embeddings and a semantic distortion metric, that cut the number of bits needed to represent a message by orders of magnitude at a very modest cost in classification accuracy.
Contribution
It proposes novel semantic quantization and clustering methods for text compression that preserve meaning and improve resource efficiency in classification applications.
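To make the idea of semantic quantization concrete, here is a minimal sketch, not the authors' implementation. It assumes sentence embeddings are already available as vectors (random vectors stand in for them here), uses plain k-means with cosine similarity as a stand-in for the semantic distortion metric, and sends only the index of the nearest centroid. The names fit_codebook, encode, and decode are hypothetical.

```python
import numpy as np

def normalize(x):
    # Scale each row to unit length so dot products equal cosine similarity.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def fit_codebook(emb, k, iters=20, seed=0):
    # Spherical k-means: assign by highest cosine similarity, then re-average.
    rng = np.random.default_rng(seed)
    emb = normalize(emb)
    centroids = emb[rng.choice(len(emb), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(emb @ centroids.T, axis=1)
        for j in range(k):
            members = emb[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
        centroids = normalize(centroids)
    return centroids

def encode(emb, centroids):
    # The compressed message is just the index of the closest centroid.
    return np.argmax(normalize(emb) @ centroids.T, axis=1)

def decode(indices, centroids):
    # The decoder looks up the centroid and hands it to a classifier.
    return centroids[indices]

# Placeholder data: 200 "sentence embeddings" of dimension 16 (assumption).
emb = np.random.default_rng(1).normal(size=(200, 16))
codebook = fit_codebook(emb, k=8)
indices = encode(emb, codebook)
recon = decode(indices, codebook)
bits_per_sentence = int(np.ceil(np.log2(len(codebook))))
```

In this sketch a classifier would be trained and evaluated on the decoded centroids rather than on the original text, so exact reconstruction is never needed.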
Findings
Orders of magnitude reduction in bits needed for message representation
Minimal accuracy loss compared to semantic-agnostic baselines
Effective generalization across diverse benchmark datasets
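The bit savings follow from simple arithmetic, sketched below with illustrative numbers that are assumptions and not results from the paper: a sentence sent as raw 8-bit characters is compared with the same sentence sent as one index into a codebook of K entries. The function names are hypothetical.

```python
def index_bits(codebook_size):
    # Bits needed to address one of `codebook_size` entries.
    return (codebook_size - 1).bit_length()

def raw_text_bits(num_chars, bits_per_char=8):
    # Semantic-agnostic cost: every character is transmitted.
    return num_chars * bits_per_char

# Illustrative values only: a 100-character sentence and K = 1024.
raw = raw_text_bits(100)
compressed = index_bits(1024)
savings_factor = raw / compressed
```

The cost of the index depends only on the codebook size, not on sentence length, which is why longer messages benefit more under these assumptions.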
Abstract
We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact reconstruction is the potential resource savings, both in storage and in conveying the information to another node. Towards this end, we propose semantic quantization and compression approaches for text where we utilize sentence embeddings and the semantic distortion metric to preserve the meaning. Our results demonstrate that the proposed semantic approaches result in substantial (orders of magnitude) savings in the required number of bits for message representation at the expense of very modest accuracy loss compared to the semantic agnostic baseline. We compare the results of proposed approaches and observe that resource savings enabled by semantic…
Taxonomy
Topics: Speech Recognition and Synthesis · Algorithms and Data Compression · Image Retrieval and Classification Techniques
