Semantic Text Compression for Classification

Emrecan Kutay; Aylin Yener

arXiv:2309.10809 · cs.IT · September 20, 2023

TL;DR

This paper introduces semantic compression techniques for text that use sentence embeddings and a semantic distortion metric, cutting the bits needed to represent a message by orders of magnitude with only modest accuracy loss in classification tasks.

Contribution

It proposes novel semantic quantization and clustering methods for text compression that preserve meaning and improve resource efficiency in classification applications.

Findings

01 Orders-of-magnitude reduction in the bits needed for message representation

02 Minimal accuracy loss compared to semantic-agnostic baselines

03 Effective generalization across diverse benchmark datasets

Abstract

We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact reconstruction is the potential resource savings, both in storage and in conveying the information to another node. Towards this end, we propose semantic quantization and compression approaches for text where we utilize sentence embeddings and the semantic distortion metric to preserve the meaning. Our results demonstrate that the proposed semantic approaches result in substantial (orders of magnitude) savings in the required number of bits for message representation at the expense of very modest accuracy loss compared to the semantic agnostic baseline. We compare the results of proposed approaches and observe that resource savings enabled by semantic…
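
To make the idea of semantic quantization concrete, here is a minimal sketch, not the authors' implementation. It assumes that sentence embeddings are already available as fixed-length vectors (random vectors stand in for them here), that the semantic distortion is cosine distance, and that the codebook is learned with spherical k-means, a technique chosen for illustration. The names `semantic_quantizer` and `bits_per_message` are hypothetical. Each message is then represented only by the index of its nearest codeword, which costs about log2(K) bits for a codebook of size K.

```python
import math
import numpy as np


def normalize(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def semantic_quantizer(embeddings, k, iters=20, seed=0):
    """Spherical k-means (an illustrative stand-in, not the paper's method).

    Returns a (k, d) codebook of unit vectors and, for each input row,
    the index of the codeword with the highest cosine similarity.
    """
    rng = np.random.default_rng(seed)
    x = normalize(embeddings)
    # Initialise the codebook from k distinct input vectors.
    codebook = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        idx = np.argmax(x @ codebook.T, axis=1)
        for j in range(k):
            members = x[idx == j]
            if len(members) > 0:  # keep the old codeword if a cluster is empty
                codebook[j] = members.mean(axis=0)
        codebook = normalize(codebook)
    idx = np.argmax(x @ codebook.T, axis=1)
    return codebook, idx


def bits_per_message(k):
    """Bits needed to send one codeword index with a fixed-length code."""
    return max(1, math.ceil(math.log2(k)))


# Assumption: random vectors stand in for real sentence embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 32))

codebook, idx = semantic_quantizer(embeddings, k=16)
# Cosine distance between each embedding and its assigned codeword.
distortion = 1.0 - np.sum(normalize(embeddings) * codebook[idx], axis=1)

print("bits per message:", bits_per_message(16))
print("mean cosine distortion:", float(distortion.mean()))
```

In this sketch a downstream classifier would operate on `codebook[idx]` rather than on the original text, so the rate is set by the codebook size alone and does not grow with message length.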

Taxonomy

Topics: Speech Recognition and Synthesis · Algorithms and Data Compression · Image Retrieval and Classification Techniques