Lightweight Conceptual Dictionary Learning for Text Classification Using Information Compression
Li Wan, Tansu Alpcan, Margreta Kuijper, Emanuele Viterbo

TL;DR
This paper introduces a lightweight, information-theoretic dictionary learning method for text classification that uses data compression to create discriminative features, achieving competitive performance with fewer parameters.
Contribution
The paper presents a novel two-phase dictionary learning framework based on LZW compression and mutual information optimization for effective text classification.
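The first phase, building a dictionary with LZW, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the text is already tokenized into words and that dictionary entries are word tuples, and the function name `build_lzw_dictionary` is hypothetical.

```python
def build_lzw_dictionary(tokens):
    """Return the phrase dictionary that LZW builds while scanning `tokens`.

    Assumption: word-level tokens; keys are tuples of words, values are
    integer codes assigned in order of first appearance.
    """
    dictionary = {}
    # Seed the dictionary with every single token (the LZW "alphabet").
    for tok in tokens:
        dictionary.setdefault((tok,), len(dictionary))

    phrase = ()
    for tok in tokens:
        candidate = phrase + (tok,)
        if candidate in dictionary:
            # Keep extending the current phrase while it is already known.
            phrase = candidate
        else:
            # First unseen extension: record it, then restart from this token.
            dictionary[candidate] = len(dictionary)
            phrase = (tok,)
    return dictionary


if __name__ == "__main__":
    atoms = build_lzw_dictionary("a b a b a b".split())
    for phrase, code in atoms.items():
        print(code, " ".join(phrase))
```

Repeated word sequences end up as multi-word entries, which is what makes the resulting atoms candidates for the label-aware refinement in the second phase.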
Findings
Achieves near state-of-the-art accuracy on benchmark datasets.
Uses only 10% of the parameters of top models.
Performs well in limited-vocabulary scenarios but less so with diverse vocabularies.
Abstract
We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a dictionary from text datasets, focusing on the conceptual significance of dictionary elements. Subsequently, dictionaries are refined considering label data, optimizing dictionary atoms to enhance discriminative power based on mutual information and class distribution. This process generates discriminative numerical representations, facilitating the training of simple classifiers such as SVMs and neural networks. We evaluate our algorithm's information-theoretic performance using information bottleneck principles and introduce the information plane area rank (IPAR) as a novel metric to quantify the information-theoretic performance. Tested on six…
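To make the mutual-information idea concrete, the sketch below scores a single dictionary atom by the empirical mutual information between its presence in a document and the document's class label. The abstract does not give the paper's exact refinement criterion, so this plain estimator and the name `mutual_information` are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(presence, labels):
    """Empirical I(X; Y) in bits between atom presence X and class label Y.

    `presence[i]` is 1 if the atom occurs in document i, else 0;
    `labels[i]` is that document's class. Both are illustrative inputs.
    """
    n = len(labels)
    joint = Counter(zip(presence, labels))
    count_x = Counter(presence)
    count_y = Counter(labels)

    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    labels = ["sports", "sports", "politics", "politics"]
    print(mutual_information([1, 1, 0, 0], labels))  # atom tracks the label
    print(mutual_information([1, 0, 1, 0], labels))  # atom independent of label
```

An atom that appears only in one class scores high, while one spread evenly across classes scores zero, so a ranking like this could be used to decide which atoms to keep.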
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Computational Techniques and Applications · Educational Technology and Assessment
