Compressibility of Distributed Document Representations

Blaž Škrlj; Matej Petković

arXiv:2110.07595·cs.CL·October 15, 2021

TL;DR

This paper introduces CoRe, a simple recursive compression method for document representations that reduces size and noise, potentially improving NLP task performance and lowering deployment costs.

Contribution

The paper presents CoRe, a universal, efficient framework for compressing document representations, demonstrating its effectiveness across diverse datasets and compression algorithms.

Findings

01. Recursive SVD provides a strong balance between compression and performance.

02. CoRe improves text classification accuracy with compressed representations.

03. Significant reduction in representation size achieved without performance loss.

Abstract

Contemporary natural language processing (NLP) revolves around learning from latent document representations, generated either implicitly by neural language models or explicitly by methods such as doc2vec. One of the key properties of these representations is their dimension. While the commonly adopted dimensions of 256 and 768 perform sufficiently well on many tasks, it is often unclear whether the default dimension is the most suitable choice for the downstream learning task. Furthermore, representation dimensions are seldom subject to hyperparameter tuning due to computational constraints. The purpose of this paper is to demonstrate that a surprisingly simple and efficient recursive compression procedure can be sufficient to both significantly compress the initial representation and potentially improve its performance when considering the…
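
To make the idea of recursive compression concrete, the following is a minimal sketch and not the authors' CoRe implementation. It assumes that each step halves the dimension of the previous level with a truncated SVD, and it runs on random data standing in for real document embeddings. The function names, the halving schedule, and the min_dim parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def svd_compress(X, k):
    """Project the rows of X onto the top-k right singular vectors."""
    # Center the columns so the projection captures variance, as in PCA.
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def recursive_compress(X, min_dim=16):
    """Repeatedly halve the dimension, compressing the previous level each time.

    Assumption: a halving schedule; the paper may use a different one.
    Returns a dict mapping dimension -> representation at that dimension.
    """
    levels = {X.shape[1]: X}
    current = X
    while current.shape[1] // 2 >= min_dim:
        k = current.shape[1] // 2
        current = svd_compress(current, k)
        levels[k] = current
    return levels


# Synthetic stand-in for 300 document embeddings of dimension 256.
rng = np.random.default_rng(0)
docs = rng.normal(size=(300, 256))

levels = recursive_compress(docs, min_dim=16)
dims = sorted(levels, reverse=True)
print(dims)
```

Each entry in levels could then be scored on the downstream task (for example with a simple classifier) to see how far the representation can shrink before performance drops.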
