On Geodesic Distances and Contextual Embedding Compression for Text Classification
Rishi Jha, Kai Mihata

TL;DR
This paper explores a geometric post-processing method that combines Isomap and PCA to compress BERT embeddings. The compressed embeddings retain high classification accuracy at significantly reduced dimensionality, and the method is especially effective on syntactic tasks.
Contribution
The paper introduces a novel post-processing technique that uses Isomap's geodesic distance estimates to compress contextual embeddings for text classification.
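To make the idea of geodesic distance estimation concrete, here is a minimal sketch of the general Isomap-style recipe: build a k-nearest-neighbors graph and take shortest-path lengths through it. It is not the authors' code. The helper name `geodesic_distances`, the toy semicircle data, and the choice of `n_neighbors=2` are all assumptions made for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph


def geodesic_distances(points, n_neighbors):
    """Estimate geodesic distances as shortest paths through a kNN graph.

    Hypothetical helper for illustration; Isomap performs an equivalent
    step internally before its eigendecomposition.
    """
    # Sparse graph whose edge weights are Euclidean distances to neighbors.
    graph = kneighbors_graph(points, n_neighbors=n_neighbors, mode="distance")
    # Dijkstra over the graph, treating edges as undirected.
    return shortest_path(graph, method="D", directed=False)


# Toy manifold (assumption): 200 points along a unit semicircle.
theta = np.linspace(0.0, np.pi, 200)
points = np.column_stack([np.cos(theta), np.sin(theta)])

geo = geodesic_distances(points, n_neighbors=2)
straight_line = np.linalg.norm(points[0] - points[-1])

# The straight-line distance cuts across the semicircle, while the graph
# path follows the arc between the two endpoints.
print(f"Euclidean: {straight_line:.3f}, geodesic estimate: {geo[0, -1]:.3f}")
```

On this toy curve the straight-line distance between the endpoints is the diameter, while the graph-based estimate follows the arc and approaches its length as the sampling gets denser.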
Findings
On one dataset, compressed embeddings performed within 0.1% of the original BERT embeddings on a downstream classification task
That result held despite a 12-fold dimensionality reduction
The method excels on syntax-dependent tasks
Abstract
In some memory-constrained settings like IoT devices and over-the-network data pipelines, it can be advantageous to have smaller contextual embeddings. We investigate the efficacy of projecting contextual embedding data (BERT) onto a manifold, and using nonlinear dimensionality reduction techniques to compress these embeddings. In particular, we propose a novel post-processing approach, applying a combination of Isomap and PCA. We find that the geodesic distance estimations, estimates of the shortest path on a Riemannian manifold, from Isomap's k-Nearest Neighbors graph bolstered the performance of the compressed embeddings to be comparable to the original BERT embeddings. On one dataset, we find that despite a 12-fold dimensionality reduction, the compressed embeddings performed within 0.1% of the original BERT embeddings on a downstream classification task. In addition, we find that…
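The compression step described in the abstract can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' implementation: the excerpt does not say in which order Isomap and PCA are applied, so the PCA-then-Isomap order, the intermediate size `pca_dim=128`, and `n_neighbors=15` are guesses. The 768-dimensional input assumes a BERT-base-sized model, random vectors stand in for real embeddings, and the 64-dimensional output is chosen only because 768 / 64 matches the 12-fold reduction mentioned above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.pipeline import make_pipeline


def build_compressor(pca_dim=128, out_dim=64, n_neighbors=15):
    """Hypothetical PCA -> Isomap compressor; order and sizes are assumptions."""
    return make_pipeline(
        # Linear reduction to an intermediate dimensionality.
        PCA(n_components=pca_dim),
        # Nonlinear reduction driven by kNN-graph geodesic distance estimates.
        Isomap(n_neighbors=n_neighbors, n_components=out_dim),
    )


# Stand-in data (assumption): 400 random vectors of BERT-base width.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(400, 768)).astype(np.float32)

compressor = build_compressor()
compressed = compressor.fit_transform(embeddings)

ratio = embeddings.shape[1] // compressed.shape[1]
print(compressed.shape, f"{ratio}-fold reduction")
```

After fitting, `compressor.transform(new_embeddings)` maps unseen vectors into the same compressed space, which a downstream classifier could then consume in place of the original embeddings.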
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Advanced Graph Neural Networks · Face and Expression Recognition
Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Softmax · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · Layer Normalization
