Compression Hacking: A Supplementary Perspective on Informatics Properties of Language Models from Geometric Distortion

Jianxiang Zang; Meiling Ning; Yongda Wei; Shihan Dou; Jiazheng Zhang; Nijia Mo; Binhong Li; Tao Gui; Qi Zhang; Xuanjing Huang

arXiv:2505.17793·cs.CL·November 7, 2025

Open Access

TL;DR

This paper introduces geometric distortion-based metrics to better understand language model representations, revealing that traditional compression metrics can be misleading due to anisotropic distortions.

Contribution

The authors propose three refined compression metrics incorporating geometric distortion analysis, improving the correlation with language model capabilities.

Findings

01

Refined metrics achieve Spearman correlation > 0.9 with LM capabilities

02

Geometric distortion analysis reveals anisotropy in compressed representations

03

Enhanced metrics outperform original compression and structure-based metrics

Abstract

Recently, the concept of "compression as intelligence" has provided a novel informatics metric perspective for language models (LMs), emphasizing that highly structured representations signify the intelligence level of LMs. However, from a geometric standpoint, the word representation space of highly compressed LMs tends to degenerate into a highly anisotropic state, which hinders the LM's ability to comprehend instructions and directly impacts its performance. We found this compression-anisotropy synchronicity is essentially the "Compression Hacking" in LM representations, where noise-dominated directions tend to create the illusion of high compression rates by sacrificing spatial uniformity. Based on this, we propose three refined compression metrics by incorporating geometric distortion analysis and integrate them into a self-evaluation pipeline. The refined metrics exhibit…
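The link the abstract draws between anisotropy and apparent compression can be illustrated with a toy sketch. This is not the paper's metric: it assumes two common proxies (mean pairwise cosine similarity for anisotropy, and the share of second-moment energy along the dominant direction as a stand-in for how "compressed" the space looks), and the helper names, the synthetic Gaussian vectors, and the `offset` value are all hypothetical.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs (an anisotropy proxy)."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    total, count = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            count += 1
    return total / count


def top_direction_share(vectors, iters=100):
    """Estimated fraction of uncentered second-moment energy along the
    dominant direction, using power iteration (assumed proxy, not the paper's)."""
    dim, n = len(vectors[0]), len(vectors)
    # Uncentered second-moment matrix M = (1/n) * sum_i v_i v_i^T.
    m = [[sum(v[r] * v[c] for v in vectors) / n for c in range(dim)]
         for r in range(dim)]
    w = [1.0] * dim
    for _ in range(iters):
        w = [sum(m[r][c] * w[c] for c in range(dim)) for r in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Rayleigh quotient w^T M w approximates the largest eigenvalue.
    top = sum(w[r] * sum(m[r][c] * w[c] for c in range(dim)) for r in range(dim))
    trace = sum(m[r][r] for r in range(dim))
    return top / trace


rng = random.Random(0)
dim, n = 8, 200
# Synthetic, roughly isotropic "representations" (illustrative data only).
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
# Hypothetical shared offset: one direction now dominates every vector.
offset = [5.0] * dim
shifted = [[x + o for x, o in zip(v, offset)] for v in isotropic]

iso_cos = mean_pairwise_cosine(isotropic)
shift_cos = mean_pairwise_cosine(shifted)
iso_share = top_direction_share(isotropic)
shift_share = top_direction_share(shifted)

print(f"isotropic: cosine={iso_cos:.3f}, top-direction share={iso_share:.3f}")
print(f"shifted:   cosine={shift_cos:.3f}, top-direction share={shift_share:.3f}")
```

In this toy setting the shifted cloud carries the same information as the isotropic one, yet a single shared direction makes it look far more concentrated while the vectors become nearly parallel. That is the kind of coupling the abstract attributes to noise-dominated directions, shown here only under the stated assumptions.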

Taxonomy

Topics: Computability, Logic, AI Algorithms · Natural Language Processing Techniques