HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models

Liheng Zhang; Jin Wang; Hui Li; Bingfeng Zhang; Weifeng Liu

arXiv:2511.09883 · cs.CV · November 14, 2025

Open Access

TL;DR

This paper introduces HCC-3D, a hierarchical compression method that reduces the number of 3D tokens processed by the language model in 3D vision-language models by about 98%, significantly improving efficiency while maintaining high performance.

Contribution

HCC-3D is a novel hierarchical compression framework that substantially reduces the number of 3D tokens in vision-language models with minimal information loss.

Findings

01. Achieves approximately 98% 3D token reduction.

02. Outperforms previous methods in efficiency and accuracy.

03. Retains critical structural information and fine details.

Abstract

3D understanding has recently drawn significant attention, with Vision-Language Models (VLMs) used to enable multi-modal reasoning over point cloud and text data. Following large 2D-VLMs with powerful reasoning capabilities, current 3D-VLMs directly embed 3D point clouds into 3D tokens. However, this framework incurs a high computational cost that limits its application, and we identify the bottleneck as the processing of all 3D tokens in the Large Language Model (LLM) component. This raises the question: how can we reduce the computational overhead introduced by 3D tokens while preserving the integrity of their essential information? To address it, we introduce Hierarchical Compensatory Compression (HCC-3D), which efficiently compresses 3D tokens while retaining critical details. Specifically, we first propose a global structure compression (GSC), in which we design…
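The abstract places the cost in the LLM processing every 3D token. A rough per-layer FLOP count makes the effect of a 98% token reduction concrete. The sketch below is a generic dense-Transformer estimate, not HCC-3D's method or the paper's measurements; the token counts (512 3D tokens, 64 text tokens), the layer widths (4096 and 11008), and the helper names `layer_flops` and `compressed_token_count` are hypothetical, and softmax, normalization, gated MLPs, and KV caching are ignored.

```python
"""Back-of-the-envelope cost of 3D tokens in one LLM decoder layer.

All sizes are illustrative assumptions, not values reported for HCC-3D.
"""


def layer_flops(seq_len: int, d_model: int, d_ff: int) -> int:
    """Approximate FLOPs for one dense decoder layer (2 FLOPs per multiply-add).

    Counts only:
      - Q, K, V and output projections: 4 * seq_len * d_model^2 MACs
      - attention scores (QK^T) and weighted sum (AV): 2 * seq_len^2 * d_model MACs
      - a two-matrix feed-forward block: 2 * seq_len * d_model * d_ff MACs
    """
    proj = 4 * seq_len * d_model * d_model
    attn = 2 * seq_len * seq_len * d_model
    ffn = 2 * seq_len * d_model * d_ff
    return 2 * (proj + attn + ffn)


def compressed_token_count(n_tokens: int, reduction: float) -> int:
    """Tokens kept after removing a `reduction` fraction (always at least one)."""
    return max(1, round(n_tokens * (1.0 - reduction)))


def main() -> None:
    d_model, d_ff = 4096, 11008  # hypothetical LLM widths
    n_text = 64                  # hypothetical text prompt length
    n_3d = 512                   # hypothetical 3D token count before compression
    n_3d_kept = compressed_token_count(n_3d, 0.98)

    full = layer_flops(n_3d + n_text, d_model, d_ff)
    compressed = layer_flops(n_3d_kept + n_text, d_model, d_ff)
    print(f"3D tokens: {n_3d} -> {n_3d_kept}")
    print(f"per-layer FLOPs: {full:.3e} -> {compressed:.3e} "
          f"({100 * (1 - compressed / full):.1f}% saved)")


if __name__ == "__main__":
    main()
```

Under these assumptions, the per-layer cost falls to roughly an eighth of the original rather than to 2%, because the text tokens are not compressed; the end-to-end saving therefore depends on how many 3D tokens a model uses relative to its prompt length.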

Taxonomy

Topics: 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications · Advanced Neural Network Applications