Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

Yuhui Lin; Siyue Yu; Yuxing Yang; Guangliang Cheng; Jimin Xiao

arXiv:2604.02689 · cs.CV · April 6, 2026

TL;DR

Efficient3D is a unified visual token pruning framework for 3D multimodal large language models (MLLMs) that reduces inference cost while maintaining competitive accuracy across five 3D vision-language benchmarks.

Contribution

The paper proposes a Debiased Visual Token Importance Estimator (DVTIE) and an Adaptive Token Rebalancing (ATR) strategy for efficient 3D MLLMs.

Findings

01 Achieves a +2.57% CIDEr improvement on Scan2Cap.

02 Reduces inference overhead while maintaining accuracy.

03 Demonstrates effectiveness across five 3D vision-language benchmarks.

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have expanded reasoning capabilities into 3D domains, enabling fine-grained spatial understanding. However, the substantial size of 3D MLLMs and the high dimensionality of input features introduce considerable inference overhead, which limits practical deployment on resource-constrained platforms. To overcome this limitation, this paper presents Efficient3D, a unified framework for visual token pruning that accelerates 3D MLLMs while maintaining competitive accuracy. The proposed framework introduces a Debiased Visual Token Importance Estimator (DVTIE) module, which considers the influence of shallow initial layers during attention aggregation, thereby producing more reliable importance predictions for visual tokens. In addition, an Adaptive Token Rebalancing (ATR) strategy is developed to dynamically adjust pruning strength…
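To make the general idea of attention-based visual token pruning concrete, here is a minimal sketch. It is not the paper's DVTIE or ATR: it simply averages text-to-visual attention uniformly over all layers and keeps a fixed fraction of the highest-scoring visual tokens. The function names, the input layout (`attn_per_layer[l][q][k]`, assumed already averaged over heads), and the `keep_ratio` parameter are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def aggregate_importance(
    attn_per_layer: Sequence[Sequence[Sequence[float]]],
    visual_idx: Sequence[int],
    text_idx: Sequence[int],
) -> List[float]:
    """Score each visual token by the mean attention it receives from text tokens.

    attn_per_layer[l][q][k] is the attention weight from query token q to
    key token k in layer l (hypothetical layout, heads already averaged).
    Every layer is weighted equally here; this is a plain baseline, not a
    debiased estimator.
    """
    scores = [0.0] * len(visual_idx)
    for layer in attn_per_layer:
        for q in text_idx:
            for j, k in enumerate(visual_idx):
                scores[j] += layer[q][k]
    norm = len(attn_per_layer) * len(text_idx)
    return [s / norm for s in scores]


def prune_visual_tokens(
    scores: Sequence[float],
    visual_idx: Sequence[int],
    keep_ratio: float,
) -> List[int]:
    """Keep the top `keep_ratio` fraction of visual tokens (at least one).

    Returns the surviving token positions in their original order.
    """
    n_keep = max(1, int(keep_ratio * len(visual_idx)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(visual_idx[j] for j in ranked[:n_keep])
```

In this baseline the pruning strength is a single fixed `keep_ratio`. Per the abstract, the paper instead debiases the importance estimate with respect to shallow layers and adjusts pruning strength dynamically, so the sketch should be read only as the starting point that such methods refine.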

Code & Models

Repositories

sol924/Efficient3D (GitHub)
