How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?
Tuan Anh Tran, Duy M. H. Nguyen, Hoai-Chau Tran, Michael Barz, Khoa D. Doan, Roger Wattenhofer, Ngo Anh Vien, Mathias Niepert, Daniel Sonntag, Paul Swoboda

TL;DR
This paper finds that 3D point cloud transformers carry highly redundant token representations, and introduces a token merging method that cuts the token count by up to 90-95% while maintaining competitive performance, improving efficiency across 3D vision tasks.
Contribution
The paper analyzes token redundancy in large-scale 3D point cloud transformers, which it describes as the first such study, and proposes gitmerge3D, a globally informed graph token merging method that improves efficiency and scalability.
Findings
Tokens are highly redundant: the token count can be reduced by up to 90-95%.
gitmerge3D maintains competitive performance at these reduction levels.
The method improves computational efficiency across multiple 3D vision tasks.
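To make the idea of token merging concrete, here is a minimal sketch in Python with NumPy. It is not gitmerge3D: the paper's globally informed graph merging is not specified in this summary, so the sketch swaps in a much simpler stand-in that assigns every token to its most cosine-similar "destination" token and averages each group. The function name `merge_tokens`, the `keep_ratio` parameter, and the evenly strided choice of destinations are all assumptions made for illustration.

```python
from typing import Tuple

import numpy as np


def merge_tokens(tokens: np.ndarray, keep_ratio: float) -> Tuple[np.ndarray, np.ndarray]:
    """Merge N tokens into roughly N * keep_ratio tokens by similarity averaging.

    Illustrative stand-in only; NOT the gitmerge3D algorithm.
    Returns (merged_tokens, assignment), where assignment[i] is the index
    of the merged token that original token i was folded into.
    """
    n, d = tokens.shape
    k = max(1, min(n, int(round(n * keep_ratio))))

    # Assumption: evenly strided tokens serve as merge destinations.
    dest_idx = (np.arange(k) * n) // k

    # Cosine similarity between every token and every destination.
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit[dest_idx].T            # shape (n, k)
    assign = sim.argmax(axis=1)              # best destination per token
    assign[dest_idx] = np.arange(k)          # each destination keeps its own slot

    # Average all tokens that share a destination.
    merged = np.zeros((k, d), dtype=tokens.dtype)
    np.add.at(merged, assign, tokens)
    counts = np.bincount(assign, minlength=k)
    merged /= counts[:, None]
    return merged, assign


rng = np.random.default_rng(0)
tokens = rng.normal(size=(1000, 16))
merged, assign = merge_tokens(tokens, keep_ratio=0.05)  # 95% reduction
print(tokens.shape, "->", merged.shape)
```

Keeping the `assign` array matters for dense tasks such as semantic segmentation: a per-point prediction can be recovered by indexing the merged features with it, for example `merged[assign]`, which restores one feature row per original token.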
Abstract
Recent advances in 3D point cloud transformers have led to state-of-the-art results in tasks such as semantic segmentation and reconstruction. However, these models typically rely on dense token representations, incurring high computational and memory costs during training and inference. In this work, we present the finding that tokens are remarkably redundant, leading to substantial inefficiency. We introduce gitmerge3D, a globally informed graph token merging method that can reduce the token count by up to 90-95% while maintaining competitive performance. This finding challenges the prevailing assumption that more tokens inherently yield better performance and highlights that many current models are over-tokenized and under-optimized for scalability. We validate our method across multiple 3D vision tasks and show consistent improvements in computational efficiency. This work is the…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Machine Learning in Materials Science · 3D Surveying and Cultural Heritage
