SeGPruner: Semantic-Geometric Visual Token Pruner for 3D Question Answering

Wenli Li; Kai Zhao; Haoran Jiang; Enquan Yang; Yi Su; Dan Zeng

arXiv:2603.29437 · cs.CV · April 1, 2026

TL;DR

SeGPruner is a semantic-geometric token pruning method for 3D question answering models. It cuts the visual token set by 91% and inference latency by 86% while keeping reasoning accuracy competitive.

Contribution

SeGPruner introduces a dual-module framework for token reduction in 3D QA: one module retains tokens by semantic saliency, and the other preserves geometric diversity among the remaining tokens.
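
The summary on this page does not say how the two modules are implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each visual token comes with a scalar saliency score and a 3D position, and it swaps in plain top-k selection for the semantic stage and farthest point sampling for the geometric stage. The names `prune_tokens`, `saliency`, `xyz`, and `salient_frac` are hypothetical.

```python
import numpy as np


def prune_tokens(saliency, xyz, budget, salient_frac=0.5):
    """Return sorted indices of the tokens to keep (illustrative only).

    saliency: (N,) array, assumed per-token importance score.
    xyz: (N, 3) array, assumed 3D position of each token.
    budget: number of tokens to keep.
    salient_frac: assumed share of the budget given to the saliency stage.
    """
    n = saliency.shape[0]
    budget = min(int(budget), n)
    if budget <= 0:
        return np.empty(0, dtype=int)

    # Stage 1 (semantic): keep the highest-scoring tokens.
    n_salient = min(budget, max(1, int(round(budget * salient_frac))))
    keep = [int(i) for i in np.argsort(-saliency, kind="stable")[:n_salient]]

    # Distance from every token to its nearest already-kept token.
    min_dist = np.full(n, np.inf)
    for i in keep:
        min_dist = np.minimum(min_dist, np.linalg.norm(xyz - xyz[i], axis=1))
    min_dist[keep] = -1.0  # kept tokens can never be selected again

    # Stage 2 (geometric): farthest point sampling fills the rest of the
    # budget with tokens that lie far from everything kept so far.
    while len(keep) < budget:
        j = int(np.argmax(min_dist))
        keep.append(j)
        min_dist = np.minimum(min_dist, np.linalg.norm(xyz - xyz[j], axis=1))
        min_dist[j] = -1.0

    return np.sort(np.array(keep, dtype=int))


# Usage with random stand-in data: a 91% reduction keeps about 9% of tokens.
rng = np.random.default_rng(0)
n_tokens = 1000
scores = rng.random(n_tokens)          # stand-in saliency scores
positions = rng.random((n_tokens, 3))  # stand-in 3D positions
budget = int(round(n_tokens * (1 - 0.91)))
kept = prune_tokens(scores, positions, budget)
```

Splitting the budget between the two stages is one simple way to keep salient tokens while still covering the scene spatially; how SeGPruner actually balances the two is not stated here.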

Findings

01. Reduces the visual token set by 91%
02. Lowers inference latency by 86%
03. Maintains competitive 3D reasoning performance

Abstract

Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and jointly processed by a large language model (LLM) for inference. However, aggregating multi-view observations inevitably introduces severe token redundancy, leading to an overly large visual token set that significantly hinders inference efficiency under constrained token budgets. Visual token pruning has emerged as a prevalent strategy to address this issue. Nevertheless, most existing pruners are primarily tailored to 2D inputs or rely on indirect geometric cues, which limits their ability to explicitly retain semantically critical objects and maintain sufficient spatial coverage for robust 3D reasoning. In this paper, we propose SeGPruner, a semantic-aware and geometry-guided token…

Code & Models

Repositories

intcomp/SegPruner (GitHub)
