PerLA: Perceptive 3D Language Assistant
Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, and Yiming Wang

TL;DR
PerLA is a novel 3D language assistant that captures both local details and global context from point clouds. This helps LLMs understand 3D scenes and improves accuracy in 3D question answering and dense captioning.
Contribution
Introduces PerLA, a perceptive 3D language assistant that preserves local details and global context. It uses a novel algorithm built on the Hilbert curve and cross-attention, together with a new loss function.
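The Hilbert curve matters here because it orders points so that neighbours in the sequence tend to be neighbours in space. The sketch below shows that idea under explicit assumptions. Points are quantized to a 2**bits grid over their bounding box and ranked by a Hilbert index computed with Skilling's transpose method. The names hilbert_index, quantize, and hilbert_sort are hypothetical, and this is not PerLA's released code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def hilbert_index(cell: Sequence[int], bits: int) -> int:
    """Hilbert-curve index of an integer grid cell (Skilling's transpose method).

    `cell` holds n non-negative integer coordinates, each < 2**bits.
    """
    x = list(cell)
    n = len(x)
    m = 1 << (bits - 1)
    # Inverse undo: reflect/exchange low bits so every sub-cube follows the curve.
    q = m
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p  # invert the low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p  # exchange low bits of axes 0 and i
                x[0] ^= t
                x[i] ^= t
        q >>= 1
    # Gray-encode the transposed representation.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = m
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t
    # Interleave bits, most significant first, across all axes.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


def quantize(points: List[Point], bits: int) -> List[Tuple[int, ...]]:
    """Map each point to an integer cell of a 2**bits grid over the bounding box."""
    side = (1 << bits) - 1
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    span = [(h - low) or 1.0 for low, h in zip(lo, hi)]  # avoid division by zero
    return [tuple(round((p[a] - lo[a]) / span[a] * side) for a in range(3)) for p in points]


def hilbert_sort(points: List[Point], bits: int = 10) -> List[Point]:
    """Return the points ordered along a 3D Hilbert curve."""
    cells = quantize(points, bits)
    order = sorted(range(len(points)), key=lambda i: hilbert_index(cells[i], bits))
    return [points[i] for i in order]
```

Contiguous runs of the sorted list tend to be spatially compact. This is one plausible way such an ordering could be used to split a scene into local regions for parallel, high-resolution processing. The grid resolution (bits) is an illustrative choice.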
Findings
Outperforms state-of-the-art methods on 3D question answering and dense captioning.
Achieves gains of up to +1.34 CIDEr on ScanQA.
Improves dense captioning scores on ScanRefer and Nr3D.
Abstract
Enabling Large Language Models (LLMs) to understand the 3D physical world is an emerging yet challenging research direction. Current strategies for processing point clouds typically downsample the scene or divide it into smaller parts for separate analysis. However, both approaches risk losing key local details or global contextual information. In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive to both details and context, making visual representations more informative for the LLM. PerLA captures high-resolution (local) details in parallel from different point cloud areas and integrates them with (global) context obtained from a lower-resolution whole point cloud. We present a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural…
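The abstract describes aggregating local details into global context with cross-attention. The sketch below is a minimal, framework-free version of single-head scaled dot-product cross-attention. It assumes, hypothetically, that low-resolution global tokens act as queries over high-resolution local features and that the result is added back residually. PerLA's actual query/key roles, learned projections, and graph neural network stage are not represented. The names cross_attention and local_to_global are illustrative.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention without learned projections."""
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equally long")
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, then normalized attention weights.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim_v)])
    return out


def local_to_global(global_tokens: List[Vector], local_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: global tokens query local features, then a residual add."""
    attended = cross_attention(global_tokens, local_tokens, local_tokens)
    return [[g + a for g, a in zip(gt, at)] for gt, at in zip(global_tokens, attended)]
```

In this toy setup, each global token gathers a convex combination of local features weighted by similarity. That is the basic mechanism by which detail can flow into a coarser scene representation.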
Taxonomy
Topics: Robotics and Automated Systems · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems
