PerLA: Perceptive 3D Language Assistant

Guofeng Mei, Wei Lin, Luigi Riz, Yujiao Wu, Fabio Poiesi, and Yiming Wang

arXiv:2411.19774 · cs.CV · April 8, 2025

TL;DR

PerLA is a novel 3D language assistant that effectively captures local details and global context from point clouds, enhancing LLM understanding of 3D scenes with improved accuracy in question answering and dense captioning.

Contribution

Introduces PerLA, a perceptive 3D language assistant that preserves both local details and global context through a novel algorithm built on the Hilbert curve and cross-attention, trained with a new loss function.

Findings

01 Outperforms state-of-the-art methods on 3D question answering and captioning tasks.

02 Achieves gains of up to +1.34 CIDEr on ScanQA.

03 Improves dense captioning scores on ScanRefer and Nr3D.

Abstract

Enabling Large Language Models (LLMs) to understand the 3D physical world is an emerging yet challenging research direction. Current strategies for processing point clouds typically downsample the scene or divide it into smaller parts for separate analysis. However, both approaches risk losing key local details or global contextual information. In this paper, we introduce PerLA, a 3D language assistant designed to be more perceptive to both details and context, making visual representations more informative for the LLM. PerLA captures high-resolution (local) details in parallel from different point cloud areas and integrates them with (global) context obtained from a lower-resolution whole point cloud. We present a novel algorithm that preserves point cloud locality through the Hilbert curve and effectively aggregates local-to-global information via cross-attention and a graph neural…
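To make the locality idea concrete, the following is a minimal sketch of how points can be serialized along a Hilbert curve so that neighbours in the 1D order are also neighbours in 3D space. It is not PerLA's implementation: the function names `hilbert_index` and `hilbert_order`, the grid resolution `bits`, and the min-max quantization are assumptions made for illustration, and the index computation follows Skilling's well-known axes-to-transpose procedure rather than anything stated in the paper.

```python
def hilbert_index(cell, bits):
    """Hilbert-curve index of an integer grid cell (Skilling's transform).

    `cell` holds one integer per axis, each in [0, 2**bits).
    Hypothetical helper for illustration, not taken from PerLA.
    """
    x = list(cell)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the per-level rotations and reflections of the curve.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                      # invert low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p          # exchange low bits of axes 0 and i
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray-encode across the axes.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    for i in range(n):
        x[i] ^= t

    # Interleave bits, most significant first, into a single integer.
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((x[i] >> b) & 1)
    return h


def hilbert_order(points, bits=4):
    """Return point indices sorted along the Hilbert curve.

    Points are float tuples; they are quantized to a 2**bits grid per axis
    using min-max scaling (an assumption of this sketch).
    """
    dims = len(points[0])
    lo = [min(p[k] for p in points) for k in range(dims)]
    hi = [max(p[k] for p in points) for k in range(dims)]
    cells = (1 << bits) - 1

    def quantize(p):
        out = []
        for k in range(dims):
            span = hi[k] - lo[k]
            out.append(int((p[k] - lo[k]) / span * cells) if span > 0 else 0)
        return out

    return sorted(range(len(points)),
                  key=lambda i: hilbert_index(quantize(points[i]), bits))
```

Calling `hilbert_order(points)` on a list of `(x, y, z)` tuples yields a permutation of indices; slicing that permutation into contiguous chunks gives spatially compact groups of points, which is one plausible way to form local regions for parallel processing.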

Code & Models

Repositories

tyroneli/cua_o3d
pytorch

Taxonomy

Topics: Robotics and Automated Systems · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems