PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding

Junjie Wen; Junlin He; Fei Ma; Jinqiang Cui

arXiv:2604.15770 · cs.CV · April 24, 2026

Abstract

Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D space. However, existing representations struggle to satisfy these requirements jointly, and densely propagating pixel-wise semantics to 3D often introduces substantial redundancy, leading to inefficient storage and querying in large-scale scenes. To address these challenges, we present PLAF, a Pixel-wise Language-Aligned Feature extraction framework that enables dense and accurate semantic alignment in 2D without sacrificing open-vocabulary expressiveness. Building on this representation, we further design an efficient semantic storage and querying scheme that significantly reduces redundancy in both the 2D and 3D domains. Experimental results show that PLAF provides a strong…
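The abstract does not say how PLAF's storage and querying scheme works. As a rough illustration of why removing redundancy helps, here is a minimal sketch under our own assumptions (not the authors' method): many 3D points carry near-identical language-aligned features, so each distinct feature is stored once in a shared table while points keep only a row index, and an open-vocabulary query is scored by cosine similarity against a text embedding, as in CLIP-style pipelines. The names `build_shared_table` and `query`, the rounding-based dedup rule, and the `decimals` and `threshold` parameters are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a feature to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v) if norm > 0 else tuple(float(x) for x in v)


def build_shared_table(
    point_features: Sequence[Sequence[float]], decimals: int = 3
) -> Tuple[List[Vector], List[int]]:
    """Store each distinct (quantized) feature once; points keep only a row index.

    Hypothetical dedup rule: two features count as identical when their
    unit vectors agree after rounding to `decimals` places.
    """
    table: List[Vector] = []
    key_to_row: Dict[Vector, int] = {}
    indices: List[int] = []
    for feature in point_features:
        unit = normalize(feature)
        key = tuple(round(x, decimals) for x in unit)
        row = key_to_row.get(key)
        if row is None:  # first time this feature is seen: add a table row
            row = len(table)
            key_to_row[key] = row
            table.append(unit)
        indices.append(row)
    return table, indices


def query(
    table: List[Vector],
    indices: List[int],
    text_embedding: Sequence[float],
    threshold: float,
) -> List[int]:
    """Return ids of points whose feature matches the text query.

    Each unique table row is scored once by cosine similarity (both vectors
    are unit length), and the score is then looked up per point via `indices`.
    """
    t = normalize(text_embedding)
    row_scores = [sum(a * b for a, b in zip(row, t)) for row in table]
    return [pid for pid, row in enumerate(indices) if row_scores[row] >= threshold]


if __name__ == "__main__":
    # Toy 3-D "features" for five points; real language-aligned features are
    # much higher-dimensional, and the query vector stands in for a text encoder.
    points = [(1, 0, 0), (2, 0, 0), (0, 1, 0), (0, 3, 0), (1, 0, 0)]
    table, idx = build_shared_table(points)
    print(len(table), idx)                      # 2 [0, 0, 1, 1, 0]
    print(query(table, idx, (1, 0.1, 0), 0.9))  # [0, 1, 4]
```

In this toy case five points share only two distinct features, so the table holds two vectors instead of five, and a query computes two similarities instead of five. Whether PLAF removes redundancy in this particular way is not stated in the abstract.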

Code & Models

Repository: rockwenjj/PLAF (GitHub)