NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding

Shiyu Liu; Lianlei Shan

arXiv:2507.20110·cs.CV·July 29, 2025

TL;DR

NeuroVoxel-LM introduces a novel framework combining dynamic voxelization and meta-embedding to enhance language-aligned 3D perception from large-scale point clouds, improving efficiency and semantic accuracy.

Contribution

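The paper presents NeuroVoxel-LM, which integrates adaptive voxelization and lightweight meta-embedding to address two limitations of existing 3D language models on sparse, large-scale point clouds: slow feature extraction and limited representation accuracy.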

Findings

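01 DR-MSV (Dynamic Resolution Multiscale Voxelization) improves feature extraction efficiency and accuracy

02 TAP-LME (Token-level Adaptive Pooling for Lightweight Meta-Embedding) enhances semantic representation over max-pooling

03 Framework outperforms existing methods in 3D perception tasks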
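
To make the contrast in finding 02 concrete, here is a minimal sketch that compares max-pooling with a generic softmax-weighted (attention-style) pooling over token embeddings. It is not the paper's TAP-LME mechanism, whose details are not given on this page; the function names, the scoring vector, and the random data are all assumptions made for illustration.

```python
import numpy as np

def max_pool(tokens):
    """Element-wise maximum over tokens: keeps one extreme value per channel."""
    return tokens.max(axis=0)

def softmax_weighted_pool(tokens, score_vector):
    """Weighted average of tokens, with weights from a softmax over per-token scores.

    `score_vector` is a hypothetical scoring direction (it would be learned in
    a real model); here it is supplied by the caller.
    """
    scores = tokens @ score_vector       # one scalar score per token
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()    # weights are positive and sum to 1
    return weights @ tokens, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))         # 6 tokens, 4 channels (illustrative data)
score_vector = rng.normal(size=4)        # stand-in for a learned parameter

pooled_max = max_pool(tokens)
pooled_soft, weights = softmax_weighted_pool(tokens, score_vector)

# With an all-zero score vector every token gets the same weight,
# so the weighted pool reduces to plain mean pooling.
pooled_uniform, _ = softmax_weighted_pool(tokens, np.zeros(4))
```

Max-pooling discards every token except the per-channel maximum, while the weighted variant lets each token contribute in proportion to its score. That is one plausible reading of why an adaptive pooling scheme could preserve more semantic information.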

Abstract

Recent breakthroughs in Visual Language Models (VLMs) and Multimodal Large Language Models (MLLMs) have significantly advanced 3D scene perception towards language-driven cognition. However, existing 3D language models struggle with sparse, large-scale point clouds due to slow feature extraction and limited representation accuracy. To address these challenges, we propose NeuroVoxel-LM, a novel framework that integrates Neural Radiance Fields (NeRF) with dynamic resolution voxelization and lightweight meta-embedding. Specifically, we introduce a Dynamic Resolution Multiscale Voxelization (DR-MSV) technique that adaptively adjusts voxel granularity based on geometric and structural complexity, reducing computational cost while preserving reconstruction fidelity. In addition, we propose the Token-level Adaptive Pooling for Lightweight Meta-Embedding (TAP-LME) mechanism, which enhances…
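
The idea of adjusting voxel granularity to local complexity can be illustrated with a small octree-style sketch. This is not the authors' DR-MSV implementation: the complexity measure (smallest eigenvalue of the per-cell point covariance), the threshold, the depth limit, and all function names are assumptions chosen only to show the general pattern of subdividing where geometry is complex and keeping coarse voxels where it is simple.

```python
import numpy as np

def complexity(points):
    """Hypothetical complexity proxy: smallest covariance eigenvalue.

    It is near zero when the points lie on a plane or line and grows when
    they fill the cell volumetrically.
    """
    if len(points) < 4:
        return 0.0
    cov = np.cov(points.T)                       # 3x3 covariance matrix
    return float(np.linalg.eigvalsh(cov)[0])     # eigenvalues are ascending

def adaptive_voxels(points, origin, size, depth=0, max_depth=4, threshold=1e-3):
    """Recursively split a cubic cell into octants while it looks complex.

    Returns a list of (origin, size, point_count) leaf voxels.
    """
    if len(points) == 0:
        return []
    if depth == max_depth or complexity(points) <= threshold:
        return [(origin, size, len(points))]
    half = size / 2.0
    octant = ((points - origin) >= half).astype(int)          # 0/1 per axis
    codes = octant[:, 0] * 4 + octant[:, 1] * 2 + octant[:, 2]
    voxels = []
    for code in range(8):
        mask = codes == code
        if not mask.any():
            continue                                          # skip empty octants
        offset = np.array([(code >> 2) & 1, (code >> 1) & 1, code & 1]) * half
        voxels += adaptive_voxels(points[mask], origin + offset, half,
                                  depth + 1, max_depth, threshold)
    return voxels

# Illustrative scene: a flat floor plus a dense volumetric blob in one corner.
rng = np.random.default_rng(0)
floor = np.column_stack([rng.random(500), rng.random(500), np.full(500, 0.1)])
blob = 0.75 + 0.25 * rng.random((500, 3))
cloud = np.vstack([floor, blob])

voxels = adaptive_voxels(cloud, origin=np.zeros(3), size=1.0)
sizes = sorted({size for _, size, _ in voxels})
print("leaf voxel sizes:", sizes)
```

In this toy scene the flat floor stays in coarse voxels while the blob is subdivided further, so the voxel budget is spent where the geometry is more complex.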
