Voxel Densification for Serialized 3D Object Detection: Mitigating Sparsity via Pre-serialization Expansion

Qifeng Liu; Dawei Zhao; Yabo Dong; Linzhi Shang; Liang Xiao; Juan Wang; Kunkong Zhao; Dongming Lu; Qi Zhu

arXiv:2508.16069 · cs.CV · February 26, 2026

TL;DR

This paper introduces the Voxel Densification Module (VDM), which expands the voxel set in serialized 3D object detectors. By propagating foreground semantics to neighboring empty voxels before serialization, it improves accuracy, particularly for sparse foreground objects.

Contribution

The paper proposes VDM, a module placed before serialization that densifies the voxel set using sparse 3D convolutions and residual blocks, improving detection performance in serialized 3D detection frameworks (Transformer-based models and SSMs).

Findings

01. Achieves 74.8 mAPH on the Waymo validation set

02. Improves detection accuracy across multiple benchmarks

03. Demonstrates consistent performance gains over baseline models

Abstract

Recent advances in point cloud object detection have increasingly adopted Transformer-based models and State Space Models (SSMs) to capture long-range dependencies. However, these serialized frameworks strictly maintain the consistency of input and output voxel dimensions, inherently lacking the capability for voxel expansion. This limitation hinders performance, as expanding the voxel set is known to significantly enhance detection accuracy, particularly for sparse foreground objects. To bridge this gap, we propose a novel Voxel Densification Module (VDM). Unlike standard convolutional stems, VDM is explicitly designed to promote pre-serialization spatial expansion. It leverages sparse 3D convolutions to propagate foreground semantics to neighboring empty voxels, effectively densifying the feature representation before it is flattened into a sequence. Simultaneously, VDM incorporates residual…
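The expansion step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It only mimics how a regular (non-submanifold) sparse 3x3x3 convolution produces outputs at every site covered by an active voxel's kernel footprint, so previously empty neighbors become active before the set is flattened. The names `densify` and `serialize` are hypothetical, a uniform averaging kernel stands in for learned weights, and lexicographic sorting stands in for whatever serialization order the detector actually uses.

```python
from itertools import product


def densify(voxels, kernel_size=3):
    """Expand a sparse voxel set (hypothetical helper, not from the paper).

    voxels: dict mapping (x, y, z) -> list of floats (feature vector).
    Returns a dict whose keys include every site within the kernel
    footprint of any input voxel. Grid bounds are not enforced here.
    """
    r = kernel_size // 2
    offsets = list(product(range(-r, r + 1), repeat=3))
    sums, counts = {}, {}
    for (x, y, z), feat in voxels.items():
        # Each active voxel contributes to every site its footprint covers,
        # including sites that were empty in the input.
        for dx, dy, dz in offsets:
            key = (x + dx, y + dy, z + dz)
            acc = sums.setdefault(key, [0.0] * len(feat))
            for i, value in enumerate(feat):
                acc[i] += value
            counts[key] = counts.get(key, 0) + 1
    # Assumption: average the contributions instead of applying learned weights.
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


def serialize(voxels):
    """Flatten voxels into a sequence (sorted order is a stand-in)."""
    return [voxels[k] for k in sorted(voxels)]


sparse = {(0, 0, 0): [1.0, 0.0], (5, 5, 5): [0.0, 1.0]}
dense = densify(sparse)
print(len(serialize(sparse)), len(serialize(dense)))  # prints: 2 54
```

In this toy input the two voxels are far enough apart that their 27-site footprints do not overlap, so the sequence grows from 2 to 54 tokens. A serialized backbone that keeps input and output lengths equal would then operate on the longer, denser sequence.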
