Class-Partitioned VQ-VAE and Latent Flow Matching for Point Cloud Scene Generation

Dasith de Silva Edirimuni; Ajmal Saeed Mian

arXiv:2601.12391 · cs.CV · January 21, 2026

Open Access

TL;DR

This paper introduces a class-partitioned VQ-VAE and latent flow matching approach for generating complex 3D point cloud scenes directly, improving scene plausibility and accuracy without external object databases.

Contribution

The paper proposes a novel class-partitioned VQ-VAE with class-aware codebook updates and a latent flow matching model for direct point cloud scene generation.
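The abstract excerpt on this page is truncated before the flow matching model is described, so the following is only a generic sketch of a standard conditional flow matching objective with a straight-line noise-to-data path. It assumes object latents of shape (B, D) and a velocity network passed in as a plain callable; the helper names `flow_matching_loss` and `sample` are hypothetical, and any class conditioning or architecture used in the paper is not reproduced here.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, rng):
    """Conditional flow matching loss on a straight path from noise x0 to data x1.

    x1: batch of target latents, shape (B, D)
    velocity_fn: callable (x_t, t) -> predicted velocity, shape (B, D)
    """
    x0 = rng.standard_normal(x1.shape)        # Gaussian noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))    # one time per sample, in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1             # point on the straight path
    target_v = x1 - x0                        # constant velocity along that path
    pred_v = velocity_fn(x_t, t)
    return np.mean(np.sum((pred_v - target_v) ** 2, axis=1))

def sample(velocity_fn, shape, rng, steps=50):
    """Euler integration of dx/dt = v(x, t) from t = 0 (noise) toward t = 1 (data)."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full((shape[0], 1), i * dt)
        x = x + dt * velocity_fn(x, t)
    return x
```

As a sanity check, if every target latent equals a fixed vector `c`, the exact velocity field is `(c - x) / (1 - t)`; it drives the loss to (numerically) zero and the Euler sampler lands on `c`.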

Findings

1. Achieves up to 70.4% reduction in Chamfer error on complex scenes.
2. Effectively decodes class-specific point clouds without external databases.
3. Reliable scene recovery demonstrated on complex living room scenes.
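The findings are stated in terms of Chamfer error, but the exact variant is not given on this page. The sketch below assumes one common form: symmetric, squared Euclidean nearest-neighbour distances averaged in each direction. `chamfer_distance` and `relative_reduction` are hypothetical helper names used for illustration.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Uses squared Euclidean distances and averages each direction; papers differ
    (squared vs. unsquared, sum vs. mean), so this is one common variant only.
    """
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def relative_reduction(baseline_err, new_err):
    """Percentage reduction of new_err relative to baseline_err."""
    return 100.0 * (baseline_err - new_err) / baseline_err
```

Under this reading, a 70.4% reduction means the new Chamfer error is 29.6% of the comparison method's error on the same scenes.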

Abstract

Most 3D scene generation methods are limited to generating object bounding box parameters, while newer diffusion methods also generate class labels and latent features. Using object sizes or latent features, they then retrieve objects from a predefined database. For complex scenes of varied, multi-categorical objects, current autoencoders cannot effectively decode diffusion-based latents into correct point cloud objects that agree with the target classes. We introduce a Class-Partitioned Vector Quantized Variational Autoencoder (CPVQ-VAE) that is trained to effectively decode object latent features by employing a pioneering class-partitioned codebook, in which codevectors are labeled by class. To address the problem of codebook collapse, we propose a class-aware running average update that reinitializes dead codevectors within each partition.…
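The abstract names two mechanisms: a codebook whose codevectors are labeled by class, and a class-aware running average update that reinitializes dead codevectors within each partition. The sketch below shows one way these could fit together, under loud assumptions: it uses a standard exponential-moving-average (EMA) VQ codebook update, declares a codevector dead when its EMA usage falls below a threshold, and resets it to a random same-class latent from the current batch. The class `ClassPartitionedCodebook`, its parameters (`codes_per_class`, `dead_threshold`, etc.), and the reset rule are hypothetical, not taken from the paper; the straight-through gradient path is omitted.

```python
import numpy as np

class ClassPartitionedCodebook:
    """Hypothetical codebook with K codevectors, each labeled with a class."""

    def __init__(self, codes_per_class, num_classes, dim, decay=0.99,
                 dead_threshold=1e-2, eps=1e-5, seed=0):
        self.rng = np.random.default_rng(seed)
        k = codes_per_class * num_classes
        self.embed = self.rng.standard_normal((k, dim))
        # Partition labels: codes 0..n-1 belong to class 0, n..2n-1 to class 1, ...
        self.code_class = np.repeat(np.arange(num_classes), codes_per_class)
        self.ema_count = np.ones(k)        # EMA of how often each code is selected
        self.ema_sum = self.embed.copy()   # EMA of the sum of latents assigned to it
        self.decay, self.dead_threshold, self.eps = decay, dead_threshold, eps

    def quantize(self, z, labels):
        """Index of the nearest codevector inside each latent's class partition."""
        d = np.sum((z[:, None, :] - self.embed[None, :, :]) ** 2, axis=-1)  # (B, K)
        d[self.code_class[None, :] != labels[:, None]] = np.inf  # mask other classes
        return d.argmin(axis=1)

    def ema_update(self, z, labels):
        """Running-average codebook update followed by per-class dead-code resets."""
        idx = self.quantize(z, labels)
        k = self.embed.shape[0]
        onehot = np.zeros((z.shape[0], k))
        onehot[np.arange(z.shape[0]), idx] = 1.0
        g = self.decay
        self.ema_count = g * self.ema_count + (1 - g) * onehot.sum(axis=0)
        self.ema_sum = g * self.ema_sum + (1 - g) * onehot.T @ z
        # Laplace smoothing keeps the division well defined for rarely used codes.
        n = self.ema_count.sum()
        count = (self.ema_count + self.eps) / (n + k * self.eps) * n
        self.embed = self.ema_sum / count[:, None]
        self._reinit_dead(z, labels)
        return idx

    def _reinit_dead(self, z, labels):
        """Reset under-used codevectors to random latents of the same class."""
        for c in np.unique(self.code_class):
            dead = np.where((self.code_class == c)
                            & (self.ema_count < self.dead_threshold))[0]
            pool = z[labels == c]
            if dead.size == 0 or pool.shape[0] == 0:
                continue  # nothing to reset, or no same-class latents in this batch
            picks = pool[self.rng.integers(0, pool.shape[0], size=dead.size)]
            self.embed[dead] = picks
            self.ema_sum[dead] = picks
            self.ema_count[dead] = 1.0
```

In this sketch, masking the distance matrix guarantees that every selected codevector carries the same class label as its latent, and drawing replacements only from same-class latents keeps each partition populated without borrowing codevectors from other classes.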

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis