Diffusion Masked Pretraining for Dynamic Point Cloud

Zhuoyue Zhang; Jihua Zhu; Chaowei Fang; Jian Liu; Ajmal Saeed Mian

arXiv:2605.03639 · cs.CV · May 12, 2026

TL;DR

DiMP is a diffusion-based self-supervised pretraining framework for dynamic point clouds. It addresses positional leakage and multimodal motion uncertainty to improve performance on downstream tasks.

Contribution

DiMP applies diffusion modeling to both positional inference and motion learning, which removes positional leakage and lets the model represent multimodal motion distributions.
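To make the multimodal-motion point concrete, here is a minimal toy sketch in NumPy. It is not taken from the paper: the 1-D displacement data, the candidate grid, and all variable names are assumptions chosen for illustration. It shows that a deterministic predictor trained with squared error settles on the conditional mean, which for a two-mode motion distribution lies between the modes and matches neither of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D motion target: a point moves left (-1) or right (+1)
# with equal probability, so the true distribution has two modes.
displacements = rng.choice([-1.0, 1.0], size=10_000)

# Evaluate the mean squared error of every constant prediction on a grid.
candidates = np.linspace(-1.5, 1.5, 301)
mse = ((displacements[None, :] - candidates[:, None]) ** 2).mean(axis=1)

# The squared-error minimizer is the grid point closest to the sample mean.
best = candidates[np.argmin(mse)]
sample_mean = displacements.mean()

# Distance from the best deterministic prediction to the nearest true mode.
distance_to_nearest_mode = min(abs(best - 1.0), abs(best + 1.0))
```

In this toy setting `best` sits near zero, a displacement that never occurs in the data. A generative objective such as diffusion can instead place probability mass on both modes, which is the motivation stated above.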

Findings

01. DiMP achieves an 11.21% improvement on offline action segmentation.

02. DiMP improves online inference accuracy by 13.65%.

03. The method outperforms its backbone models across multiple benchmarks.

Abstract

Dynamic point cloud pretraining is still dominated by masked reconstruction objectives. However, these objectives inherit two key limitations. Existing methods inject ground-truth tube centers as decoder positional embeddings, causing spatio-temporal positional leakage. Moreover, they supervise inter-frame motion with deterministic proxy targets that systematically discard distributional structure by collapsing multimodal trajectory uncertainty into conditional means. To address these limitations, we propose Diffusion Masked Pretraining (DiMP), a unified self-supervised framework for dynamic point clouds. DiMP introduces diffusion modeling into both positional inference and motion learning. It first applies forward diffusion noise only to masked tube centers, then predicts clean centers from visible spatio-temporal context. This removes positional leakage while preserving visible…
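The center-noising step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the standard DDPM forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, a hypothetical function name `noise_masked_centers`, a single scalar `alpha_bar_t`, and 3-D centers. The paper's actual noise schedule, center parameterization, and masking strategy may differ.

```python
import numpy as np

def noise_masked_centers(centers, mask, alpha_bar_t, rng):
    """Apply forward diffusion noise to masked tube centers only.

    centers:     (N, D) array of clean tube centers (assumed layout).
    mask:        (N,) boolean array, True where a tube is masked.
    alpha_bar_t: cumulative noise-schedule product at step t, in (0, 1].
    rng:         numpy random Generator.
    Returns the partially noised centers and the noise that was drawn.
    """
    eps = rng.standard_normal(centers.shape)
    # Standard DDPM forward step applied to every center.
    noisy = np.sqrt(alpha_bar_t) * centers + np.sqrt(1.0 - alpha_bar_t) * eps
    # Keep visible centers clean; only masked rows take the noisy value.
    mixed = np.where(mask[:, None], noisy, centers)
    return mixed, eps

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(8, 3))  # toy tube centers
mask = np.array([True, False, True, True, False, False, True, False])

noisy_centers, eps = noise_masked_centers(centers, mask, alpha_bar_t=0.5, rng=rng)
```

A denoising network would then receive `noisy_centers` together with the visible context and be trained to recover the clean masked centers, so the decoder never sees ground-truth positions for masked tubes.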

Code & Models

Repositories

InitalZ/DiMP.git (GitHub)
