KeyPointDiffuser: Unsupervised 3D Keypoint Learning via Latent Diffusion Models
Rhys Newbury, Juyan Zhang, Tin Tran, Hanna Kurniawati, Dana Kulić

TL;DR
This paper introduces KeyPointDiffuser, an unsupervised method that learns spatially structured 3D keypoints from point clouds and uses them to condition a diffusion model that reconstructs the full shape. The keypoints are more consistent across object instances and give an interpretable representation that fits modern 3D generative pipelines.
Contribution
It proposes an unsupervised framework that learns spatially structured 3D keypoints and uses them to condition a diffusion model for shape reconstruction, addressing the fact that most existing unsupervised keypoint methods are not designed for unconditional generative settings.
Findings
Achieves a 6 percentage-point improvement in keypoint consistency.
Supports smooth interpolation in keypoint space.
Demonstrates strong performance across diverse object categories.
Abstract
Understanding and representing the structure of 3D objects in an unsupervised manner remains a core challenge in computer vision and graphics. Most existing unsupervised keypoint methods are not designed for unconditional generative settings, restricting their use in modern 3D generative pipelines; our formulation explicitly bridges this gap. We present an unsupervised framework for learning spatially structured 3D keypoints from point cloud data. These keypoints serve as a compact and interpretable representation that conditions an Elucidated Diffusion Model (EDM) to reconstruct the full shape. The learned keypoints exhibit repeatable spatial structure across object instances and support smooth interpolation in keypoint space, indicating that they capture geometric variation. Our method achieves strong performance across diverse object categories, yielding a 6 percentage-point…
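The interpolation in keypoint space mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible scheme: two keypoint sets with the same number of points, matched by index, blended linearly. The function names `interpolate_keypoints` and `interpolation_path` are hypothetical and not taken from the paper, and the summary does not state which interpolation scheme the authors use. Decoding each blended keypoint set into a full shape would be done by the keypoint-conditioned diffusion model, which is not shown here.

```python
from typing import List, Tuple

Point = Tuple[float, ...]  # one 3D keypoint as (x, y, z)


def interpolate_keypoints(kp_a: List[Point], kp_b: List[Point], t: float) -> List[Point]:
    """Linearly blend two keypoint sets; t=0 returns kp_a, t=1 returns kp_b.

    Assumption: keypoint i in kp_a corresponds to keypoint i in kp_b.
    """
    if len(kp_a) != len(kp_b):
        raise ValueError("keypoint sets must contain the same number of points")
    return [
        tuple((1.0 - t) * a + t * b for a, b in zip(point_a, point_b))
        for point_a, point_b in zip(kp_a, kp_b)
    ]


def interpolation_path(kp_a: List[Point], kp_b: List[Point], steps: int) -> List[List[Point]]:
    """Return `steps` evenly spaced keypoint sets from kp_a to kp_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [interpolate_keypoints(kp_a, kp_b, i / (steps - 1)) for i in range(steps)]


if __name__ == "__main__":
    # Two toy keypoint sets standing in for keypoints of two object instances.
    source = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    target = [(0.0, 2.0, 0.0), (1.0, 2.0, 4.0)]
    for keypoints in interpolation_path(source, target, steps=3):
        print(keypoints)
```

Each intermediate keypoint set would then serve as the conditioning input for reconstruction. A smooth change in the reconstructed shapes along the path is what the abstract reads as evidence that the keypoints capture geometric variation.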
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
