3D Implicit Transporter for Temporally Consistent Keypoint Discovery
Chengliang Zhong, Yuhang Zheng, Yupeng Zheng, Hao Zhao, Li Yi, Xiaodong Mu, Ling Wang, Pengfei Li, Guyue Zhou, Chao Yang, Xinliang Zhang, Jian Zhao

TL;DR
This paper introduces the first 3D implicit Transporter model that ensures spatio-temporal consistency in keypoint detection for 3D objects, improving 3D manipulation tasks.
Contribution
It develops a novel 3D Transporter leveraging hybrid representations, cross attention, and implicit reconstruction, addressing the limitations of 2D methods for 3D data.
Findings
Learned keypoints are spatio-temporally consistent.
The method outperforms existing approaches in 3D object manipulation.
Code is publicly available for reproducibility.
Abstract
Keypoint-based representation has proven advantageous in various visual and robotic tasks. However, the existing 2D and 3D methods for detecting keypoints mainly rely on geometric consistency to achieve spatial alignment, neglecting temporal consistency. To address this issue, the Transporter method was introduced for 2D data, which reconstructs the target frame from the source frame to incorporate both spatial and temporal information. However, the direct application of the Transporter to 3D point clouds is infeasible due to their structural differences from 2D images. Thus, we propose the first 3D version of the Transporter, which leverages hybrid 3D representation, cross attention, and implicit reconstruction. We apply this new learning system on 3D articulated objects and nonrigid animals (humans and rodents) and show that learned keypoints are spatio-temporally consistent.…
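For readers unfamiliar with the 2D Transporter that the abstract builds on, the following is a minimal NumPy sketch of its feature-transport step as it is commonly described: source features are suppressed around both the source and target keypoints, and target features are pasted in around the target keypoints. This illustrates the 2D idea only and is not the paper's 3D implicit method. The function names, the `sigma` value, and the use of a per-pixel maximum to merge keypoint blobs are assumptions made for illustration.

```python
import numpy as np

def gaussian_heatmap(keypoints, height, width, sigma=1.5):
    """Render (x, y) keypoints into a single heatmap with values in [0, 1].

    Hypothetical helper: blobs are merged with a per-pixel maximum.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for x, y in keypoints:
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, blob)
    return heat

def transport(feat_src, feat_tgt, heat_src, heat_tgt):
    """Transport features from the target frame into the source feature map.

    feat_src, feat_tgt: (H, W, C) feature maps.
    heat_src, heat_tgt: (H, W) keypoint heatmaps in [0, 1].
    Returns (1 - H_s) * (1 - H_t) * feat_src + H_t * feat_tgt.
    """
    hs = heat_src[..., None]  # add a channel axis for broadcasting
    ht = heat_tgt[..., None]
    return (1.0 - hs) * (1.0 - ht) * feat_src + ht * feat_tgt
```

In the 2D setting, a decoder would then reconstruct the target frame from the transported feature map, so accurate reconstruction is only possible when the keypoints track the parts that move between frames.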
Videos
3D Implicit Transporter for Temporally Consistent Keypoint Discovery · youtube
Taxonomy
Topics: Human Pose and Action Recognition · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
