Self-supervised Learning of 3D Object Understanding by Data Association and Landmark Estimation for Image Sequence
Hyeonwoo Yu, Jean Oh

TL;DR
This paper introduces a self-supervised approach for 3D multi-object pose estimation from image sequences, leveraging data association and landmark estimation to improve accuracy without extensive 3D annotations.
Contribution
It proposes a novel self-supervised learning strategy that exploits multiple observations of the same object and data association across an image sequence, allowing the network to surpass the performance limits of its own self-annotations in 3D object pose estimation.
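The summary does not spell out how observations are associated or fused. As a rough illustration of the general idea (several per-frame estimates of one object, associated across the sequence and fused into a single landmark that can act as a pseudo-label), here is a minimal sketch under stated assumptions. It assumes per-frame 3D object positions are already expressed in a shared world frame (e.g., via known camera poses), uses greedy nearest-neighbour association with a distance gate, and takes the landmark as the mean of its associated observations. The names `associate_and_fuse` and `pseudo_labels` and the parameters `gate` and `min_views` are hypothetical; this is not the authors' method.

```python
import numpy as np


def associate_and_fuse(observations, gate=2.0):
    """Greedily associate per-frame 3D observations into landmarks.

    Hypothetical sketch. observations: list of (frame_id, xyz), where xyz is a
    length-3 position in a shared world frame (assumption). gate: maximum
    distance for joining an existing landmark (units assumed to be metres).
    """
    landmarks = []
    for idx, (frame_id, xyz) in enumerate(observations):
        xyz = np.asarray(xyz, dtype=float)
        best, best_dist = None, gate
        for lm in landmarks:
            # One landmark may not receive two detections from the same frame.
            if frame_id in lm["frames"]:
                continue
            d = np.linalg.norm(lm["mean"] - xyz)
            if d < best_dist:
                best, best_dist = lm, d
        if best is None:
            # No landmark within the gate: start a new one.
            landmarks.append({"mean": xyz.copy(), "members": [idx], "frames": {frame_id}})
        else:
            # Incremental mean update over all associated observations.
            n = len(best["members"])
            best["mean"] = (best["mean"] * n + xyz) / (n + 1)
            best["members"].append(idx)
            best["frames"].add(frame_id)
    return landmarks


def pseudo_labels(observations, landmarks, min_views=2):
    """Assign each observation its landmark's fused position as a pseudo-label.

    Observations whose landmark was seen in fewer than min_views frames get
    None, i.e. no multi-view consensus is available for them.
    """
    labels = [None] * len(observations)
    for lm in landmarks:
        if len(lm["frames"]) < min_views:
            continue
        for idx in lm["members"]:
            labels[idx] = lm["mean"]
    return labels
```

For example, two objects observed over three frames, `[(0, [10.0, 0.0, 1.0]), (0, [20.0, 5.0, 1.0]), (1, [10.4, 0.2, 1.0]), (1, [19.6, 5.2, 1.0]), (2, [10.2, -0.2, 1.0])]`, yield two landmarks whose fused positions lie at roughly `[10.2, 0.0, 1.0]` and `[19.8, 5.1, 1.0]`. A greedy gate is chosen here only for brevity; a global assignment (e.g., Hungarian matching per frame) would be a natural substitute.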
Findings
Improved 3D pose estimation accuracy on the KITTI dataset
Effective use of image sequences for self-supervised learning
Enhanced network performance through iterative fine-tuning
Abstract
In this paper, we propose a self-supervised learning method for multi-object pose estimation. 3D object understanding from a 2D image is a challenging task that infers an additional dimension from reduced-dimensional information. In particular, the estimation of the 3D localization or orientation of an object requires precise reasoning, unlike other simple clustering tasks such as object classification. Therefore, the scale of the training dataset becomes more crucial. However, it is challenging to obtain a large amount of 3D data, since 3D annotation is expensive and time-consuming. If the scale of the training dataset can be increased by involving image sequences obtained from simple navigation, it is possible to overcome the scale limitation of the dataset and to adapt efficiently to new environments. However, when the self-annotation is conducted on a single image by…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
