CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang, David Jin, and Luca Carlone

TL;DR
CRISP is a category-agnostic pipeline for estimating object pose and shape from RGB-D images. It uses test-time adaptation and self-supervised domain adaptation (self-training) to improve accuracy across datasets and on unseen objects.
Contribution
Introduces CRISP, a category-agnostic object pose and shape estimation pipeline, together with an optimization-based pose and shape corrector and a self-training approach for domain adaptation.
Findings
High performance on YCBV, SPE3R, and NOCS datasets.
Effective bridging of large domain gaps through self-training.
Ability to generalize to unseen objects.
Abstract
We consider the problem of estimating object pose and shape from an RGB-D image. Our first contribution is to introduce CRISP, a category-agnostic object pose and shape estimation pipeline. The pipeline implements an encoder-decoder model for shape estimation. It uses FiLM-conditioning for implicit shape reconstruction and a DPT-based network for estimating pose-normalized points for pose estimation. As a second contribution, we propose an optimization-based pose and shape corrector that can correct estimation errors caused by a domain gap. Observing that the shape decoder is well behaved in the convex hull of known shapes, we approximate the shape decoder with an active shape model, and show that this reduces the shape correction problem to a constrained linear least squares problem, which can be solved efficiently by an interior point algorithm. Third, we introduce a self-training…
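The reduction to a constrained linear least squares problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes the active shape model expresses a shape as a convex combination of K known basis shapes, so the unknown is a weight vector c with nonnegative entries that sum to one, and the objective is ||A c - b||^2. The matrix A, the vector b, and both function names are hypothetical and chosen for illustration; the abstract does not specify them. The abstract states that the problem is solved with an interior point algorithm; this sketch swaps in projected gradient descent only to stay dependency-free.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1}."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index satisfying the test gives the threshold
    return [max(x - theta, 0.0) for x in v]


def solve_simplex_least_squares(A, b, steps=2000):
    """Minimize 0.5 * ||A c - b||^2 subject to c lying on the simplex.

    A is a list of rows (N x K) and b a list of length N. This is projected
    gradient descent, an assumption of this sketch, not the interior point
    solver named in the abstract.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # A^T A, so 1 / lipschitz is a safe step size.
    lipschitz = sum(a * a for row in A for a in row) or 1.0
    c = [1.0 / n_cols] * n_cols  # start at the center of the simplex
    for _ in range(steps):
        residual = [
            sum(A[i][j] * c[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        c = project_to_simplex(
            [c[j] - grad[j] / lipschitz for j in range(n_cols)]
        )
    return c


# Hypothetical toy problem: three basis shapes observed at four sample points.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0]]
b = [0.2, 0.5, 0.3, 1.0]
weights = solve_simplex_least_squares(A, b)
print(weights)
```

Constraining the weights to the simplex keeps the estimate inside the convex hull of the known shapes, which is the region where the abstract observes the shape decoder to be well behaved.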
Taxonomy
Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Human Pose and Action Recognition
