Zero-shot Inexact CAD Model Alignment from a Single Image

Pattaramanee Arsomngern, Sasikarn Khwanmuang, Matthias Nießner, Supasorn Suwajanakorn

arXiv:2507.03292 · cs.CV · July 8, 2025

TL;DR

This paper introduces a weakly supervised method for aligning a 3D model to an object in a single image. The method requires no pose annotations, generalizes to unseen categories, and outperforms existing methods on real-world datasets.

Contribution

It proposes a novel feature space built on foundation features, together with a texture-invariant pose refinement technique, for aligning inexact 3D models without pose supervision, which enables generalization to new categories.

Findings

1. Outperforms state-of-the-art weakly supervised methods by +4.3% in accuracy.
2. Surpasses the supervised method ROCA by +2.7% in alignment accuracy.
3. Achieves state-of-the-art results on unseen categories in real-world data.

Abstract

One practical approach to inferring 3D scene structure from a single image is to retrieve a closely matching 3D model from a database and align it with the object in the image. Existing methods rely on supervised training with images and pose annotations, which limits them to a narrow set of object categories. To address this, we propose a weakly supervised 9-DoF alignment method for inexact 3D models that requires no pose annotations and generalizes to unseen categories. Our approach derives a novel feature space based on foundation features that ensures multi-view consistency and overcomes the symmetry ambiguities inherent in foundation features by using a self-supervised triplet loss. Additionally, we introduce a texture-invariant pose refinement technique that performs dense alignment in normalized object coordinates, estimated through the enhanced feature space. We conduct extensive evaluations…
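The abstract names two generic building blocks: a 9-DoF pose (3 rotation, 3 translation and 3 scale parameters) applied to a model expressed in normalized object coordinates, and a triplet loss used to shape the feature space. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes a pose of the form x = R·diag(s)·p + t, a pinhole camera with focal length f and principal point (cx, cy), and the standard triplet margin loss with Euclidean distance. The function names and the margin value of 0.2 are hypothetical, and rotation is restricted to the z-axis only to keep the example short.

```python
import math

# Hypothetical helpers illustrating a generic 9-DoF pose and triplet loss;
# not taken from the paper's code.

def rot_z(theta):
    """Rotation matrix about the z-axis (a single rotational DoF, for brevity)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_9dof(R, scale, t, p):
    """Map a point p from normalized object coordinates to camera space.

    Assumed form: x = R @ diag(scale) @ p + t
    (3 rotation + 3 per-axis scale + 3 translation parameters = 9 DoF).
    """
    scaled = [scale[i] * p[i] for i in range(3)]
    return [sum(R[r][c] * scaled[c] for c in range(3)) + t[r] for r in range(3)]

def project(x, f, cx, cy):
    """Pinhole projection of a camera-space point (z > 0) to pixel coordinates."""
    return (f * x[0] / x[2] + cx, f * x[1] / x[2] + cy)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull the positive closer to the anchor
    than the negative by at least `margin` (assumed margin value)."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

if __name__ == "__main__":
    R = rot_z(math.pi / 6)
    x = apply_9dof(R, [0.5, 0.8, 0.4], [0.0, 0.0, 3.0], [1.0, 1.0, 1.0])
    print("camera-space point:", x)
    print("pixel:", project(x, f=500.0, cx=320.0, cy=240.0))
    print("triplet loss:", triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))
```

A full alignment would compose three rotation angles (or use another rotation parameterization) rather than a single z-rotation. How the paper parameterizes rotation, chooses the margin, or samples triplets across symmetric views is not stated in this excerpt.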