Copy-Transform-Paste: Zero-Shot Object-Object Alignment Guided by Vision-Language and Geometric Constraints

Rotem Gatenyo; Ohad Fried

arXiv:2601.14207·cs.GR·March 3, 2026

Open Access

TL;DR

This paper introduces a zero-shot method that aligns two 3D meshes according to a text prompt. It optimizes their relative pose at test time with CLIP-driven gradients through a differentiable renderer, adds geometric constraints, and requires no training of a new model.

Contribution

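The paper presents a test-time optimization framework that combines vision-language (CLIP) supervision with geometry-aware objectives, namely a soft-ICP variant for surface attachment and a penetration loss against interpenetration, to align 3D meshes without training. It reports better results than baseline methods.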

Findings

01

Outperforms baseline methods in alignment accuracy

02

Produces semantically faithful and physically plausible results

03

Curates a diverse benchmark for evaluation

Abstract

We study zero-shot 3D alignment of two given meshes, using a text prompt describing their spatial relation -- an essential capability for content creation and scene assembly. Earlier approaches primarily rely on geometric alignment procedures, while recent work leverages pretrained 2D diffusion models to model language-conditioned object-object spatial relationships. In contrast, we directly optimize the relative pose at test time, updating translation, rotation, and isotropic scale with CLIP-driven gradients via a differentiable renderer, without training a new model. Our framework augments language supervision with geometry-aware objectives: a variant of the soft Iterative Closest Point (ICP) term to encourage surface attachment and a penetration loss to discourage interpenetration. A phased schedule strengthens contact constraints over time, and camera control concentrates the…
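As a rough illustration of the geometry-aware objectives named in the abstract, the sketch below evaluates a soft attachment term, a penetration penalty, and a ramped contact weight for a posed point set. It is a minimal sketch under stated assumptions, not the authors' formulation: the function names, the softmax-weighted distance used as a stand-in for the soft-ICP variant, the analytic sphere signed distance used for penetration, the linear ramp, and the restriction of rotation to a single yaw angle are all hypothetical choices made for brevity. The CLIP term and the differentiable renderer are omitted entirely.

```python
import math

def apply_pose(points, translation, yaw, scale):
    """Apply isotropic scale, a rotation about the z-axis, then a translation.
    (Assumption: a single yaw angle stands in for a full 3D rotation.)"""
    c, s = math.cos(yaw), math.sin(yaw)
    posed = []
    for x, y, z in points:
        x, y, z = scale * x, scale * y, scale * z
        posed.append((c * x - s * y + translation[0],
                      s * x + c * y + translation[1],
                      z + translation[2]))
    return posed

def soft_attachment_loss(source, target, temperature=0.05):
    """Softmax-weighted mean squared distance from each source point to the
    target points (a hypothetical stand-in for a soft-ICP style term)."""
    total = 0.0
    for p in source:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, q)) for q in target]
        m = min(d2)  # shift exponents for numerical stability
        w = [math.exp(-(d - m) / temperature) for d in d2]
        total += sum(wi * di for wi, di in zip(w, d2)) / sum(w)
    return total / len(source)

def penetration_loss(source, center, radius):
    """Mean squared penetration depth of source points into a sphere.
    (Assumption: the second object is approximated by an analytic sphere.)"""
    depths = (max(0.0, radius - math.dist(p, center)) for p in source)
    return sum(d ** 2 for d in depths) / len(source)

def contact_weight(step, total_steps, w_start=0.0, w_end=1.0):
    """Linear ramp of the contact weight over the optimization steps
    (one possible reading of a 'phased schedule')."""
    t = min(1.0, max(0.0, step / max(1, total_steps - 1)))
    return w_start + t * (w_end - w_start)

def geometric_objective(source, target, center, radius, step, total_steps):
    """Weighted sum of the two geometric terms at a given step."""
    w = contact_weight(step, total_steps)
    return w * (soft_attachment_loss(source, target)
                + penetration_loss(source, center, radius))
```

In a full pipeline, a value like `geometric_objective(...)` would be added to a language-driven image loss and minimized over translation, rotation, and scale with an automatic-differentiation framework; the pure-Python version here only evaluates the terms.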

Taxonomy

Topics: 3D Shape Modeling and Analysis · Interactive and Immersive Displays · Robot Manipulation and Learning