DragAPart: Learning a Part-Level Motion Prior for Articulated Objects

Ruining Li; Chuanxia Zheng; Christian Rupprecht; Andrea Vedaldi

arXiv:2403.15382 · cs.CV · July 30, 2024 · 1 citation

TL;DR

DragAPart learns a part-level motion prior for articulated objects: given an image and a set of user drags, it generates a new image of the same object responding to those drags, and it generalizes across object categories.

Contribution

The paper introduces DragAPart, which predicts part-level interactions (such as opening and closing a drawer) by fine-tuning a pre-trained image generator on Drag-a-Move, a new synthetic dataset that the authors also introduce.

Findings

01 Outperforms prior motion-controlled generators in part-level understanding

02 Generalizes well to real images and multiple object categories

03 Uses a new encoding for drags and dataset randomization

Abstract

We introduce DragAPart, a method that, given an image and a set of drags as input, generates a new image of the same object that responds to the action of the drags. Unlike prior works that focused on repositioning objects, DragAPart predicts part-level interactions, such as opening and closing a drawer. We study this problem as a proxy for learning a generalist motion model, not restricted to a specific kinematic structure or object category. We start from a pre-trained image generator and fine-tune it on a new synthetic dataset, Drag-a-Move, which we introduce. Combined with a new encoding for the drags and dataset randomization, the model generalizes well to real images and different categories. Compared to prior motion-controlled generators, we demonstrate much better part-level motion understanding.
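To make the input described in the abstract concrete, here is a minimal sketch of how "a set of drags" might be represented and turned into dense conditioning channels. The abstract does not specify the paper's drag encoding, so everything here is an assumption for illustration: the `Drag` class, the `encode_drags` function, the `max_drags` parameter, and the choice to store a normalized displacement at the start pixel are all hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Drag:
    """One user drag: a start pixel and an end pixel, each as (x, y)."""
    start: Tuple[int, int]
    end: Tuple[int, int]


def encode_drags(
    drags: List[Drag], width: int, height: int, max_drags: int = 4
) -> List[List[List[float]]]:
    """Rasterize drags into 2 * max_drags grids of shape height x width.

    Hypothetical encoding, NOT the one used in the paper: drag slot k owns
    channels 2k (x displacement) and 2k + 1 (y displacement). The displacement,
    normalized by image size, is written at the start pixel; every other cell
    is 0.0. Unused slots stay all zeros.
    """
    if len(drags) > max_drags:
        raise ValueError("more drags than available slots")

    channels = [
        [[0.0] * width for _ in range(height)] for _ in range(2 * max_drags)
    ]
    for k, drag in enumerate(drags):
        (x0, y0), (x1, y1) = drag.start, drag.end
        if not (0 <= x0 < width and 0 <= y0 < height):
            raise ValueError("drag start lies outside the image")
        channels[2 * k][y0][x0] = (x1 - x0) / width
        channels[2 * k + 1][y0][x0] = (y1 - y0) / height
    return channels


# Example: one horizontal drag on a 4x4 image, with two drag slots.
example = encode_drags([Drag(start=(1, 2), end=(3, 2))], width=4, height=4, max_drags=2)
```

In a setup like this, the resulting channels would be concatenated with the image (or its latent) before being passed to the generator. Giving each drag its own pair of channels keeps multiple drags from overwriting one another, although a zero-length drag is indistinguishable from an empty slot in this simple form.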

Code & Models

Datasets

rayli/Drag-a-Move-test-split (dataset · 94 downloads)

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Robot Manipulation and Learning

Methods: Sparse Evolutionary Training