ReorientDiff: Diffusion Model based Reorientation for Object Manipulation
Utkarsh A. Mishra, Yongxin Chen

TL;DR
ReorientDiff is a diffusion model-based approach that plans intermediate object reorientation poses from visual and language inputs, achieving a 95.2% success rate in simulation on robotic manipulation tasks with YCB objects.
Contribution
This paper introduces ReorientDiff, a novel diffusion model-based method for object reorientation that integrates visual and language cues for improved planning.
Findings
Achieves a 95.2% success rate in simulation with YCB objects.
Effectively conditions on scene and goal language prompts.
Demonstrates potential for generalizable object manipulation.
Abstract
The ability to manipulate objects into desired configurations is a fundamental requirement for robots to complete various practical applications. While certain goals can be achieved by picking and placing the objects of interest directly, object reorientation is needed for precise placement in most tasks. In such scenarios, the object must be reoriented and repositioned into intermediate poses that facilitate accurate placement at the target pose. To this end, we propose a reorientation planning method, ReorientDiff, that utilizes a diffusion model-based approach. The proposed method employs both visual inputs from the scene and goal-specific language prompts to plan intermediate reorientation poses. Specifically, the scene and language-task information are mapped into a joint scene-task representation feature space, which is subsequently leveraged to condition the diffusion…
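As a rough illustration of what conditioning a diffusion model on a joint scene-task feature can look like, the following minimal sketch runs a standard DDPM-style reverse sampling loop over a pose vector. It is not the authors' implementation: the 7-dimensional pose (for example position plus quaternion), the 16-dimensional embedding, the linear noise schedule, and the `toy_denoiser` with random weights are all assumptions made for illustration, standing in for a trained noise-prediction network and a learned scene-language encoder.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (an assumed choice, not taken from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def toy_denoiser(x_t, t_frac, cond, weights):
    """Hypothetical stand-in for a trained noise-prediction network.

    It concatenates the noisy pose, the conditioning embedding, and the
    normalized timestep, then applies one random linear layer with tanh.
    """
    features = np.concatenate([x_t, cond, [t_frac]])
    return np.tanh(weights @ features)


def sample_pose(cond, weights, pose_dim=7, num_steps=50, seed=0):
    """DDPM-style ancestral sampling of a pose vector given an embedding."""
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(pose_dim)  # start from pure Gaussian noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_denoiser(x, t / num_steps, cond, weights)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(pose_dim)
        else:
            x = mean  # no noise is added at the final step
    return x


# Placeholder joint scene-task embedding and untrained denoiser weights.
rng = np.random.default_rng(1)
cond = rng.standard_normal(16)
weights = 0.1 * rng.standard_normal((7, 7 + 16 + 1))
pose = sample_pose(cond, weights)
print(pose.shape)  # (7,)
```

Because the denoiser here is untrained, the sampled vector is not a meaningful pose; the sketch only shows where the conditioning embedding enters each denoising step.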
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications
Methods: Diffusion
