DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations
Dongwon Son, Sanghyeon Son, Jaehyung Kim, Beomjoon Kim

TL;DR
DEF-oriCORN introduces an object-based scene representation and a diffusion-based state estimator that enable robust, language-guided manipulation in tightly packed environments without demonstrations. Trained only in simulation, it generalizes zero-shot to real-world scenes.
Contribution
The paper proposes a new object-based scene representation and a diffusion-model-based state estimation algorithm for efficient, demonstration-free, language-directed manipulation planning.
Findings
Outperforms state-of-the-art baselines in estimation and planning
Generalizes zero-shot to real-world scenarios with diverse objects
Handles transparent and reflective objects effectively
Abstract
We present DEF-oriCORN, a framework for language-directed manipulation tasks. By leveraging a novel object-based scene representation and a diffusion-model-based state estimation algorithm, our framework enables efficient and robust manipulation planning in response to verbal commands, even in tightly packed environments with sparse camera views, without any demonstrations. Unlike traditional representations, our representation affords efficient collision checking and language grounding. Compared to state-of-the-art baselines, our framework achieves superior estimation and motion planning performance from sparse RGB images and generalizes zero-shot to real-world scenarios with diverse materials, including transparent and reflective objects, despite being trained exclusively in simulation. Our code for data generation, training, and inference, along with pre-trained weights, is publicly available at:…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Robot Manipulation and Learning
