Obstruction reasoning for robotic grasping
Runyu Jiao, Matteo Bortolon, Francesco Giuliari, Alice Fasoli, Sergio Povoli, Guofeng Mei, Yiming Wang, Fabio Poiesi

TL;DR
UNOGrasp is a novel vision-language model that performs multi-step obstruction reasoning to improve robotic grasping in cluttered environments, using a large annotated dataset and combining supervised and reinforcement learning.
Contribution
The paper introduces UNOGrasp, a learning-based model with a new multi-step reasoning process and a large dataset, advancing obstruction reasoning for robotic grasping.
Findings
UNOGrasp outperforms existing models in obstruction reasoning and grasp success.
The model demonstrates significant improvements in both synthetic and real-world tests.
Extensive experiments validate the effectiveness of the proposed approach.
Abstract
Successful robotic grasping in cluttered environments requires a model not only to visually ground a target object but also to reason about obstructions that must be cleared beforehand. While current vision-language embodied reasoning models show emergent spatial understanding, they remain limited in obstruction reasoning and accessibility planning. To bridge this gap, we present UNOGrasp, a learning-based vision-language model capable of performing visually-grounded obstruction reasoning to infer the sequence of actions needed to unobstruct the path and grasp the target object. We devise a novel multi-step reasoning process based on obstruction paths originating from the target object. We anchor each reasoning step with obstruction-aware visual cues to incentivize reasoning capability. UNOGrasp combines supervised and reinforcement finetuning through verifiable reasoning rewards.…
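To make the idea of an obstruction path originating from the target object concrete, here is a minimal symbolic sketch in Python. It is not UNOGrasp itself, which is a learned vision-language model. It assumes the obstruction relations are already known and stored in a hypothetical dictionary `obstructed_by`; the function name `removal_order` and the example scene are likewise illustrative assumptions. The sketch walks the obstruction paths depth-first from the target and returns an order in which every obstructing object is cleared before the object it blocks.

```python
from typing import Dict, List, Set


def removal_order(target: str, obstructed_by: Dict[str, List[str]]) -> List[str]:
    """Return objects in the order they must be removed, ending with the target.

    obstructed_by[x] lists the objects that directly block access to x
    (a hypothetical representation, assumed here for illustration).
    """
    order: List[str] = []
    done: Set[str] = set()
    in_progress: Set[str] = set()

    def visit(obj: str) -> None:
        if obj in done:
            return
        if obj in in_progress:
            # Mutual obstruction cannot be resolved by a simple ordering.
            raise ValueError(f"cyclic obstruction involving {obj!r}")
        in_progress.add(obj)
        # Clear every direct blocker (and its own blockers) first.
        for blocker in obstructed_by.get(obj, []):
            visit(blocker)
        in_progress.discard(obj)
        done.add(obj)
        order.append(obj)

    visit(target)
    return order


# Hypothetical scene: the mug is blocked by a box and a bottle,
# and the box is itself blocked by a book.
scene = {
    "mug": ["box", "bottle"],
    "box": ["book"],
}
plan = removal_order("mug", scene)
print(plan)  # ['book', 'box', 'bottle', 'mug']
```

In this toy scene the book is removed first because it lies at the end of the obstruction path mug → box → book. A learned model has to infer both the obstruction relations and the resulting order from images, which is the harder part this sketch leaves out.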
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Social Robot Interaction and HRI
