A Modern Take on Visual Relationship Reasoning for Grasp Planning
Paolo Rabino, Tatiana Tommasi

TL;DR
This paper introduces a transformer-based model for visual relationship reasoning in robotic grasp planning, handling complex scenes with multiple objects, and establishes a new state-of-the-art benchmark with a novel dataset and evaluation metric.
Contribution
It presents D3G, an end-to-end transformer model for detecting objects and their spatial relationships, and introduces D3GD, a new dataset for cluttered bin picking scenes.
Findings
Achieved state-of-the-art performance on relationship detection
Introduced the Average Precision of Relationships metric
Provided a new dataset with diverse cluttered scenes
Abstract
Interacting with real-world cluttered scenes poses several challenges to robotic agents that need to understand complex spatial dependencies among the observed objects to determine optimal pick sequences or efficient object retrieval strategies. Existing solutions typically manage simplified scenarios and focus on predicting pairwise object relationships following an initial object detection phase, but often overlook the global context or struggle with handling redundant and missing object relations. In this work, we present a modern take on visual relational reasoning for grasp planning. We introduce D3GD, a novel testbed that includes bin picking scenes with up to 35 objects from 97 distinct categories. Additionally, we propose D3G, a new end-to-end transformer-based dependency graph generation model that simultaneously detects objects and produces an adjacency matrix representing…
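To illustrate how a predicted dependency graph could feed a pick sequence, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical convention in which `adj[i][j] == 1` means object `i` must be removed before object `j`; the abstract is truncated before it states the actual semantics of the matrix. The function name `pick_order` is likewise illustrative. Under that assumption, a valid pick order is a topological ordering of the graph, computed here with Kahn's algorithm.

```python
from collections import deque
from typing import List


def pick_order(adj: List[List[int]]) -> List[int]:
    """Return one valid pick sequence using Kahn's topological sort.

    Assumed convention (hypothetical, not taken from the paper):
    adj[i][j] == 1 means object i must be picked before object j.
    """
    n = len(adj)
    # Number of unpicked objects that still block each object j.
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Objects with no blockers can be picked immediately.
    ready = deque(j for j in range(n) if indegree[j] == 0)
    order: List[int] = []
    while ready:
        i = ready.popleft()
        order.append(i)
        # Picking i frees every object that i was blocking.
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    if len(order) != n:
        # Some objects were never freed, so the dependencies are cyclic.
        raise ValueError("dependency graph contains a cycle; no valid pick order")
    return order


if __name__ == "__main__":
    # Object 0 lies on top of object 1, which lies on top of object 2.
    stacked = [
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 0],
    ]
    print(pick_order(stacked))
```

A cyclic prediction (for example, two objects each marked as blocking the other) has no valid ordering, so the sketch raises an error rather than returning a partial sequence; a real system would need its own policy for that case.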
Taxonomy
Topics: AI-based Problem Solving and Planning · Semantic Web and Ontologies · Robotic Path Planning Algorithms
