Spatial Reasoning via Deep Vision Models for Robotic Sequential Manipulation
Hongyou Zhou, Ingmar Schubert, Marc Toussaint, Ozgur S. Oguz

TL;DR
This paper introduces a deep learning-based heuristic for robotic manipulation that predicts which objects in a scene are relevant to a task, significantly reducing the search space in task and motion planning (TAMP) and improving efficiency.
Contribution
It presents a novel integration of vision transformer and ResNet models as heuristics within TAMP to handle long-horizon tasks more efficiently.
Findings
More efficient solution search than state-of-the-art TAMP approaches.
Effective prediction of relevant objects for manipulation tasks.
Reduced computational complexity in planning.
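The reduction in search space can be illustrated with a toy count. The sketch below is not the authors' method. It assumes a simplified model in which each planning step picks one action and one object, so a plan of horizon H over n objects and a actions has (a·n)^H candidate sequences. The object names, relevance scores, threshold, and helper functions are all hypothetical stand-ins for the output of a vision model.

```python
def relevant_objects(scores, threshold=0.5):
    """Keep objects whose predicted relevance score meets the threshold.

    `scores` is a hypothetical stand-in for a vision model's per-object output.
    """
    return [name for name, score in scores.items() if score >= threshold]


def count_sequences(actions, objects, horizon):
    """Count (action, object) sequences of length `horizon` in the toy model."""
    return (len(actions) * len(objects)) ** horizon


# Made-up relevance scores for six objects in a scene (illustrative only).
scores = {
    "red_block": 0.92,
    "blue_block": 0.81,
    "green_block": 0.07,
    "cup": 0.03,
    "tray": 0.64,
    "bottle": 0.11,
}
actions = ["pick", "place"]
horizon = 4

all_objects = list(scores)
pruned = relevant_objects(scores)

full_count = count_sequences(actions, all_objects, horizon)
pruned_count = count_sequences(actions, pruned, horizon)

print(f"all objects:    {len(all_objects)} objects, {full_count} sequences")
print(f"pruned objects: {len(pruned)} objects, {pruned_count} sequences")
```

In this toy setting, halving the number of candidate objects shrinks the number of sequences by a factor of 2^H. This is why a reliable relevance predictor matters more as the horizon grows.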
Abstract
In this paper, we propose using deep neural architectures (i.e., vision transformers and ResNet) as heuristics for sequential decision-making in robotic manipulation problems. This formulation enables predicting the subset of objects that are relevant for completing a task. Such problems are often addressed by task and motion planning (TAMP) formulations combining symbolic reasoning and continuous motion planning. In essence, the action-object relationships are resolved for discrete, symbolic decisions that are used to solve manipulation motions (e.g., via nonlinear trajectory optimization). However, solving long-horizon tasks requires consideration of all possible action-object combinations, which limits the scalability of TAMP approaches. To overcome this combinatorial complexity, we introduce a visual perception module integrated with a TAMP solver. Given a task and an initial image…
Taxonomy
Topics: Robotic Path Planning Algorithms · Multimodal Machine Learning Applications · Robot Manipulation and Learning
