Scene-agnostic Hierarchical Bimanual Task Planning via Visual Affordance Reasoning

Kwang Bin Lee; Jiho Kang; Sung-Hee Lee

arXiv:2512.09310 · cs.RO · December 11, 2025

TL;DR

This paper introduces a unified framework enabling embodied agents to plan and execute coordinated two-handed actions in unseen cluttered environments by reasoning about scene affordances and spatial relationships.

Contribution

It presents a novel scene-agnostic bimanual task planning system integrating visual grounding, subgoal reasoning, and structured prompting for coordinated manipulation.

Findings

1. Produces coherent, feasible two-handed plans in cluttered scenes
2. Generalizes to unseen environments without retraining
3. Demonstrates robust scene-agnostic affordance reasoning

Abstract

Embodied agents operating in open environments must translate high-level instructions into grounded, executable behaviors, often requiring coordinated use of both hands. While recent foundation models offer strong semantic reasoning, existing robotic task planners remain predominantly unimanual and fail to address the spatial, geometric, and coordination challenges inherent to bimanual manipulation in scene-agnostic settings. We present a unified framework for scene-agnostic bimanual task planning that bridges high-level reasoning with 3D-grounded two-handed execution. Our approach integrates three key modules. Visual Point Grounding (VPG) analyzes a single scene image to detect relevant objects and generate world-aligned interaction points. Bimanual Subgoal Planner (BSP) reasons over spatial adjacency and cross-object accessibility to produce compact, motion-neutralized subgoals that…
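The abstract says VPG turns a single scene image into world-aligned interaction points, but this summary does not say how. One standard way to lift a detected 2D point into 3D is pinhole back-projection. This assumes a metric depth value at that pixel, known camera intrinsics, and a known camera-to-world pose. The sketch below is a hypothetical illustration of that single step. `pixel_to_world`, `K`, and `T_world_cam` are names introduced here, not the paper's API, and the paper may ground points differently.

```python
from typing import Sequence, Tuple

# Hypothetical types: K is a 3x3 intrinsics matrix, T_world_cam a 4x4
# homogeneous camera-to-world transform (both assumed known).
Matrix = Sequence[Sequence[float]]


def pixel_to_world(u: float, v: float, depth: float,
                   K: Matrix, T_world_cam: Matrix) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) with metric depth into world coordinates.

    Illustrative sketch only; assumes a pinhole camera model, not the
    paper's actual VPG procedure.
    """
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Point in the camera frame (z along the optical axis), homogeneous form.
    p_cam = ((u - cx) * depth / fx, (v - cy) * depth / fy, depth, 1.0)
    # Apply the camera-to-world transform; keep only the x, y, z rows.
    x, y, z = (sum(T_world_cam[r][c] * p_cam[c] for c in range(4))
               for r in range(3))
    return (x, y, z)


# Example: camera 1 m along world z, pixel 100 px right of the principal point.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
T = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1.0], [0, 0, 0, 1]]
print(pixel_to_world(420, 240, 2.0, K, T))  # (0.4, 0.0, 3.0)
```

A real pipeline would typically run this for each detected interaction point and then check that the resulting 3D points are reachable by the left or right hand. That reachability reasoning is the kind of cross-object accessibility the abstract assigns to BSP.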

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Robotic Path Planning Algorithms