SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding

Kuan Fang; Yuxin Chen; Xinghao Zhu; Farzad Niroui; Lingfeng Sun; Jiuguang Wang

arXiv:2512.12842 · cs.RO · December 16, 2025

TL;DR

SAGA is a structured affordance grounding framework for mobile manipulation. It generalizes across environments and task specifications and outperforms end-to-end and modular baselines on visuomotor control.

Contribution

The paper presents a novel affordance-based task representation grounded in multimodal foundation models, facilitating generalist mobile manipulation with zero-shot and few-shot capabilities.

Findings

1. Outperforms end-to-end and modular baselines significantly
2. Successfully generalizes across 11 real-world tasks
3. Enables zero-shot and few-shot task execution

Abstract

We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capability, our key idea is to disentangle high-level semantic intent from low-level visuomotor control by explicitly grounding task objectives in the observed environment. Using an affordance-based task representation, we express diverse and complex behaviors in a unified, structured form. By leveraging multimodal foundation models, SAGA grounds the proposed task representation to the robot's visual observation as 3D affordance heatmaps, highlighting task-relevant entities while abstracting away spurious appearance variations that would hinder generalization. These grounded affordances enable us to effectively train a conditional policy on multi-task demonstration data for whole-body control. In a…
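The abstract describes grounding the task representation as 3D affordance heatmaps but does not say how they are computed. As a rough illustration only, and not the authors' method, the sketch below shows one common way to lift a 2D relevance map into 3D: pinhole back-projection of a depth image. The function name `backproject_heatmap`, the per-pixel `scores` array (standing in for the output of some grounding model), and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for this example.

```python
import numpy as np


def backproject_heatmap(depth, scores, fx, fy, cx, cy):
    """Lift a 2D relevance map to a 3D point cloud with per-point heat.

    Hypothetical helper, not taken from the paper.

    depth  : (H, W) array of depth in meters, 0 where the reading is invalid
    scores : (H, W) array of relevance values (assumed to come from a
             grounding model; here it is just an input array)
    fx, fy, cx, cy : pinhole camera intrinsics in pixels
    Returns an (N, 4) array of [x, y, z, heat] in the camera frame,
    one row per pixel with valid depth, in row-major pixel order.
    """
    depth = np.asarray(depth, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if depth.shape != scores.shape:
        raise ValueError("depth and scores must have the same shape")

    h, w = depth.shape
    v, u = np.indices((h, w))  # v: row index, u: column index
    valid = depth > 0          # drop pixels without a depth reading

    z = depth[valid]
    x = (u[valid] - cx) * z / fx  # standard pinhole back-projection
    y = (v[valid] - cy) * z / fy
    heat = scores[valid]
    return np.stack([x, y, z, heat], axis=1)
```

A downstream policy could then take such a heat-annotated point cloud as its conditioning input in place of raw appearance, which is the kind of abstraction the abstract refers to; how SAGA actually encodes and consumes the heatmaps is not specified in the text above.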

Taxonomy

Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Motor Control and Adaptation