GauTOAO: Gaussian-based Task-Oriented Affordance of Objects
Jiawen Wang, Dingsheng Luo

TL;DR
GauTOAO is a Gaussian-based framework that enables robots to understand task-specific object affordances in real time using vision-language models, improving manipulation accuracy.
Contribution
The paper introduces a novel zero-shot method combining vision-language models and Gaussian distributions for precise task-oriented object affordance detection.
Findings
Enhanced accuracy in affordance region prediction
Effective generalization across multiple tasks
Improved robot manipulation performance
Abstract
When your robot grasps an object using dexterous hands or grippers, it should understand the Task-Oriented Affordances of the Object (TOAO), as different tasks often require attention to specific parts of the object. To address this challenge, we propose GauTOAO, a Gaussian-based framework for Task-Oriented Affordance of Objects, which leverages vision-language models in a zero-shot manner to predict affordance-relevant regions of an object, given a natural language query. Our approach introduces a new paradigm: "static camera, moving object," allowing the robot to better observe and understand the object in hand during manipulation. GauTOAO addresses the limitations of existing methods, which often lack effective spatial grouping, by extracting a comprehensive 3D object mask using DINO features. This mask is then used to conditionally query Gaussians, producing a refined semantic…
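The mask-conditioned query described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional feature vectors, the cosine-similarity scoring, and the threshold value are all assumptions made for illustration. The idea shown is only that Gaussians outside the object mask are skipped, and the remaining ones are kept when their feature is close to the embedding of the language query.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def query_masked_gaussians(features, object_mask, query_embedding, threshold=0.5):
    """Hypothetical helper: return indices of Gaussians that lie inside the
    object mask AND whose feature is similar to the query embedding.

    features        -- one feature vector per Gaussian (assumed precomputed)
    object_mask     -- one boolean per Gaussian, True if it belongs to the object
    query_embedding -- embedding of the natural language query (assumed given)
    threshold       -- assumed similarity cutoff, not taken from the paper
    """
    selected = []
    for index, (feature, inside) in enumerate(zip(features, object_mask)):
        if not inside:
            continue  # the mask conditions the query: background is never scored
        if cosine(feature, query_embedding) >= threshold:
            selected.append(index)
    return selected


# Toy data: four Gaussians with 2-D features; the last one is off the object.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
object_mask = [True, True, True, False]
query = [1.0, 0.0]

affordance_ids = query_masked_gaussians(features, object_mask, query)
print(affordance_ids)
```

In this toy run the fourth Gaussian matches the query perfectly but is excluded because it lies outside the mask, which is the spatial grouping effect the abstract attributes to the 3D object mask.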
Taxonomy
Topics: Reinforcement Learning in Robotics · Simulation Techniques and Applications · Computer Graphics and Visualization Techniques
