CompassAD: Intent-Driven 3D Affordance Grounding in Functionally Competing Objects

Jingliang Li; Jindou Jia; Tuo An; Chuhao Zhou; Xiangyu Chen; Shilin Shan; Boyu Ma; Bofan Lyu; Gen Li; Jianfei Yang

arXiv:2604.02060 · cs.CV · May 19, 2026

TL;DR

This paper introduces a new benchmark and framework for intent-driven 3D affordance grounding in scenes with multiple similar objects, enabling robots to identify the correct object based on natural language intent.

Contribution

It formalizes the challenging problem of confusable affordance grounding, creates the CompassAD benchmark, and proposes CompassNet with novel modules for improved accuracy.

Findings

01. State-of-the-art performance on the CompassAD benchmark.
02. Effective transfer of the method to real-world robotic grasping.
03. Proposed modules prevent semantic leakage and improve discrimination.

Abstract

When told to "cut the cake," a robot must choose the knife over nearby scissors, despite both objects affording the same cutting function. In real-world scenes, multiple objects may share identical affordances, yet only one is appropriate under the given task context. We call such cases confusing pairs. However, existing 3D affordance methods largely sidestep this challenge by evaluating isolated single objects, often with explicit category names provided in the query. We formalize Intent-Driven Confusable Affordance Grounding, a new 3D affordance setting that requires predicting a per-point affordance mask on the correct object within a multi-object point cloud, conditioned on implicit natural language intent. To study this problem, we construct CompassAD, the first benchmark centered on implicit intent in confusing multi-object compositions. It comprises 30 confusing object pairs…
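To make the input/output contract of this setting concrete, here is a minimal Python sketch under stated assumptions. It models one sample as a merged multi-object point cloud with a per-point object index, an implicit intent string, the correct target object, and a binary per-point affordance mask that is non-zero only on that target. The names `ConfusableSample`, `mask_iou`, and `off_target_ratio` are hypothetical and do not come from the paper. The binary IoU and off-target ratio are generic illustrations, not the benchmark's official metrics, which the truncated abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class ConfusableSample:
    """Hypothetical record for one intent-driven confusable grounding sample."""
    points: List[Point]    # merged multi-object point cloud, N points
    object_ids: List[int]  # object index of each point (length N)
    intent: str            # implicit instruction; no category name is given
    target_id: int         # object that is correct under the intent
    gt_mask: List[int]     # per-point affordance label in {0, 1}, set only on the target


def binarize(scores: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-point affordance scores into a binary mask."""
    return [1 if s >= threshold else 0 for s in scores]


def mask_iou(pred: List[int], gt: List[int]) -> float:
    """Binary intersection-over-union between predicted and ground-truth masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0


def off_target_ratio(pred: List[int], object_ids: List[int], target_id: int) -> float:
    """Fraction of predicted affordance points that land on a non-target object."""
    hits = [o for p, o in zip(pred, object_ids) if p]
    if not hits:
        return 0.0
    return sum(1 for o in hits if o != target_id) / len(hits)


# Toy scene (invented for illustration): points 0-2 belong to a knife (id 0),
# points 3-5 to scissors (id 1). Both afford cutting, but only the knife fits the intent.
sample = ConfusableSample(
    points=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
            (1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (1.2, 0.0, 0.0)],
    object_ids=[0, 0, 0, 1, 1, 1],
    intent="cut the cake",
    target_id=0,
    gt_mask=[1, 1, 0, 0, 0, 0],  # cutting region of the knife
)

right = binarize([0.9, 0.8, 0.1, 0.2, 0.1, 0.0])  # highlights the knife's cutting region
wrong = binarize([0.1, 0.0, 0.0, 0.9, 0.7, 0.1])  # highlights the scissors' cutting region

print(mask_iou(right, sample.gt_mask), off_target_ratio(right, sample.object_ids, sample.target_id))  # 1.0 0.0
print(mask_iou(wrong, sample.gt_mask), off_target_ratio(wrong, sample.object_ids, sample.target_id))  # 0.0 1.0
```

The toy case shows why the setting differs from single-object affordance grounding: the second prediction marks a genuinely cutting-capable region, yet it scores zero because it lies on the wrong object of the confusing pair.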
