A3R: Agentic Affordance Reasoning via Cross-Dimensional Evidence in 3D Gaussian Scenes

Di Li; Jie Feng; Guanbin Li; Ronghua Shang; Yuhui Zheng; Weisheng Dong; Guangming Shi

arXiv:2604.01882 · cs.CV · April 3, 2026

TL;DR

A3R introduces an iterative, evidence-driven approach for affordance reasoning in 3D scenes, improving accuracy over static methods by actively acquiring geometric and semantic evidence.

Contribution

The paper presents A3R, a novel framework that combines cross-dimensional evidence acquisition with a GRPO-based policy for improved affordance reasoning.
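For readers unfamiliar with GRPO (Group Relative Policy Optimization), its core idea is to score each sampled rollout relative to the other rollouts drawn for the same prompt, rather than against a learned value function. The snippet below is a minimal sketch of that group-relative advantage step only. The function name, the reward values, and the use of the population standard deviation are illustrative assumptions; the summary here does not state how A3R defines its rewards or configures training.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar per rollout sampled for the same input.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of the same instruction.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged.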

Findings

01  A3R outperforms static baselines on scene-level benchmarks.

02  Iterative evidence acquisition reduces ambiguity and improves reasoning accuracy.

03  Cross-dimensional evidence integration enhances fine-grained affordance understanding.

Abstract

Affordance reasoning in 3D Gaussian scenes aims to identify the region that supports the action specified by a given text instruction in complex environments. Existing methods typically cast this problem as one-shot prediction from static scene observations, assuming sufficient evidence is already available for reasoning. However, in complex 3D scenes, many failure cases arise not from weak prediction capacity, but from incomplete task-relevant evidence under fixed observations. To address this limitation, we reformulate fine-grained affordance reasoning as a sequential evidence acquisition process, where ambiguity is progressively reduced through complementary 3D geometric and 2D semantic evidence. Building on this formulation, we propose A3R, an agentic affordance reasoning framework that enables an MLLM-based policy to iteratively select evidence acquisition actions and update the…
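The sequential formulation described in the abstract can be pictured as a loop that alternates between choosing an evidence source and narrowing the set of candidate regions. The toy sketch below illustrates only that control flow. Every name in it (`State`, `EVIDENCE`, `select_action`, the region labels) is hypothetical, and the greedy rule in `select_action` is a stand-in for the MLLM-based policy, not the paper's method.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    candidates: set                      # region ids still consistent with the instruction
    evidence: list = field(default_factory=list)  # actions taken so far


# Hypothetical evidence sources: each maps to the regions it supports.
EVIDENCE = {
    "geometry_3d": {"handle", "knob", "hinge"},
    "semantics_2d": {"handle", "drawer_front"},
}


def select_action(state, remaining):
    """Stand-in policy: pick the action that leaves the fewest candidates."""
    return min(remaining, key=lambda a: len(state.candidates & EVIDENCE[a]))


def acquire(state, action):
    """Apply one piece of evidence and record which action produced it."""
    state.candidates &= EVIDENCE[action]
    state.evidence.append(action)


def reason(candidates, max_steps=4):
    """Acquire evidence until one candidate remains or the options run out."""
    state = State(set(candidates))
    remaining = sorted(EVIDENCE)  # sorted list keeps tie-breaking deterministic
    for _ in range(max_steps):
        if len(state.candidates) <= 1 or not remaining:
            break
        action = select_action(state, remaining)
        remaining.remove(action)
        acquire(state, action)
    return state


result = reason(["handle", "knob", "hinge", "drawer_front"])
```

In this toy run, neither evidence source alone isolates a single region, but combining the 2D and 3D cues does, which is the intuition behind acquiring complementary evidence step by step.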
