Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation
Hongxing Fan, Shuyu Zhao, Jiayang Ao, Lu Sheng

TL;DR
This paper introduces a collaborative multi-agent framework for amodal completion that improves semantic and structural inference through explicit planning, verification, and diverse hypothesis generation. It also proposes a new human-aligned evaluation metric.
Contribution
The paper proposes a multi-agent reasoning framework that combines explicit semantic planning, verification, and diverse hypothesis generation, together with the MAC-Score, a new evaluation metric for amodal completion.
Findings
Outperforms state-of-the-art methods on multiple datasets.
Achieves more semantically consistent and structurally complete completions.
Introduces the MAC-Score for better human-aligned evaluation.
Abstract
Amodal completion, the task of inferring invisible object parts, faces significant challenges in maintaining semantic consistency and structural integrity. Prior progressive approaches are inherently limited by inference instability and error accumulation. To tackle these limitations, we present a Collaborative Multi-Agent Reasoning Framework that explicitly decouples Semantic Planning from Visual Synthesis. By employing specialized agents for upfront reasoning, our method generates a structured, explicit plan before pixel generation, enabling visually and semantically coherent single-pass synthesis. We integrate this framework with two critical mechanisms: (1) a self-correcting Verification Agent that employs Chain-of-Thought reasoning to rectify visible region segmentation and identify residual occluders strictly within the Semantic Planning phase, and (2) a Diverse Hypothesis…
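The plan-then-synthesize ordering described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (Plan, plan_semantics, verify_plan, synthesize) and the toy callbacks are hypothetical, the verification loop only stands in for the Chain-of-Thought reasoning the abstract mentions, and the synthesis step is a placeholder string rather than an image generator. The second mechanism is truncated in the source, so it is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Pixel = Tuple[int, int]


@dataclass
class Plan:
    """Explicit plan built before any pixel generation (hypothetical structure)."""
    target_description: str                 # what the complete object should be
    visible_region: Set[Pixel]              # pixels believed to belong to the target
    occluders: List[str] = field(default_factory=list)  # objects hiding the target


def plan_semantics(label: str, visible: Set[Pixel], occluders: List[str]) -> Plan:
    """Semantic planning: assemble a structured plan from upfront reasoning."""
    return Plan(f"a complete {label}", set(visible), list(occluders))


def verify_plan(
    plan: Plan,
    find_residual_occluders: Callable[[Plan], List[str]],
    refine_region: Callable[[Plan], Set[Pixel]],
    max_rounds: int = 3,
) -> Plan:
    """Verification inside the planning phase: fix the visible region and add
    missed occluders until nothing changes (or max_rounds is reached)."""
    for _ in range(max_rounds):
        residual = [o for o in find_residual_occluders(plan) if o not in plan.occluders]
        refined = refine_region(plan)
        if not residual and refined == plan.visible_region:
            break  # the plan is stable, stop correcting
        plan.occluders.extend(residual)
        plan.visible_region = refined
    return plan


def synthesize(plan: Plan) -> str:
    """Single-pass synthesis placeholder: a real system would call a generator here."""
    return f"inpaint[{plan.target_description}] removing {sorted(plan.occluders)}"


if __name__ == "__main__":
    # Toy inputs: pixel (9, 9) is a segmentation error, and "fence" is a missed occluder.
    plan = plan_semantics("dog", {(0, 0), (0, 1), (9, 9)}, ["person"])
    plan = verify_plan(
        plan,
        find_residual_occluders=lambda p: ["fence"],
        refine_region=lambda p: p.visible_region - {(9, 9)},
    )
    print(synthesize(plan))
```

The point of the sketch is the ordering: all corrections happen on the plan, and synthesis is called once on the verified plan, which is how the abstract contrasts the approach with progressive methods that accumulate errors across repeated generation steps.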
Taxonomy
Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Generative Adversarial Networks and Image Synthesis
