Beyond Thinking: Imagining in 360° for Humanoid Visual Search
Jingdong Zhang, Yizhou Wang, Zhengzhong Tu, Xin Li, Wenping Wang, Xiaohang Zhan

TL;DR
This paper introduces a novel framework for humanoid visual search in 360° environments that decouples exploration into a probabilistic semantic predictor and an actor, improving efficiency and reducing annotation costs.
Contribution
It proposes Imagining in 360°, a new approach that models semantic spatial priors with a dedicated Imaginator, lowering data costs and enhancing search performance.
Findings
Significantly improves search efficiency in complex environments.
Reduces data annotation costs by over 1.96 million samples.
Outperforms prior methods in success rates.
Abstract
Humanoid Visual Search (HVS) requires agents to actively explore immersive 360° environments. While prior methods treat this as a monolithic task relying on cumulative, multi-turn Chain-of-Thought (CoT) reasoning, they impose heavy cognitive burdens and require expensive trajectory-level annotations. In this paper, we propose Imagining in 360°, a novel framework that decouples the exploration process into a specialized Imaginator and an Actor. The Imaginator functions as a probabilistic predictor of spatial priors; instead of maintaining a cumulative reasoning chain, it infers the semantic layout of both observed and unobserved regions in a single step. By sampling multiple hypotheses within this semantic space, we provide the Actor with a distribution of effective spatial information, offering robust guidance that hedges against uncertainty during active search. This…
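The decoupling described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the yaw-bin discretization, the hand-written prior, and the names `sample_hypotheses` and `choose_heading` are all assumptions made for illustration. The sketch only shows the general idea of drawing several hypotheses from a probabilistic spatial prior and letting a separate actor step act on their distribution rather than on a single guess.

```python
import random
from collections import Counter

# Hypothetical discretization: the 360° panorama is split into 8 yaw bins of 45° each.
N_BINS = 8


def sample_hypotheses(prior, k, rng):
    """Imaginator stand-in: draw k yaw-bin hypotheses from a categorical prior."""
    bins = list(range(len(prior)))
    return rng.choices(bins, weights=prior, k=k)


def choose_heading(hypotheses, n_bins=N_BINS):
    """Actor stand-in: turn toward the most frequently hypothesised bin.

    Ties are broken in favour of the lowest bin index.
    Returns the centre of the chosen bin in degrees.
    """
    counts = Counter(hypotheses)
    best_bin = max(range(n_bins), key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * 360.0 / n_bins


# Made-up prior that places most of its mass on bin 2 (90° to 135°).
prior = [0.05, 0.05, 0.60, 0.10, 0.05, 0.05, 0.05, 0.05]
rng = random.Random(0)
hyps = sample_hypotheses(prior, k=16, rng=rng)
heading = choose_heading(hyps)
print(f"sampled bins: {hyps}")
print(f"chosen heading: {heading:.1f} degrees")
```

Acting on the vote over many samples, rather than on one sample, is one simple way to hedge against an uncertain prior; the paper's Actor may consume the hypotheses quite differently.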
