Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search

Jingdong Zhang; Yizhou Wang; Zhengzhong Tu; Xin Li; Wenping Wang; Xiaohang Zhan

arXiv:2605.09146·cs.CV·May 12, 2026

TL;DR

This paper introduces a novel framework for humanoid visual search in 360° environments that decouples exploration into a probabilistic semantic predictor and an actor, improving efficiency and reducing annotation costs.

Contribution

It proposes Imagining in 360°, a new approach that models semantic spatial priors with a dedicated Imaginator, lowering data costs and enhancing search performance.

Findings

01. Significantly improves search efficiency in complex environments.

02. Reduces data annotation costs by over 1.96 million samples.

03. Outperforms prior methods in success rates.

Abstract

Humanoid Visual Search (HVS) requires agents to actively explore immersive 360$^{\circ}$ environments. Prior methods treat this as a monolithic task that relies on cumulative, multi-turn Chain-of-Thought (CoT) reasoning, which imposes a heavy cognitive burden and requires expensive trajectory-level annotations. In this paper, we propose Imagining in 360$^{\circ}$, a novel framework that decouples the exploration process into a specialized Imaginator and an Actor. The Imaginator functions as a probabilistic predictor of spatial priors; instead of maintaining a cumulative reasoning chain, it infers the semantic layout of both observed and unobserved regions in a single step. By sampling multiple hypotheses within this semantic space, we provide the Actor with a distribution of effective spatial information, offering robust guidance that hedges against uncertainty during active search. This…
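The Imaginator/Actor split described in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the yaw-bin discretization, the weight dictionary standing in for a learned predictor, and the names `imaginator_sample`, `actor_choose`, and `bin_to_yaw` are all assumptions made for illustration. It only shows the general idea of sampling several hypotheses about where the target might be and letting a separate component act on the resulting distribution.

```python
import random
from collections import Counter

NUM_BINS = 8  # assumption: 360 degrees split into 8 yaw bins of 45 degrees each


def imaginator_sample(prior_weights, num_hypotheses, rng):
    """Hypothetical stand-in for the Imaginator.

    `prior_weights` maps yaw bin -> non-negative weight that the target lies
    in that bin. Bins missing from the mapping (for example, unobserved
    regions) default to a weight of 1.0. Returns a list of sampled bins,
    one per hypothesis.
    """
    bins = list(range(NUM_BINS))
    weights = [prior_weights.get(b, 1.0) for b in bins]
    return rng.choices(bins, weights=weights, k=num_hypotheses)


def actor_choose(hypotheses):
    """Hypothetical Actor: pick the bin that the most hypotheses agree on.

    Ties are broken in favor of the lowest bin index. Also returns the
    empirical distribution over the sampled bins.
    """
    counts = Counter(hypotheses)
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    distribution = {b: c / len(hypotheses) for b, c in counts.items()}
    return best_bin, distribution


def bin_to_yaw(bin_index):
    """Center yaw angle, in degrees, of a bin."""
    return (bin_index + 0.5) * 360.0 / NUM_BINS


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy prior: the target is more likely behind the agent (bins 3 and 4).
    prior = {3: 5.0, 4: 5.0}
    samples = imaginator_sample(prior, num_hypotheses=16, rng=rng)
    target_bin, dist = actor_choose(samples)
    print("sampled bins:", samples)
    print("turn toward yaw (deg):", bin_to_yaw(target_bin))
```

Acting on the aggregate of many samples, rather than on a single prediction, is one simple way to hedge against an individual hypothesis being wrong; the paper's actual aggregation and action policy may differ.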
