See, Act, Adapt: Active Perception for Unsupervised Cross-Domain Visual Adaptation via Personalized VLM-Guided Agent

Tianci Tang; Tielong Cai; Hongwei Wang; Gaoang Wang

arXiv:2602.23806 · cs.CV · March 2, 2026

TL;DR

Sea$^2$ introduces an active perception framework that adapts the deployment of frozen perception models via an intelligent agent, improving performance in novel indoor scenes without retraining or scene-specific annotations.

Contribution

The paper proposes a paradigm in which a pose-control agent adapts how frozen perception models are deployed, avoiding both retraining and scene-specific annotations in cross-domain visual tasks.

Findings

01 Achieved 13.54% improvement in visual grounding

02 Achieved 15.92% improvement in segmentation

03 Achieved 27.68% improvement in 3D box estimation

Abstract

Pre-trained perception models excel in generic image domains but degrade significantly in novel environments such as indoor scenes. The conventional remedy is fine-tuning on downstream data, which incurs catastrophic forgetting of prior knowledge and demands costly, scene-specific annotations. We propose a paradigm shift through Sea$^2$ (See, Act, Adapt): rather than adapting the perception modules themselves, we adapt how they are deployed through an intelligent pose-control agent. Sea$^2$ keeps all perception modules frozen, requires no downstream labels during training, and uses only scalar perceptual feedback to navigate the agent toward informative viewpoints. Specifically, we transform a vision-language model (VLM) into a low-level pose controller through a two-stage training pipeline: first fine-tuning it on rule-based exploration trajectories that systematically probe indoor scenes,…
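The core idea in the abstract, moving the camera rather than retraining the model, can be illustrated with a toy loop. The sketch below is not the paper's method: it replaces the fine-tuned VLM controller with plain greedy hill-climbing, and every name in it (`Pose`, `ACTIONS`, `step`, `toy_score`, `greedy_viewpoint_search`) is hypothetical. The only thing it shares with the description above is the interface: a frozen scorer returns one scalar per viewpoint, and the agent uses that scalar alone to choose its next pose.

```python
import math
from typing import Callable, Tuple

# Hypothetical planar pose: (x, y, yaw in radians).
Pose = Tuple[float, float, float]

# Hypothetical discrete low-level actions: (forward distance, yaw change).
ACTIONS = {
    "forward": (0.25, 0.0),
    "turn_left": (0.0, math.radians(15)),
    "turn_right": (0.0, -math.radians(15)),
}


def step(pose: Pose, action: str) -> Pose:
    """Apply one action: rotate first, then move along the new heading."""
    x, y, yaw = pose
    dist, dyaw = ACTIONS[action]
    yaw += dyaw
    return (x + dist * math.cos(yaw), y + dist * math.sin(yaw), yaw)


def toy_score(pose: Pose) -> float:
    """Stand-in for a frozen perception model's scalar feedback.

    Assumption for illustration only: the score peaks when the agent
    stands at the viewpoint (1.0, 0.0).
    """
    x, y, _ = pose
    return 1.0 / (1.0 + math.hypot(1.0 - x, 0.0 - y))


def greedy_viewpoint_search(
    start: Pose,
    score: Callable[[Pose], float],
    max_steps: int = 20,
) -> Tuple[Pose, float]:
    """Greedy hill-climbing over poses using only the scalar score."""
    pose, best = start, score(start)
    for _ in range(max_steps):
        candidates = [step(pose, name) for name in ACTIONS]
        scores = [score(p) for p in candidates]
        top = max(scores)
        if top <= best:  # no action improves the feedback: stop
            break
        pose, best = candidates[scores.index(top)], top
    return pose, best


final_pose, final_score = greedy_viewpoint_search((0.0, 0.0, 0.0), toy_score)
print(final_pose, final_score)
```

The scorer is never updated, so nothing here can cause forgetting; only the pose changes. A learned controller would replace the exhaustive one-step lookahead, which needs to query the scorer at every candidate pose.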

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis