Subjective Camera 1.0: Bridging Human Cognition and Visual Reconstruction through Sequence-Aware Sketch-Guided Diffusion

Haoyang Chen; Dongfang Sun; Caoyuan Ma; Shiqin Wang; Kewei Zhang; Zheng Wang; Zhixiang Wang

arXiv:2506.23711 · cs.CV · September 23, 2025

Open Access

TL;DR

This paper presents Subjective Camera 1.0, a framework that reconstructs real-world scenes from subjective readouts (textual descriptions and progressively drawn rough sketches), bridging human cognition and visual reconstruction without requiring large-scale paired training data.

Contribution

It introduces a Sequence-Aware Sketch-Guided Diffusion framework that integrates textual and sketch inputs through concept-wise sequential optimization with three loss terms, addressing generalization and multi-concept integration challenges.

Findings

01. Achieves state-of-the-art image quality and alignment

02. Outperforms existing methods in spatial and semantic accuracy

03. User studies favor the proposed approach

Abstract

We introduce the concept of a subjective camera to reconstruct meaningful moments that physical cameras fail to capture. We propose Subjective Camera 1.0, a framework for reconstructing real-world scenes from readily accessible subjective readouts, i.e., textual descriptions and progressively drawn rough sketches. Built on optimization-based alignment of diffusion models, our approach avoids large-scale paired training data and mitigates generalization issues. To address the challenge of integrating multiple abstract concepts in real-world scenarios, we design a Sequence-Aware Sketch-Guided Diffusion framework with three loss terms for concept-wise sequential optimization, following the natural order of subjective readouts. Experiments on two datasets demonstrate that our method achieves state-of-the-art performance in image quality as well as spatial and semantic alignment with target…
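The idea of concept-wise sequential optimization can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not name its three loss terms, so the alignment, preservation, and prior terms below are placeholders, and `Concept`, `optimize_sequentially`, and all weights are hypothetical. A flat list of floats stands in for the diffusion latent, and an index range stands in for the area covered by a sketch stroke.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Concept:
    """Hypothetical stand-in for one subjective readout (a phrase plus its stroke)."""
    name: str
    region: Tuple[int, int]  # half-open index range standing in for the stroke's location
    target: float            # scalar standing in for guidance derived from text and sketch


def optimize_sequentially(
    latent: List[float],
    concepts: List[Concept],
    steps: int = 200,
    lr: float = 0.1,
    w_align: float = 1.0,   # placeholder: pull the current concept's region to its target
    w_keep: float = 1.0,    # placeholder: keep regions settled by earlier concepts
    w_prior: float = 0.01,  # placeholder: stay near the initial latent
) -> List[float]:
    """Optimize a toy latent one concept at a time, in the order given."""
    x = list(latent)            # work on a copy so the caller's list is untouched
    prior = list(latent)
    settled: Dict[int, float] = {}  # index -> value fixed by earlier concepts

    for concept in concepts:
        lo, hi = concept.region
        for _ in range(steps):
            # Gradient of the weighted sum of three squared-error terms.
            grad = [2.0 * w_prior * (xi - pi) for xi, pi in zip(x, prior)]
            for i in range(lo, hi):
                grad[i] += 2.0 * w_align * (x[i] - concept.target)
            for i, value in settled.items():
                grad[i] += 2.0 * w_keep * (x[i] - value)
            x = [xi - lr * gi for xi, gi in zip(x, grad)]
        # Record this concept's result so later concepts are penalized for disturbing it.
        for i in range(lo, hi):
            settled[i] = x[i]
    return x


concepts = [Concept("tree", (0, 3), 1.0), Concept("bench", (3, 6), -1.0)]
result = optimize_sequentially([0.0] * 6, concepts)
print([round(v, 2) for v in result])
```

The point of the sketch is the loop structure: each concept is optimized in the order it is supplied, and a preservation term discourages later concepts from overwriting earlier ones. In a real system the latent, the guidance signal, and the loss terms would come from the diffusion model and the paper's own definitions.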

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques