GPS as a Control Signal for Image Generation
Chao Feng, Ziyang Chen, Aleksander Holynski, Alexei A. Efros, Andrew Owens

TL;DR
This paper shows that the GPS tags in photo metadata can serve as a control signal for image generation. Models conditioned on GPS generate location-specific images, and GPS conditioning improves the 3D structure estimated from those models.
Contribution
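It introduces GPS-to-image models, including a diffusion model conditioned on both GPS and text, and extracts 3D models from these 2D models through score distillation sampling, with GPS conditioning constraining the appearance of the reconstruction from each viewpoint.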
Findings
GPS conditioning captures neighborhood-specific appearances
GPS conditioning improves the estimated 3D structure of reconstructions
GPS-to-image models support tasks that need a fine-grained understanding of how images vary within a city
Abstract
We show that the GPS tags contained in photo metadata provide a useful control signal for image generation. We train GPS-to-image models and use them for tasks that require a fine-grained understanding of how images vary within a city. In particular, we train a diffusion model to generate images conditioned on both GPS and text. The learned model generates images that capture the distinctive appearance of different neighborhoods, parks, and landmarks. We also extract 3D models from 2D GPS-to-image models through score distillation sampling, using GPS conditioning to constrain the appearance of the reconstruction from each viewpoint. Our evaluations suggest that our GPS-conditioned models successfully learn to generate images that vary based on location, and that GPS conditioning improves estimated 3D structure.
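The abstract does not say how a GPS tag is turned into a conditioning input, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes that latitude and longitude are first normalized to [-1, 1] within a city bounding box and then expanded with sinusoidal (Fourier) features, giving a vector that a diffusion model could consume alongside a text embedding. The function names, the bounding box, and the number of frequencies are all hypothetical.

```python
import math


def normalize_gps(lat, lon, bbox):
    """Map (lat, lon) in degrees to [-1, 1] x [-1, 1] within a bounding box.

    bbox = (lat_min, lat_max, lon_min, lon_max). Hypothetical helper:
    the paper's actual normalization is not given in the abstract.
    """
    lat_min, lat_max, lon_min, lon_max = bbox
    u = 2.0 * (lat - lat_min) / (lat_max - lat_min) - 1.0
    v = 2.0 * (lon - lon_min) / (lon_max - lon_min) - 1.0
    return u, v


def fourier_features(coords, num_freqs=4):
    """Expand each coordinate into sin/cos pairs at frequencies 2^k * pi."""
    feats = []
    for c in coords:
        for k in range(num_freqs):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * c))
            feats.append(math.cos(freq * c))
    return feats


def gps_embedding(lat, lon, bbox, num_freqs=4):
    """Build a conditioning vector of length 2 * 2 * num_freqs (assumed design)."""
    return fourier_features(normalize_gps(lat, lon, bbox), num_freqs)


if __name__ == "__main__":
    # Illustrative bounding box only; not taken from the paper.
    bbox = (0.0, 10.0, 0.0, 20.0)
    emb = gps_embedding(5.0, 10.0, bbox)
    print(len(emb), emb[:4])
```

The higher frequencies let nearby locations map to distinguishable vectors, which is one plausible way to support the fine-grained, within-city variation the abstract describes. In a real model this vector would typically pass through a small learned network before being combined with the text conditioning.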
Taxonomy
Topics: Inertial Sensor and Navigation
Methods: Diffusion
