DiffPlace: Street View Generation via Place-Controllable Diffusion Model Enhancing Place Recognition

Ji Li; Zhiwei Li; Shihao Li; Zhenjiang Yu; Boyang Wang; Haiou Liu

arXiv:2602.11875·cs.CV·February 13, 2026

TL;DR

DiffPlace introduces a place-controllable diffusion framework that generates background-consistent, multi-view street scenes from text, BEV maps, and object bounding boxes, improving place recognition for autonomous driving.

Contribution

We propose DiffPlace, a novel diffusion-based model with a place-ID controller that enables flexible, place-aware street view synthesis from multiple modalities.

Findings

01. Outperforms existing methods in image quality and in support for place recognition

02. Keeps background buildings consistent while allowing flexible changes to foreground objects and weather conditions

03. Enhances training data for autonomous driving applications

Abstract

Generative models have advanced significantly in realistic image synthesis, with diffusion models excelling in quality and stability. Recent multi-view diffusion models improve 3D-aware street view generation, but they struggle to produce place-aware and background-consistent urban scenes from text, BEV maps, and object bounding boxes. This limits their effectiveness in generating realistic samples for place recognition tasks. To address these challenges, we propose DiffPlace, a novel framework that introduces a place-ID controller to enable place-controllable multi-view image generation. The place-ID controller employs linear projection, perceiver transformer, and contrastive learning to map place-ID embeddings into a fixed CLIP space, allowing the model to synthesize images with consistent background buildings while flexibly modifying foreground objects and weather conditions.…
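
The abstract names linear projection and contrastive learning as two ingredients of the place-ID controller. The sketch below is not the authors' implementation. It is a minimal NumPy illustration of the general idea, assuming a single linear layer and a standard InfoNCE-style loss that pulls each projected place-ID embedding toward a matching target vector. The perceiver transformer is omitted. All names (`project_place_ids`, `info_nce_loss`), the dimensions, the temperature, and the random stand-in "CLIP" targets are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def project_place_ids(place_emb, weight, bias):
    """Linear projection of place-ID embeddings into a target-sized space."""
    return place_emb @ weight + bias

def info_nce_loss(projected, targets, temperature=0.07):
    """InfoNCE-style loss: row i of `projected` should match row i of `targets`."""
    p = l2_normalize(projected)
    t = l2_normalize(targets)
    logits = (p @ t.T) / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The positive pair for each row sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
num_places, place_dim, clip_dim = 4, 16, 32
place_emb = rng.normal(size=(num_places, place_dim))
weight = rng.normal(size=(place_dim, clip_dim)) * 0.1
bias = np.zeros(clip_dim)
clip_targets = rng.normal(size=(num_places, clip_dim))  # stand-in for fixed CLIP-space features

projected = project_place_ids(place_emb, weight, bias)
loss = info_nce_loss(projected, clip_targets)
aligned_loss = info_nce_loss(clip_targets, clip_targets)
```

In this toy setup the untrained projection gives a higher loss than perfectly aligned embeddings (`aligned_loss`). Training would adjust `weight` and `bias` to close that gap while the target space stays fixed.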

Taxonomy

Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Multimodal Machine Learning Applications