CrossModalityDiffusion: Multi-Modal Novel View Synthesis with Unified Intermediate Representation
Alex Berian, Daniel Brignac, JhihYang Wu, Natnael Daba, Abhijit Mahalanobis

TL;DR
CrossModalityDiffusion is a unified framework that synthesizes multi-modal images from novel viewpoints without prior knowledge of scene geometry. It uses modality-specific encoders and volumetric rendering to keep its understanding of the scene consistent across modalities.
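The TL;DR mentions volumetric rendering. The page does not spell out the paper's exact renderer, so the sketch below assumes the standard NeRF-style quadrature. Each sample i along a ray has a density sigma_i, a feature vector c_i and a spacing delta_i. The rendered value is sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i, where T_i is the transmittance accumulated before sample i. The function name `render_ray` is hypothetical.

```python
import math
from typing import List, Sequence


def render_ray(
    densities: Sequence[float],
    features: Sequence[Sequence[float]],
    deltas: Sequence[float],
) -> List[float]:
    """Alpha-composite per-sample features along one ray (assumed NeRF-style quadrature).

    densities: non-negative sigma_i at each sample, ordered near to far.
    features:  feature vector c_i at each sample (all of equal length).
    deltas:    spacing delta_i between consecutive samples along the ray.
    """
    if not (len(densities) == len(features) == len(deltas)):
        raise ValueError("densities, features and deltas must have equal length")
    dim = len(features[0]) if features else 0
    out = [0.0] * dim
    transmittance = 1.0  # T_i: fraction of the ray that reaches sample i unoccluded
    for sigma, feat, delta in zip(densities, features, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # contribution of sample i
        for k in range(dim):
            out[k] += weight * feat[k]
        transmittance *= 1.0 - alpha            # light left after passing sample i
    return out
```

For example, `render_ray([0.0, 1e9], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])` returns `[0.0, 1.0]`. The first sample is empty space, and the second is effectively opaque, so it alone determines the output.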
Contribution
It introduces a modular, geometry-aware diffusion-based approach for cross-modality view synthesis without requiring ground truth scene geometry.
Findings
Effective in generating accurate multi-modal views
Ensures consistent geometric understanding across modalities
Validated on the synthetic ShapeNet dataset
Abstract
Geospatial imaging leverages data from diverse sensing modalities, such as EO, SAR, and LiDAR, with viewpoints ranging from ground-level drones to satellite views. These heterogeneous inputs offer significant opportunities for scene understanding but present challenges in interpreting geometry accurately, particularly in the absence of precise ground truth data. To address this, we propose CrossModalityDiffusion, a modular framework designed to generate images across different modalities and viewpoints without prior knowledge of scene geometry. CrossModalityDiffusion employs modality-specific encoders that take multiple input images and produce geometry-aware feature volumes that encode scene structure relative to their input camera positions. The space where the feature volumes are placed acts as a common ground for unifying input modalities. These feature volumes are overlapped and rendered into…
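The abstract says each feature volume is anchored to its input camera, and that the shared space where the volumes overlap unifies the input modalities. The sketch below illustrates that idea under several assumptions that are not stated in the source:

- Each volume is a cubic grid with a world-to-camera pose (R, t).
- Lookup uses the nearest voxel; trilinear interpolation would be more typical.
- Overlapping volumes are fused by a plain mean.

The names `FeatureVolume` and `fuse_features` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


class FeatureVolume:
    """Hypothetical dense feature grid anchored to one input camera.

    The grid covers [-extent, extent]^3 in that camera's coordinate frame and
    stores one feature vector per voxel in a cubic res x res x res layout.
    """

    def __init__(self, grid: List[List[List[List[float]]]], extent: float,
                 world_to_cam_rot: Mat3, world_to_cam_trans: Vec3):
        self.grid = grid
        self.res = len(grid)  # assumed cubic grid
        self.extent = extent
        self.rot = world_to_cam_rot
        self.trans = world_to_cam_trans

    def to_camera(self, p: Vec3) -> Tuple[float, ...]:
        # x_cam = R @ x_world + t
        return tuple(sum(self.rot[i][j] * p[j] for j in range(3)) + self.trans[i]
                     for i in range(3))

    def sample(self, p_world: Vec3) -> Optional[List[float]]:
        """Nearest-voxel lookup; None if the point lies outside this volume."""
        idx = []
        for c in self.to_camera(p_world):
            if abs(c) > self.extent:
                return None
            # map [-extent, extent] onto [0, res - 1] and round to the nearest voxel
            u = (c + self.extent) / (2 * self.extent) * (self.res - 1)
            idx.append(int(round(u)))
        i, j, k = idx
        return self.grid[i][j][k]


def fuse_features(volumes: Sequence[FeatureVolume],
                  p_world: Vec3) -> Optional[List[float]]:
    """Average the features of every volume that contains the world-space point."""
    hits = [f for f in (v.sample(p_world) for v in volumes) if f is not None]
    if not hits:
        return None  # no input camera's volume covers this point
    dim = len(hits[0])
    return [sum(f[d] for f in hits) / len(hits) for d in range(dim)]
```

Because every query is expressed in world coordinates first, volumes from different modalities become comparable wherever they overlap. Querying such fused features at the sample points of a novel-view ray is one way to feed a volumetric renderer.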
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
Methods: Diffusion
