Reference-Guided Diffusion Inpainting For Multimodal Counterfactual Generation

Alexandru Buburuzan

arXiv:2507.23058 · cs.CV · August 1, 2025

TL;DR

This paper introduces two diffusion-based inpainting methods, MObI for autonomous driving and AnydoorMed for medical imaging, that generate realistic, controllable, and multimodal synthetic data for testing safety-critical systems.

Contribution

The work presents the first multimodal object inpainting framework using diffusion models and extends it to medical imaging, enabling high-quality, reference-guided inpainting across diverse modalities.

Findings

01

MObI achieves realistic object insertion in multimodal scenes with spatial accuracy.

02

AnydoorMed effectively inpaints anomalies in mammography scans with structural and semantic consistency.

03

Both methods demonstrate adaptability of foundation models to different perceptual modalities.

Abstract

Safety-critical applications, such as autonomous driving and medical image analysis, require extensive multimodal data for rigorous testing. Synthetic data methods are gaining prominence due to the cost and complexity of gathering real-world data, but they demand a high degree of realism and controllability to be useful. This work introduces two novel methods for synthetic data generation in autonomous driving and medical image analysis, namely MObI and AnydoorMed, respectively. MObI is a first-of-its-kind framework for Multimodal Object Inpainting that leverages a diffusion model to produce realistic and controllable object inpaintings across perceptual modalities, demonstrated simultaneously for camera and lidar. Given a single reference RGB image, MObI enables seamless object insertion into existing multimodal scenes at a specified 3D location, guided by a bounding box, while…

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques