Reference-Guided Diffusion Inpainting For Multimodal Counterfactual Generation
Alexandru Buburuzan

TL;DR
This paper introduces two diffusion-based inpainting methods, MObI and AnydoorMed, that generate realistic and controllable synthetic data for autonomous driving and medical imaging, respectively, to support testing in safety-critical applications.
Contribution
The work presents the first multimodal object inpainting framework using diffusion models and extends it to medical imaging, enabling high-quality, reference-guided inpainting across diverse modalities.
Findings
MObI inserts objects realistically into multimodal (camera and lidar) scenes at a 3D location specified by a bounding box.
AnydoorMed effectively inpaints anomalies in mammography scans with structural and semantic consistency.
Both methods demonstrate that foundation models can be adapted to different perceptual modalities.
Abstract
Safety-critical applications, such as autonomous driving and medical image analysis, require extensive multimodal data for rigorous testing. Synthetic data methods are gaining prominence due to the cost and complexity of gathering real-world data, but they demand a high degree of realism and controllability to be useful. This work introduces two novel methods for synthetic data generation in autonomous driving and medical image analysis, namely MObI and AnydoorMed, respectively. MObI is a first-of-its-kind framework for Multimodal Object Inpainting that leverages a diffusion model to produce realistic and controllable object inpaintings across perceptual modalities, demonstrated simultaneously for camera and lidar. Given a single reference RGB image, MObI enables seamless object insertion into existing multimodal scenes at a specified 3D location, guided by a bounding box, while…
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Handwritten Text Recognition Techniques
