GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR
Bharat Singh, Viveka Kulharia, Luyu Yang, Avinash Ravichandran, Ambrish Tyagi, Ashish Shrivastava

TL;DR
GenMM is a novel method for creating consistent multimodal synthetic data by seamlessly inserting 3D objects into videos and LiDAR scans, ensuring geometric and temporal coherence for applications like autonomous driving.
Contribution
It introduces a joint editing framework that combines diffusion-based inpainting, semantic segmentation, and geometry optimization to insert objects into multimodal data with high consistency.
Findings
Effective insertion of 3D objects in videos and LiDAR scans
Maintains geometric and temporal consistency across modalities
Demonstrates improved realism and accuracy in synthetic data
Abstract
Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail. We propose a novel approach, GenMM, for jointly editing RGB videos and LiDAR scans by inserting temporally and geometrically consistent 3D objects. Our method uses a reference image and 3D bounding boxes to seamlessly insert and blend new objects into target videos. We inpaint the 2D Regions of Interest (consistent with 3D boxes) using a diffusion-based video inpainting model. We then compute semantic boundaries of the object and estimate its surface depth using state-of-the-art semantic segmentation and monocular depth estimation techniques. Subsequently, we employ a geometry-based optimization algorithm to recover the 3D shape of the object's surface, ensuring it fits precisely within the 3D bounding box. Finally, LiDAR rays intersecting with the new…
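The abstract says the 2D Regions of Interest are kept consistent with the 3D boxes. One common way to obtain such a region is to project the corners of the 3D box into the image and take the enclosing rectangle. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a pinhole camera, an axis-aligned box given in the camera frame (z pointing forward), and hypothetical names (`box_corners`, `project_point`, `box_to_roi`) and intrinsics chosen for illustration.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in the camera frame.

    Assumption: the box is axis-aligned; a real pipeline would likely
    also apply the box orientation before projecting.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [
        (cx + sx * hx, cy + sy * hy, cz + sz * hz)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def box_to_roi(center, size, fx, fy, cx, cy, width, height):
    """Enclosing 2D rectangle (u_min, v_min, u_max, v_max), clipped to the image."""
    pixels = [project_point(p, fx, fy, cx, cy) for p in box_corners(center, size)]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]

    def clamp(value, upper):
        return max(0.0, min(float(upper), value))

    return (
        clamp(min(us), width),
        clamp(min(vs), height),
        clamp(max(us), width),
        clamp(max(vs), height),
    )


# Hypothetical example: a 2 m x 2 m x 4 m box centered 10 m in front of the camera.
roi = box_to_roi(
    center=(0.0, 0.0, 10.0),
    size=(2.0, 2.0, 4.0),
    fx=1000.0, fy=1000.0, cx=640.0, cy=360.0,
    width=1280, height=720,
)
```

Repeating this per frame with the box for that frame would give a sequence of rectangles that could serve as an inpainting mask for a video model; how GenMM actually derives its masks is not specified in the text above.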
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · Human Motion and Animation
Methods: Inpainting
