GenMM: Geometrically and Temporally Consistent Multimodal Data Generation for Video and LiDAR

Bharat Singh; Viveka Kulharia; Luyu Yang; Avinash Ravichandran; Ambrish Tyagi; Ashish Shrivastava

arXiv:2406.10722·cs.CV·June 18, 2024

Open Access

TL;DR

GenMM is a novel method for creating consistent multimodal synthetic data by seamlessly inserting 3D objects into videos and LiDAR scans, ensuring geometric and temporal coherence for applications like autonomous driving.

Contribution

It introduces a joint editing framework that combines diffusion-based inpainting, semantic segmentation, and geometry optimization to insert objects into multimodal data with high consistency.

Findings

01. Effective insertion of 3D objects into videos and LiDAR scans

02. Maintains geometric and temporal consistency across modalities

03. Demonstrates improved realism and accuracy in synthetic data

Abstract

Multimodal synthetic data generation is crucial in domains such as autonomous driving, robotics, augmented/virtual reality, and retail. We propose a novel approach, GenMM, for jointly editing RGB videos and LiDAR scans by inserting temporally and geometrically consistent 3D objects. Our method uses a reference image and 3D bounding boxes to seamlessly insert and blend new objects into target videos. We inpaint the 2D Regions of Interest (consistent with 3D boxes) using a diffusion-based video inpainting model. We then compute semantic boundaries of the object and estimate its surface depth using state-of-the-art semantic segmentation and monocular depth estimation techniques. Subsequently, we employ a geometry-based optimization algorithm to recover the 3D shape of the object's surface, ensuring it fits precisely within the 3D bounding box. Finally, LiDAR rays intersecting with the new…
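The abstract says the inpainted 2D Regions of Interest are consistent with the 3D boxes, but it does not say how they are derived. One common way to obtain such a region is to project the eight box corners through a pinhole camera and take their enclosing rectangle. The sketch below illustrates that idea under several assumptions that do not come from the paper. The box is given in camera coordinates (x right, y down, z forward) as a center, a full size, and a yaw about the vertical axis. The intrinsics are `fx`, `fy`, `px`, `py`, and there is no lens distortion. The function names `box_corners` and `project_box_to_roi` are hypothetical and are not GenMM's API.

```python
import math
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
ROI = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def box_corners(center: Point3, size: Point3, yaw: float) -> List[Point3]:
    """Return the 8 corners of a 3D box in (assumed) camera coordinates.

    Assumption: x right, y down, z forward; the box is rotated by `yaw`
    radians about the vertical (y) axis and `size` holds full extents.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                x, y, z = sx * hx, sy * hy, sz * hz
                # Rotate about the y axis (R_y), then translate to the center.
                xr = c * x + s * z
                zr = -s * x + c * z
                corners.append((cx + xr, cy + y, cz + zr))
    return corners


def project_box_to_roi(center: Point3, size: Point3, yaw: float,
                       fx: float, fy: float, px: float, py: float,
                       width: int, height: int) -> Optional[ROI]:
    """Hypothetical helper: 2D ROI enclosing a projected 3D box, or None.

    Simplification: boxes with any corner at or behind the camera plane
    are rejected instead of being clipped against a near plane.
    """
    corners = box_corners(center, size, yaw)
    if any(z <= 0.0 for _, _, z in corners):
        return None
    # Pinhole projection without distortion: u = fx * x / z + px.
    us = [fx * x / z + px for x, _, z in corners]
    vs = [fy * y / z + py for _, y, z in corners]
    # Clamp the enclosing rectangle to the image bounds.
    u_min, u_max = max(0.0, min(us)), min(float(width), max(us))
    v_min, v_max = max(0.0, min(vs)), min(float(height), max(vs))
    if u_min >= u_max or v_min >= v_max:
        return None  # the box projects entirely outside the image
    return (u_min, v_min, u_max, v_max)


# Example: a 2 m cube 10 m in front of a 640x480 camera.
print(project_box_to_roi((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0,
                         100.0, 100.0, 320.0, 240.0, 640, 480))
```

The nearest face of the cube (z = 9) determines the rectangle, so the region is centered on the principal point and spans about 22 pixels in each direction. Rejecting boxes that straddle the camera plane keeps the sketch short. A fuller implementation would clip box edges against a near plane before projecting.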

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · Human Motion and Animation

Methods: Inpainting