3D Object Positioning Using Differentiable Multimodal Learning
Sean Zanyk-McLean, Krishna Kumar, Paul Navratil

TL;DR
This paper introduces a multimodal differentiable learning approach combining simulated Lidar and image data to improve 3D object positioning accuracy and convergence speed, with applications in autonomous vehicle scene understanding.
Contribution
The paper presents a novel fusion of Lidar and image modalities using differentiable rendering for faster and more accurate object position optimization.
Findings
Fusing Lidar with image data accelerates convergence.
The method improves accuracy of object positioning in simulated scenes.
The approach has potential applications in autonomous vehicle perception systems.
Abstract
This article describes a multimodal method that combines simulated Lidar data, generated via ray tracing, with an image pixel loss computed through differentiable rendering to optimize an object's position with respect to an observer or to reference objects in a computer graphics scene. Object position optimization is performed using gradient descent, with the loss function influenced by both modalities. Typical object placement optimization uses image pixel loss with differentiable rendering alone; this work shows that adding a second modality (Lidar) leads to faster convergence. This sensor-fusion method is potentially useful for autonomous vehicles, as it can be used to establish the locations of multiple actors in a scene. This article also presents a method for simulating multiple types of data for use in training autonomous vehicles.
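The optimization described in the abstract can be illustrated with a heavily simplified sketch. It is not the authors' implementation. The object is reduced to a single 3D point, the rendered image is replaced by a pinhole projection of that point, the simulated Lidar return is the Euclidean range from the sensor, and central finite differences stand in for differentiable rendering. The names (`image_loss`, `lidar_loss`, `fused_loss`), the target position, the equal loss weights, the learning rate, and the step count are all illustrative assumptions.

```python
import math

# Hypothetical toy scene: a pinhole camera and a Lidar sensor share the origin,
# both looking down +z. The "object" is reduced to a single 3D point.
TARGET = (1.0, 0.5, 5.0)


def project(p):
    """Pinhole projection with focal length 1 (stand-in for a rendered image)."""
    x, y, z = p
    return (x / z, y / z)


def lidar_range(p):
    """Simulated Lidar return: Euclidean distance from the sensor to the point."""
    return math.sqrt(sum(c * c for c in p))


def image_loss(p, target=TARGET):
    """Squared error between projected and target image coordinates."""
    (u, v), (tu, tv) = project(p), project(target)
    return (u - tu) ** 2 + (v - tv) ** 2


def lidar_loss(p, target=TARGET):
    """Squared error between simulated and measured Lidar range."""
    return (lidar_range(p) - lidar_range(target)) ** 2


def fused_loss(p, w_img=1.0, w_lidar=1.0):
    """Weighted sum of both modalities; the weights are illustrative assumptions."""
    return w_img * image_loss(p) + w_lidar * lidar_loss(p)


def numerical_grad(f, p, h=1e-6):
    """Central-difference gradient, standing in for automatic differentiation."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        down = list(p)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def optimize(f, start, lr=0.5, steps=1000):
    """Plain gradient descent on the object position."""
    p = list(start)
    for _ in range(steps):
        g = numerical_grad(f, p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p


def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


if __name__ == "__main__":
    start = (0.0, 0.0, 2.0)
    p_img = optimize(image_loss, start)
    p_fused = optimize(fused_loss, start)
    print("image only :", [round(c, 3) for c in p_img], distance(p_img, TARGET))
    print("image+Lidar:", [round(c, 3) for c in p_fused], distance(p_fused, TARGET))
```

In this toy setting, a single projected point is invariant to sliding along its viewing ray, so the image-only run settles on the correct ray at the wrong depth, while the fused run recovers the full position. A real pixel loss over a rendered object carries additional depth cues (apparent size, occlusion, shading), so the toy exaggerates the gap; it only shows how a Lidar range term supplies a gradient along the direction the image term constrains weakly.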
Taxonomy
Topics: Robotics and Sensor-Based Localization · Robotic Path Planning Algorithms · Advanced Vision and Imaging
