Real-Time Multi-Modal Semantic Fusion on Unmanned Aerial Vehicles

Simon Bultmann; Jan Quenzel; Sven Behnke

arXiv:2108.06608·cs.CV·August 17, 2021

TL;DR

This paper presents a UAV system that runs semantic inference on LiDAR scans, RGB images, and thermal images onboard in real time and fuses the results for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination.

Contribution

It introduces a lightweight, real-time multi-modal semantic fusion system on UAVs using embedded inference accelerators and a late fusion approach for enhanced scene understanding.

Findings

01

Runs semantic inference and fusion onboard the UAV at approximately 9 Hz.

02

Evaluates the integrated system in real-world experiments in an urban environment.

03

Provides augmented semantic images and point clouds for improved scene analysis.

Abstract

Unmanned aerial vehicles (UAVs) equipped with multiple complementary sensors have tremendous potential for fast autonomous or remote-controlled semantic scene analysis, e.g., for disaster examination. In this work, we propose a UAV system for real-time semantic inference and fusion of multiple sensor modalities. Semantic segmentation of LiDAR scans and RGB images, as well as object detection on RGB and thermal images, run online onboard the UAV computer using lightweight CNN architectures and embedded inference accelerators. We follow a late fusion approach where semantic information from multiple modalities augments 3D point clouds and image segmentation masks while also generating an allocentric semantic map. Our system provides augmented semantic images and point clouds at $\approx$ 9 Hz. We evaluate the integrated system in real-world experiments in an urban environment.
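To make the late fusion idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each modality has already produced per-class probabilities on its own, that LiDAR points are given in the camera frame, and that the camera follows a pinhole model with intrinsics `K`. The fusion rule (a normalized product of the two probability vectors) and the function names `project_points` and `fuse_point_labels` are illustrative assumptions; the summary above does not state which rule the paper uses.

```python
import numpy as np


def project_points(points_cam, K):
    """Project 3D points (N, 3) in the camera frame to pixel coordinates (N, 2).

    Points at or behind the image plane get the placeholder coordinate (-1, -1).
    """
    in_front = points_cam[:, 2] > 1e-6
    uv = np.full((points_cam.shape[0], 2), -1.0)
    proj = (K @ points_cam[in_front].T).T  # homogeneous pixel coordinates, (M, 3)
    uv[in_front] = proj[:, :2] / proj[:, 2:3]
    return uv, in_front


def fuse_point_labels(points_cam, lidar_probs, image_probs, K, eps=1e-6):
    """Hypothetical late fusion of LiDAR and image semantics for a point cloud.

    points_cam:  (N, 3) LiDAR points expressed in the camera frame
    lidar_probs: (N, C) class probabilities from the LiDAR segmentation
    image_probs: (H, W, C) class probabilities from the image segmentation
    Returns fused (N, C) probabilities and a mask of points seen by the camera.
    """
    h, w, _ = image_probs.shape
    uv, in_front = project_points(points_cam, K)
    cols = np.floor(uv[:, 0]).astype(int)
    rows = np.floor(uv[:, 1]).astype(int)
    visible = in_front & (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h)

    # Points outside the camera view keep their LiDAR-only estimate.
    fused = lidar_probs.astype(float)
    sampled = image_probs[rows[visible], cols[visible]]
    # Assumed fusion rule: normalized product of the two independent estimates.
    product = (lidar_probs[visible] + eps) * (sampled + eps)
    fused[visible] = product / product.sum(axis=1, keepdims=True)
    return fused, visible
```

Because each network runs independently and only its output probabilities are combined, a modality can be dropped or added (for example, thermal detections) without retraining the others, which is the usual argument for late over early fusion.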
