Real-Time Monocular Scene Analysis for UAV in Outdoor Environments
Yara AlaaEldin

TL;DR
This thesis presents Co-SemDepth, a deep-learning model for real-time depth and semantic mapping from monocular cameras on UAVs, introduces a synthetic dataset, and analyzes synthetic-to-real generalization and style transfer techniques.
Contribution
It introduces Co-SemDepth for joint depth and semantic mapping, presents a new synthetic dataset, TopAir, and provides an extensive analysis of domain adaptation and style transfer methods for UAV scene understanding.
Findings
Co-SemDepth outperforms the methods it is compared against in depth estimation and semantic segmentation.
Diffusion models excel in synthetic-to-real style transfer.
Co-SemDepth generalizes well to real marine data.
Abstract
In this thesis, we leverage monocular cameras on aerial robots to predict depth and semantic maps in low-altitude unstructured environments. We propose a joint deep-learning architecture, named Co-SemDepth, that can perform the two tasks accurately and rapidly, and validate its effectiveness on a variety of datasets. The training of neural networks requires an abundance of annotated data, and in the UAV field, the availability of such data is limited. We introduce a new synthetic dataset in this thesis, TopAir, which contains images captured with a nadir view in outdoor environments at different altitudes, helping to fill the gap. While using synthetic data for training is convenient, it raises issues when shifting to the real domain for testing. We conduct an extensive analytical study to assess the effect of several factors on synthetic-to-real generalization. Co-SemDepth and…
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
