TL;DR
This paper presents a CNN-based approach to visual ego-motion estimation during fast MAV maneuvers. It reports better accuracy and speed than traditional methods, particularly when the estimate is aided by IMU data and the networks are trained with self-supervision.
Contribution
It introduces a lightweight CNN architecture (around 1.35 MB) optimized for fast inference on MAVs, together with novel training strategies, and evaluates it on real flight data, where it shows superior performance.
Findings
Better accuracy during fast maneuvers compared to traditional methods
Self-supervised learning outperforms supervised training
Fast inference (around 10 ms on a mobile GPU), suitable for real-time MAV applications
Abstract
In the field of visual ego-motion estimation for Micro Air Vehicles (MAVs), fast maneuvers remain challenging, mainly because of large visual disparity and motion blur. In pursuit of higher robustness, we study convolutional neural networks (CNNs) that predict the relative pose between subsequent images from a fast-moving monocular camera facing a planar scene. Aided by the Inertial Measurement Unit (IMU), we mainly focus on translational motion. The networks we study have similarly small model sizes (around 1.35 MB) and high inference speeds (around 10 milliseconds on a mobile GPU). Images for training and testing have realistic motion blur. Departing from a network framework that iteratively warps the first image to match the second with cascaded network blocks, we study different network architectures and training strategies. Simulated datasets and a self-collected MAV flight dataset…
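To make the planar-scene setting concrete, the sketch below uses the standard textbook homography that relates two views of a plane given a relative pose. It is only an illustration of why a planar scene and a known rotation (for example from an IMU) leave translation as the main unknown. It is not taken from the paper and may differ from the formulation the authors use. The names K, R, t, n, d and all numeric values are assumptions chosen for the example.

```python
import numpy as np

def planar_homography(K, R, t, n, d):
    """Homography mapping image-1 pixels to image-2 pixels.

    Assumed convention (not from the paper): a 3D point X1 in the
    camera-1 frame lies on the plane n . X1 = d, and moves to the
    camera-2 frame as X2 = R @ X1 + t.
    """
    H = K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)
    return H / H[2, 2]  # fix the arbitrary projective scale

def project(K, X):
    """Pinhole projection of a 3D point to pixel coordinates."""
    x = K @ X
    return x[:2] / x[2]

# Hypothetical intrinsics for a small 320x240 camera.
K = np.array([[300.0, 0.0, 160.0],
              [0.0, 300.0, 120.0],
              [0.0, 0.0, 1.0]])

R = np.eye(3)                      # rotation assumed already compensated
t = np.array([0.2, -0.1, 0.05])    # translation between the two frames
n = np.array([0.0, 0.0, 1.0])      # plane normal in the camera-1 frame
d = 2.0                            # distance from camera 1 to the plane

H = planar_homography(K, R, t, n, d)

# Check the relation on one point that lies on the plane z = 2.
X1 = np.array([0.4, -0.3, 2.0])
X2 = R @ X1 + t
p1 = project(K, X1)
p2 = project(K, X2)

p2_h = H @ np.append(p1, 1.0)
p2_from_H = p2_h[:2] / p2_h[2]
print(np.allclose(p2_from_H, p2))
```

Under these assumptions, warping the first image with H aligns it with the second, so a residual misalignment after warping indicates an error in the estimated translation.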
