Learning Monocular Depth from Events via Egomotion Compensation
Haitao Meng, Chonghao Zhong, Sheng Tang, Lian JunJia, Wenwei Lin, Zhenshan Bing, Yi Chang, Gang Chen, Alois Knoll

TL;DR
This paper introduces an interpretable monocular depth estimation framework using event cameras, leveraging physical motion principles and novel modules to improve accuracy and robustness over existing black-box methods.
Contribution
It proposes a physics-inspired depth estimation approach with new modules for focus discrimination and cost aggregation, enhancing interpretability and performance.
Findings
Outperforms state-of-the-art methods by up to 10% in accuracy.
Effectively utilizes physical motion principles for depth estimation.
Demonstrates robustness in real-world and synthetic datasets.
Abstract
Event cameras are neuromorphically inspired sensors that sparsely and asynchronously report brightness changes. Their unique characteristics of high temporal resolution, high dynamic range, and low power consumption make them well-suited for addressing challenges in monocular depth estimation (e.g., high-speed or low-lighting conditions). However, existing methods primarily treat event streams as black-box learning systems without incorporating prior physical principles, thus becoming over-parameterized and failing to fully exploit the rich temporal information inherent in event camera data. To address this limitation, we incorporate physical motion principles to propose an interpretable monocular depth estimation framework, where the likelihood of various depth hypotheses is explicitly determined by the effect of motion compensation. To achieve this, we propose a Focus Cost…
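The idea of scoring depth hypotheses by how well they compensate for motion can be illustrated with a small sketch. This is not the paper's module (the abstract is cut off before describing it); it is a classic focus-maximization toy under strong assumptions: a fronto-parallel scene, a camera translating purely along +x at a known speed, and image variance as the focus measure. All names here (`warp_events`, `focus_score`, `best_depth`, `FOCAL_PX`, `VEL_X`) and all numeric values are hypothetical.

```python
import numpy as np

def warp_events(xs, ts, depth, focal_px, vel_x, t_ref=0.0):
    """Move event x-coordinates back to t_ref under one depth hypothesis.

    Assumes (for illustration only) a fronto-parallel scene at `depth` metres
    and a camera translating along +x at `vel_x` m/s with known focal length.
    """
    flow_x = -focal_px * vel_x / depth  # image-plane velocity in px/s
    return xs - flow_x * (ts - t_ref)

def focus_score(xs, ys, width, height):
    """Variance of the image of warped events; sharper images score higher."""
    cols = np.rint(xs).astype(int)
    rows = np.rint(ys).astype(int)
    keep = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
    iwe = np.zeros((height, width))
    np.add.at(iwe, (rows[keep], cols[keep]), 1.0)  # accumulate event counts
    return iwe.var()

def best_depth(xs, ys, ts, candidates, focal_px, vel_x, width, height):
    """Score every depth candidate and return the one with the highest focus."""
    scores = [
        focus_score(warp_events(xs, ts, d, focal_px, vel_x), ys, width, height)
        for d in candidates
    ]
    return candidates[int(np.argmax(scores))], scores

# Synthetic events: 40 scene points at a single true depth, 50 events each.
rng = np.random.default_rng(0)
WIDTH, HEIGHT = 128, 96
FOCAL_PX, VEL_X, TRUE_DEPTH = 200.0, 0.5, 2.0
n_points, events_per_point = 40, 50

x0 = rng.uniform(60, 120, n_points)  # point positions at t = 0
y0 = rng.uniform(5, 90, n_points)
ts = np.tile(np.linspace(0.0, 0.2, events_per_point), n_points)
xs = np.repeat(x0, events_per_point) + (-FOCAL_PX * VEL_X / TRUE_DEPTH) * ts
ys = np.repeat(y0, events_per_point)

candidates = np.array([1.0, 1.5, 2.0, 3.0, 4.0])
estimate, scores = best_depth(xs, ys, ts, candidates,
                              FOCAL_PX, VEL_X, WIDTH, HEIGHT)
print("estimated depth:", estimate)
```

Only the correct depth hypothesis maps each point's events back onto a single pixel, so its accumulated image is the sharpest. A per-pixel version of this sweep would produce a cost volume over depth candidates rather than a single global estimate.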
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
Methods: Focus
