Learning Multi-modal Information for Robust Light Field Depth Estimation
Yongri Piao, Xinxin Ji, Miao Zhang, Yukun Zhang

TL;DR
This paper introduces a multi-modal learning approach that combines focal stack and RGB images with context reasoning and attention-guided fusion to improve the robustness and accuracy of light field depth estimation.
Contribution
It proposes a novel multi-modal framework with context reasoning and attention-guided fusion, addressing defocus blur issues in focal stack-based depth estimation.
Findings
Outperforms existing methods on two light field datasets.
Achieves superior depth estimation accuracy.
Demonstrates practical applicability on mobile phone data.
Abstract
Light field data has been demonstrated to facilitate the depth estimation task. Most learning-based methods estimate depth information from EPIs or sub-aperture images, while fewer methods pay attention to the focal stack. Existing learning-based methods that estimate depth from the focal stack yield suboptimal performance because of defocus blur. In this paper, we propose a multi-modal learning method for robust light field depth estimation. We first excavate the internal spatial correlation by designing a context reasoning unit, which separately extracts comprehensive contextual information from the focal stack and RGB images. Then we integrate the contextual information by exploiting an attention-guided cross-modal fusion module. Extensive experiments demonstrate that our method achieves better performance than existing representative methods on two light field datasets.…
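To make the idea of attention-guided cross-modal fusion more concrete, the sketch below shows one generic way to fuse a focal-stack feature map and an RGB feature map using per-channel attention weights. It is not the authors' module: the function name `attention_guided_fusion`, the linear scoring matrices `w_focal` and `w_rgb`, and the use of global average pooling are all assumptions made for illustration, since the abstract does not specify the design.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_guided_fusion(focal_feat, rgb_feat, w_focal, w_rgb):
    """Fuse two (C, H, W) feature maps with per-channel modality attention.

    Hypothetical illustration only: w_focal and w_rgb are (C, C) scoring
    matrices that would be learned in a real network.
    """
    # Global average pooling gives one descriptor value per channel.
    d_focal = focal_feat.mean(axis=(1, 2))  # shape (C,)
    d_rgb = rgb_feat.mean(axis=(1, 2))      # shape (C,)

    # Score each channel of each modality with a linear map (assumed).
    s_focal = w_focal @ d_focal             # shape (C,)
    s_rgb = w_rgb @ d_rgb                   # shape (C,)

    # Softmax across the two modalities, so the weights sum to 1 per channel.
    weights = softmax(np.stack([s_focal, s_rgb]), axis=0)  # shape (2, C)

    # Weighted sum of the two modalities, broadcast over H and W.
    fused = (weights[0][:, None, None] * focal_feat
             + weights[1][:, None, None] * rgb_feat)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    focal = rng.standard_normal((8, 16, 16))
    rgb = rng.standard_normal((8, 16, 16))
    w = rng.standard_normal((8, 8))
    fused, weights = attention_guided_fusion(focal, rgb, w, w)
    print(fused.shape, weights.shape)
```

In this sketch, a channel whose focal-stack response scores higher than its RGB response leans more on the focal-stack features, and vice versa. This is one simple way for a fusion step to down-weight a modality where it is less reliable, for example in defocus-blurred regions.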
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Image Enhancement Techniques
