TL;DR
This paper introduces a self-supervised approach to training deep stereo networks that distills knowledge from a monocular completion network, improving accuracy and robustness over existing self-supervised methods.
Contribution
It reverses the usual supervision direction, in which stereo supervises monocular depth estimation: monocular cues plus sparse points from traditional stereo algorithms are used to train deep stereo networks, improving performance and generalization.
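One way to picture the distillation step is a stereo network regressing toward proxy disparity labels produced by the monocular completion network, penalized only at pixels considered reliable. The sketch below is an illustration under assumptions, not the authors' training code: the L1 penalty, the boolean validity mask, and the name `masked_l1_loss` are all hypothetical and are not specified in this summary. Plain nested lists stand in for image tensors to keep it dependency-free.

```python
def masked_l1_loss(predicted, proxy, mask):
    """Mean absolute error between predicted and proxy disparities.

    All three arguments are H x W nested lists. Only pixels where mask is
    True contribute. The L1 form and the mask are assumptions made for
    this sketch, not details taken from the paper.
    """
    total, count = 0.0, 0
    for pred_row, proxy_row, mask_row in zip(predicted, proxy, mask):
        for pred, target, keep in zip(pred_row, proxy_row, mask_row):
            if keep:
                total += abs(pred - target)
                count += 1
    # With no reliable pixels there is nothing to supervise.
    return total / count if count else 0.0


# Usage: the second pixel is masked out, so it does not affect the loss.
loss = masked_l1_loss([[9.0, 0.0]], [[10.0, 5.0]], [[True, False]])
```

Masking matters here because a proxy label is only as good as the network that produced it; dropping unreliable pixels keeps their errors out of the stereo network's training signal.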
Findings
Outperforms existing self-supervised stereo methods on popular datasets.
Demonstrates strong generalization to domain shifts.
Uses a consensus mechanism over multiple estimations to produce dense disparity maps.
Abstract
In many fields, self-supervised learning solutions are rapidly evolving and closing the gap with supervised approaches. This is the case for depth estimation from either monocular or stereo images, with the latter often providing a valid source of self-supervision for the former. In contrast, to soften typical stereo artefacts, we propose a novel self-supervised paradigm that reverses the link between the two. Specifically, to train deep stereo networks, we distill knowledge through a monocular completion network. This architecture exploits single-image cues and a few sparse points, sourced from traditional stereo algorithms, to estimate dense yet accurate disparity maps by means of a consensus mechanism over multiple estimations. We thoroughly evaluate the impact of different supervisory signals on popular stereo datasets, showing how stereo networks trained with our paradigm outperform…
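The abstract mentions a consensus mechanism over multiple estimations but does not spell out the rule. A minimal sketch of one plausible reading follows: several disparity maps for the same image are fused by per-pixel mean, and a pixel is kept only when the estimates agree. The variance test, the `max_variance` threshold, and the function name `consensus_disparity` are assumptions made for illustration, not the authors' definition.

```python
def consensus_disparity(estimates, max_variance=1.0):
    """Fuse several H x W disparity maps (nested lists) into one.

    Returns (fused, mask). fused[y][x] is the per-pixel mean across maps;
    mask[y][x] is True where the population variance across maps is at
    most max_variance. The variance-based agreement rule is an assumption
    of this sketch.
    """
    if not estimates:
        raise ValueError("need at least one disparity map")
    n = len(estimates)
    height, width = len(estimates[0]), len(estimates[0][0])
    fused = [[0.0] * width for _ in range(height)]
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [est[y][x] for est in estimates]
            mean = sum(values) / n
            variance = sum((v - mean) ** 2 for v in values) / n
            fused[y][x] = mean
            mask[y][x] = variance <= max_variance
    return fused, mask


# Usage: three 1 x 2 maps agree on the first pixel and disagree on the second.
maps = [[[10.0, 5.0]], [[10.0, 9.0]], [[10.0, 1.0]]]
fused, mask = consensus_disparity(maps)
```

In this example the first pixel survives because all three maps report the same disparity, while the second is rejected because the estimates spread widely around their mean.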
