TL;DR
PPMStereo introduces a pick-and-play memory (PPM) module for dynamic stereo matching. The module models long-term spatio-temporal consistency, which improves both the accuracy and the efficiency of depth estimation from stereo video.
Contribution
The paper proposes a two-stage pick-and-play memory construction module that balances long-range temporal modeling against computational efficiency in stereo matching.
Findings
Achieves state-of-the-art accuracy and temporal consistency.
Reduces computational costs compared to previous methods.
Improves depth estimation in stereo videos significantly.
Abstract
Temporally consistent depth estimation from stereo video is critical for real-world applications such as augmented reality, where inconsistent depth estimation disrupts the immersion of users. Despite its importance, this task remains challenging due to the difficulty in modeling long-term temporal consistency in a computationally efficient manner. Previous methods attempt to address this by aggregating spatio-temporal information but face a fundamental trade-off: limited temporal modeling provides only modest gains, whereas capturing long-range dependencies significantly increases computational cost. To address this limitation, we introduce a memory buffer for modeling long-range spatio-temporal consistency while achieving efficient dynamic stereo matching. Inspired by the two-stage decision-making process in humans, we propose a Pick-and-Play Memory (PPM)…
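The abstract is truncated before it describes how PPM works, so the following is only a generic illustration of a two-stage "select, then aggregate" memory buffer, not PPMStereo's actual algorithm. It assumes that "pick" means selecting the memory entries most relevant to the current frame and "play" means aggregating them into a single feature. The FIFO eviction, cosine-similarity ranking, softmax weighting, and every name here (`PickAndPlayMemory`, `capacity`, `top_k`, `pick`, `play`, `read`) are hypothetical choices made for the sketch.

```python
import math
from collections import deque
from typing import Deque, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class PickAndPlayMemory:
    """Hypothetical fixed-capacity memory; an illustration, not the PPMStereo implementation."""

    def __init__(self, capacity: int, top_k: int) -> None:
        # Assumption: a FIFO buffer; once full, the oldest frame feature is evicted.
        self.buffer: Deque[Vector] = deque(maxlen=capacity)
        self.top_k = top_k

    def write(self, feature: Sequence[float]) -> None:
        """Store one per-frame feature vector in the memory buffer."""
        self.buffer.append(list(feature))

    def pick(self, query: Sequence[float]) -> List[Vector]:
        """Stage 1 (assumed): keep only the top_k entries most similar to the query."""
        ranked = sorted(self.buffer, key=lambda m: cosine(query, m), reverse=True)
        return ranked[: self.top_k]

    def play(self, query: Sequence[float], picked: List[Vector]) -> Vector:
        """Stage 2 (assumed): softmax-weighted average of the picked entries."""
        if not picked:
            return list(query)
        scores = [cosine(query, m) for m in picked]
        peak = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        return [
            sum(w * m[d] for w, m in zip(weights, picked)) / total
            for d in range(len(query))
        ]

    def read(self, query: Sequence[float]) -> Vector:
        """Run both stages for the current frame's query feature."""
        return self.play(query, self.pick(query))


if __name__ == "__main__":
    memory = PickAndPlayMemory(capacity=4, top_k=2)
    for frame_feature in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]):
        memory.write(frame_feature)
    print(len(memory.buffer))  # 4: the first frame was evicted
    print(memory.pick([1.0, 0.0]))  # [[1.0, 0.05], [0.9, 0.1]]
    print(memory.read([1.0, 0.0]))
```

Because the buffer has a fixed `capacity`, the per-frame cost of a read is bounded by the buffer size rather than by the video length. In this toy setting, that is how a memory can cover a long temporal range without the cost growth the abstract attributes to long-range aggregation.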