Look Wider to Match Image Patches with Convolutional Neural Networks
Haesol Park, Kyoung Mu Lee

TL;DR
This paper introduces a novel convolutional neural network approach with a per-pixel pyramid-pooling layer to improve stereo matching by effectively utilizing large contextual areas without losing resolution, leading to robust performance.
Contribution
It proposes a new large-area matching cost function using a per-pixel pyramid-pooling layer, enhancing stereo matching accuracy and robustness over traditional methods.
Findings
Achieves near-peak performance on the Middlebury benchmark.
Robust against weak textures, depth discontinuities, and illumination differences.
Effectively utilizes large contextual information without resolution loss.
Abstract
When a human matches two images, the viewer has a natural tendency to view the wide area around the target pixel to obtain clues of right correspondence. However, designing a matching cost function that works on a large window in the same way is difficult. The cost function is typically not intelligent enough to discard the information irrelevant to the target pixel, resulting in undesirable artifacts. In this paper, we propose a novel convolutional neural network (CNN) module to learn a stereo matching cost with a large-sized window. Unlike conventional pooling layers with strides, the proposed per-pixel pyramid-pooling layer can cover a large area without a loss of resolution and detail. Therefore, the learned matching cost function can successfully utilize the information from a large area without introducing the fattening effect. The proposed method is robust despite the presence of weak textures, depth discontinuity,…
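To make the contrast with strided pooling concrete, here is a minimal NumPy sketch of the idea described in the abstract: max pooling applied at every pixel (stride 1) over several window sizes, with the results concatenated along the channel axis so the output keeps the input's height and width. The function names, the edge padding, and the window sizes (1, 3, 5) are illustrative assumptions and are not taken from the paper; the authors' layer and its configuration may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def per_pixel_max_pool(feat, size):
    """Stride-1 max pooling over a size x size window (hypothetical helper).

    feat has shape (H, W, C); size is assumed to be odd so that the
    window is centered on each pixel. The output keeps shape (H, W, C).
    """
    pad = size // 2
    # Edge padding is an assumption made here to preserve the spatial size.
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    # windows has shape (H, W, C, size, size): one window per pixel.
    windows = sliding_window_view(padded, (size, size), axis=(0, 1))
    return windows.max(axis=(-2, -1))


def per_pixel_pyramid_pool(feat, sizes=(1, 3, 5)):
    """Concatenate stride-1 poolings at several scales along the channels.

    The default sizes are illustrative only. Output shape is
    (H, W, C * len(sizes)), so no spatial resolution is lost.
    """
    pooled = [per_pixel_max_pool(feat, s) for s in sizes]
    return np.concatenate(pooled, axis=-1)


# Usage: a 4 x 4 single-channel feature map pooled at two scales.
feat = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
out = per_pixel_pyramid_pool(feat, sizes=(1, 3))
print(out.shape)
```

Because every window is evaluated at every pixel, the larger windows widen the receptive field while the size-1 branch passes the original features through unchanged, which is one way to read the abstract's claim of covering a large area without a loss of resolution and detail.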
