Deep Laparoscopic Stereo Matching with Transformers
Xuelian Cheng, Yiran Zhong, Mehrtash Harandi, Tom Drummond, Zhiyong Wang, and Zongyuan Ge

TL;DR
This paper explores the application of transformers in stereo matching for laparoscopic videos, proposing a hybrid CNN-transformer framework that improves accuracy and generalization in 3D reconstruction tasks.
Contribution
It introduces HybridStereoNet, a novel hybrid deep stereo matching framework combining CNNs and transformers, optimized for laparoscopic video analysis.
Findings
Transformers enhance feature representation learning in stereo matching.
The hybrid framework converges faster and achieves higher accuracy.
Extensive experiments show superior performance on multiple datasets.
Abstract
The self-attention mechanism, successfully employed with the transformer structure, has shown promise in many computer vision tasks, including image recognition and object detection. Despite the surge, the use of the transformer for the problem of stereo matching remains relatively unexplored. In this paper, we comprehensively investigate the use of the transformer for the problem of stereo matching, especially for laparoscopic videos, and propose a new hybrid deep stereo matching framework (HybridStereoNet) that combines the best of the CNN and the transformer in a unified design. To be specific, we investigate several ways to introduce transformers to volumetric stereo matching pipelines by analyzing the loss landscape of the designs and in-domain/cross-domain accuracy. Our analysis suggests that employing transformers for feature representation learning, while using CNNs for cost…
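As a rough illustration of what a volumetric stereo matching pipeline computes once per-pixel features are available, the sketch below builds a correlation cost volume from left and right feature maps and regresses disparity with a soft-argmax. It is a generic NumPy example under stated assumptions, not the HybridStereoNet implementation: the feature extractor and any learned aggregation stage are omitted, and the function names `build_cost_volume` and `soft_argmax_disparity` are hypothetical.

```python
import numpy as np


def build_cost_volume(feat_left, feat_right, max_disp):
    """Correlation volume of shape (max_disp, H, W) from (C, H, W) features.

    Assumes rectified images: left pixel x matches right pixel x - d.
    Positions with no valid match (x < d) are filled with -inf.
    """
    _, h, w = feat_left.shape
    volume = np.full((max_disp, h, w), -np.inf, dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (feat_left * feat_right).mean(axis=0)
        else:
            # Compare left columns d..W-1 with right columns 0..W-1-d.
            volume[d, :, d:] = (
                feat_left[:, :, d:] * feat_right[:, :, :-d]
            ).mean(axis=0)
    return volume


def soft_argmax_disparity(volume):
    """Expected disparity under a softmax over the disparity axis."""
    shifted = volume - volume.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(shifted)  # exp(-inf) = 0 for invalid positions
    weights /= weights.sum(axis=0, keepdims=True)
    disparities = np.arange(volume.shape[0]).reshape(-1, 1, 1)
    return (weights * disparities).sum(axis=0)
```

In a learned pipeline the raw volume would normally be refined by an aggregation network before the soft-argmax; here the soft-argmax is applied directly only to keep the example short.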
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Advanced Neural Network Applications
