Joint Depth Prediction and Semantic Segmentation with Multi-View SAM
Mykhailo Shvets, Dongxu Zhao, Marc Niethammer, Roni Sengupta, Alexander C. Berg

TL;DR
This paper introduces a multi-view stereo method that uses the semantic features of the Segment Anything Model (SAM) to improve depth prediction and semantic segmentation, outperforming single-task MVS and segmentation models as well as multi-task monocular approaches.
Contribution
It presents a novel multi-view stereo technique that integrates SAM's semantic features to enhance depth and segmentation predictions in a joint framework.
Findings
Outperforms single-task MVS and segmentation models
Achieves better results than multi-task monocular methods
Demonstrates mutual benefits between depth and segmentation tasks
Abstract
Multi-task approaches to joint depth and segmentation prediction are well-studied for monocular images. Yet, predictions from a single view are inherently limited, while multiple views are available in many robotics applications. On the other end of the spectrum, video-based and full 3D methods require numerous frames to perform reconstruction and segmentation. With this work we propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM). This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder. We report the mutual benefit that both tasks enjoy in our quantitative and qualitative studies on the ScanNet dataset. Our approach consistently outperforms single-task MVS and segmentation models, along with multi-task monocular methods.
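The abstract does not describe the internals of the MVS depth branch. As a rough illustration of how multi-view stereo methods commonly turn per-view features into a depth estimate, the sketch below scores a reference-pixel feature against source-view features at several depth hypotheses and takes the probability-weighted depth. This is a generic textbook-style pattern under stated assumptions, not the authors' architecture: the function names, the dot-product score, the soft-argmax readout, and the toy feature vectors (which stand in for image features such as SAM embeddings, already warped to each depth hypothesis) are all hypothetical.

```python
import math

def matching_scores(ref_feat, warped_src_feats):
    """Dot-product similarity between one reference-pixel feature and the
    source-view feature sampled at each depth hypothesis (assumed pre-warped)."""
    return [sum(r * s for r, s in zip(ref_feat, src)) for src in warped_src_feats]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_depth(ref_feat, warped_src_feats, depth_hypotheses):
    """Soft-argmax: probability-weighted average of the depth hypotheses."""
    probs = softmax(matching_scores(ref_feat, warped_src_feats))
    return sum(p * d for p, d in zip(probs, depth_hypotheses))

# Toy data (hypothetical): one pixel, three depth hypotheses in metres.
ref = [1.0, 0.0, 2.0]
src_per_depth = [
    [0.0, 1.0, 0.0],  # poor match at depth 1.0
    [1.0, 0.0, 2.0],  # exact match at depth 2.0
    [0.5, 0.0, 0.5],  # partial match at depth 3.0
]
depths = [1.0, 2.0, 3.0]
depth = expected_depth(ref, src_per_depth, depths)
```

With these toy inputs the best-matching hypothesis dominates the softmax, so the estimate lands close to 2.0. In a real pipeline the same computation would run per pixel over a full cost volume, and richer features make the matching scores more discriminative, which is the kind of benefit the abstract attributes to SAM features.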
Videos
Joint Depth Prediction and Semantic Segmentation With Multi-View SAM (YouTube)
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
