CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer
Sicheng Wang, Hao Jiang, Lei Xiang

TL;DR
CT-MVSNet introduces an efficient cross-scale transformer for multi-view stereo that captures intra-image context and inter-image feature relationships at multiple stages without additional computation, achieving state-of-the-art results on the DTU and Tanks and Temples benchmarks.
Contribution
The paper proposes a novel cross-scale transformer (CT), with an adaptive matching-aware transformer (AMT) and dual-feature guided aggregation (DFGA), for improved depth estimation in multi-view stereo.
Findings
Achieves state-of-the-art results on DTU and Tanks and Temples datasets.
Reduces computational costs compared to existing transformer-based methods.
Enhances feature representation for more accurate depth estimation.
Abstract
Recent deep multi-view stereo (MVS) methods have widely incorporated transformers into cascade networks for high-resolution depth estimation, achieving impressive results. However, existing transformer-based methods are constrained by their computational costs, preventing their extension to finer stages. In this paper, we propose a novel cross-scale transformer (CT) that processes feature representations at different stages without additional computation. Specifically, we introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales. This combined strategy enables our network to capture intra-image context information and enhance inter-image feature relationships. Besides, we present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction to…
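Illustrative sketch
The abstract describes combining intra-image and inter-image attention differently at each scale. The NumPy sketch below is only a rough illustration of that idea, not the authors' implementation: it uses plain scaled dot-product attention on flattened feature maps, and the function names, the residual connections, and the per-stage schedules in STAGE_SCHEDULES are all hypothetical choices made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention over flattened feature maps of shape (N, C)."""
    scale = 1.0 / np.sqrt(query.shape[-1])
    weights = softmax((query @ key.T) * scale, axis=-1)
    return weights @ value


def intra_attention(feat):
    """Self-attention within one image (context aggregation), with a residual."""
    return feat + attention(feat, feat, feat)


def inter_attention(src_feat, ref_feat):
    """Cross-attention: source features query the reference features."""
    return src_feat + attention(src_feat, ref_feat, ref_feat)


def apply_schedule(ref_feat, src_feat, schedule):
    """Apply a sequence of 'intra' / 'inter' attention blocks in order."""
    for kind in schedule:
        if kind == "intra":
            ref_feat = intra_attention(ref_feat)
            src_feat = intra_attention(src_feat)
        elif kind == "inter":
            src_feat = inter_attention(src_feat, ref_feat)
        else:
            raise ValueError(f"unknown attention kind: {kind!r}")
    return ref_feat, src_feat


# Hypothetical schedules: the combinations actually used per stage are not
# given in the text above, so these are placeholders for illustration only.
STAGE_SCHEDULES = {
    "coarse": ["intra", "inter", "intra", "inter"],
    "medium": ["intra", "inter"],
    "fine": ["inter"],
}
```

A call such as apply_schedule(ref, src, STAGE_SCHEDULES["fine"]) takes two (N, C) arrays and returns arrays of the same shapes. Using shorter schedules at finer stages is one assumed way to keep attention affordable as resolution grows.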
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
