CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer

Sicheng Wang; Hao Jiang; Lei Xiang

arXiv:2312.08594 · cs.CV · February 5, 2024 · 1 citation

TL;DR

CT-MVSNet is a multi-view stereo network built on a cross-scale transformer that captures intra-image context and inter-image feature relationships at multiple stages without additional computation, reporting state-of-the-art results on DTU and Tanks and Temples.

Contribution

The paper proposes a cross-scale transformer (CT), with an adaptive matching-aware transformer (AMT) and dual-feature guided aggregation (DFGA), for improved depth estimation in multi-view stereo.

Findings

01. Achieves state-of-the-art results on the DTU and Tanks and Temples datasets.

02. Reduces computational cost compared to existing transformer-based methods.

03. Enhances feature representation for more accurate depth estimation.

Abstract

Recent deep multi-view stereo (MVS) methods have widely incorporated transformers into cascade networks for high-resolution depth estimation, achieving impressive results. However, existing transformer-based methods are constrained by their computational costs, which prevents their extension to finer stages. In this paper, we propose a novel cross-scale transformer (CT) that processes feature representations at different stages without additional computation. Specifically, we introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales. This combined strategy enables our network to capture intra-image context information and enhance inter-image feature relationships. In addition, we present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction to…
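To make the distinction between intra-image and inter-image attention concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the `STAGE_PLAN` schedule, and the choice of which attention types run at which stage are all assumptions made for illustration, since the abstract excerpt does not specify them. Intra-image attention is modeled as self-attention within one view's features; inter-image attention as cross-attention from source-view features to reference-view features.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    # Standard scaled dot-product attention over lists of feature vectors.
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out

def intra_attention(feats):
    # Intra-image: each feature attends to features of the same view.
    return attention(feats, feats, feats)

def inter_attention(src, ref):
    # Inter-image: source-view features attend to reference-view features.
    return attention(src, ref, ref)

# Hypothetical schedule (an assumption, not taken from the paper):
# which attention types are applied at each cascade stage.
STAGE_PLAN = {"coarse": ["intra", "inter"], "fine": ["inter"]}

def apply_stage(stage, ref, src):
    # Apply the attention combination assigned to one stage.
    for kind in STAGE_PLAN[stage]:
        if kind == "intra":
            ref = intra_attention(ref)
            src = intra_attention(src)
        else:
            src = inter_attention(src, ref)
    return ref, src

if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0]]
    src = [[0.5, 0.5], [1.0, 0.0]]
    print(apply_stage("coarse", ref, src))
```

Varying the entries of `STAGE_PLAN` is one simple way to express "different attention combinations at different scales"; the actual combinations used by AMT should be taken from the paper or the official repository.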

Code & Models

Repositories

wscstrive/ct-mvsnet (PyTorch, official)

Taxonomy

Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications