# OD-MVSNet: Omni-dimensional dynamic multi-view stereo network

**Authors:** Ke Pan, Kefeng Li, Guangyuan Zhang, Zhenfang Zhu, Peng Wang, Zhenfei Wang, Chen Fu, Guangchen Li, Yuxuan Ding

PMC · DOI: 10.1371/journal.pone.0309029 · PLOS ONE · 2024-08-15

## TL;DR

This paper introduces OD-MVSNet, a cascade multi-view stereo network that builds a coarse-to-fine cost-volume pyramid to improve the accuracy and efficiency of depth estimation for 3D scene reconstruction.

## Contribution

Two new modules improve multi-view stereo performance: omni-dimensional dynamic atrous spatial pyramid pooling (OSPP) for multiscale feature extraction, and a normalization-based 3D attention module for cost volume regularization.

## Key findings

- On the DTU dataset, OD-MVSNet outperforms the baseline model by approximately 1.4% in accuracy loss.
- It also outperforms the baseline by approximately 0.9% in completeness loss and 1.2% in overall loss.
- Feature maps encoded by the OSPP module can generate dense point clouds without consuming significant memory.
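
As a rough illustration of how the three DTU figures relate, the sketch below assumes the convention common in learning-based MVS evaluations, where accuracy and completeness are mean point-cloud distances (lower is better) and the overall score is their average. The function names and the numbers in the usage lines are hypothetical and are not taken from the paper; the summary does not state whether the reported percentages are relative or absolute, so `relative_reduction` shows only one possible reading.

```python
def overall_score(accuracy, completeness):
    """Average of accuracy and completeness errors (assumed DTU-style convention)."""
    return (accuracy + completeness) / 2.0


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline` (one possible reading)."""
    return (baseline - improved) / baseline * 100.0


# Hypothetical error values, for illustration only.
baseline_overall = overall_score(0.40, 0.30)
model_overall = overall_score(0.39, 0.30)
print(baseline_overall, model_overall)
print(relative_reduction(baseline_overall, model_overall))
```

Because the overall score is an average, a gain in only one of the two components is halved in the overall figure.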

## Abstract

Multi-view stereo based on learning is a critical task in three-dimensional reconstruction, enabling the effective inference of depth maps and the reconstruction of fine-grained scene geometry. However, the results obtained by current popular 3D reconstruction methods are not precise, and achieving high-accuracy scene reconstruction remains challenging due to the pervasive impact of feature extraction and the poor correlation between cost and volume. In addressing these issues, we propose a cascade deep residual inference network to enhance the efficiency and accuracy of multi-view stereo depth estimation. This approach builds a cost-volume pyramid from coarse to fine, generating a lightweight, compact network to improve reconstruction results. Specifically, we introduce the omni-dimensional dynamic atrous spatial pyramid pooling (OSPP), a multiscale feature extraction module capable of generating dense feature maps with multiscale contextual information. The feature maps encoded by the OSPP module can generate dense point clouds without consuming significant memory. Furthermore, to alleviate the issue of feature mismatch in cost volume regularization, we propose a normalization-based 3D attention module. The 3D attention module aggregates crucial information within the cost volume across the dimensions of channel, spatial, and depth. Through extensive experiments on benchmark datasets, notably DTU, we found that the OD-MVSNet model outperforms the baseline model by approximately 1.4% in accuracy loss, 0.9% in completeness loss, and 1.2% in overall loss, demonstrating the effectiveness of our module.
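
The multiscale idea behind atrous spatial pyramid pooling can be sketched in a few lines. The example below is a minimal, one-dimensional, pure-Python illustration of plain atrous (dilated) convolution applied at several rates; it is not the authors' OSPP module and omits the omni-dimensional dynamic weighting entirely. The function names `atrous_conv1d`, `aspp_1d`, and `receptive_field`, the kernel, and the rates are all assumptions chosen for illustration.

```python
def atrous_conv1d(signal, kernel, rate):
    """Dilated 1D cross-correlation with zero padding; output has the input's length."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            # Kernel taps are spaced `rate` samples apart.
            j = i + (k - center) * rate
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aspp_1d(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several rates and stack the responses per position."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [list(values) for values in zip(*branches)]


impulse = [0, 0, 0, 1, 0, 0, 0]
features = aspp_1d(impulse, [1, 1, 1], rates=(1, 2))
print(features)
print(receptive_field(3, 4))
```

Each output position carries one response per rate, so larger rates add wider context without adding kernel weights; this is the property the abstract refers to as multiscale contextual information.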

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11326553/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11326553/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC11326553/full.md

---
Source: https://tomesphere.com/paper/PMC11326553