# A Novel Method for Monocular Depth Estimation Using an Hourglass Neck Module

**Authors:** Seung-Jin Oh, Seung-Ho Lee

PMC · DOI: 10.3390/s24041312 · Sensors (Basel, Switzerland) · 2024-02-18

## TL;DR

This paper introduces a method for estimating depth from a single image, built around a new network component called the hourglass neck module, which sits between a Swin Transformer V2 encoder and the decoder.

## Contribution

The approach combines a masked image modeling (MIM) pretrained Swin Transformer V2 encoder, an hourglass neck module with deformable attention at its waist, and a decoder built from a deconvolution layer and an upsampling layer.

## Key findings

- The proposed method achieved an RMSE of 0.274 on the NYU Depth V2 dataset.
- This RMSE was lower than the values reported for the other published methods compared on the same dataset.
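
The RMSE figure above is a root mean squared error between predicted and ground-truth depth. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `rmse` and the choice to skip pixels whose ground-truth depth is not positive are assumptions made for illustration; the summary does not describe the paper's masking or cropping protocol.

```python
import math


def rmse(pred, target):
    """Root mean squared error over pixels with a valid ground-truth depth.

    Assumption (not from the paper): a ground-truth value <= 0 marks a
    missing measurement and is excluded from the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Squared error for every pixel that has a valid ground-truth depth.
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target) if t > 0]
    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Flattened toy depth maps in metres (made-up values, not dataset samples).
    predicted = [1.0, 2.0, 3.0]
    ground_truth = [1.0, 2.0, 5.0]
    print(rmse(predicted, ground_truth))
```

Because the errors are squared before averaging, a few large depth errors raise the score more than many small ones, which is why a lower value indicates a better estimate.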

## Abstract

In this paper, we propose a novel method for monocular depth estimation using an hourglass neck module. The method makes the following contributions. First, feature maps are extracted from Swin Transformer V2 using a masked image modeling (MIM) pretrained model. Because Swin Transformer V2 uses a different patch size at each attention stage, this vision transformer (ViT)-based encoder makes it easier to extract both local and global features from the input image. Second, to maintain the polymorphism and local inductive bias of the feature maps extracted from Swin Transformer V2, the feature maps are passed into the hourglass neck module. Third, deformable attention can be used at the waist of the hourglass neck module to reduce the computation cost and emphasize the locality of the feature map. Finally, the feature map passes through the neck and into a decoder, composed of a deconvolution layer and an upsampling layer, which generates the depth image. To evaluate the proposed method objectively, we compared it on the NYU Depth V2 dataset against methods published in other papers. In the experiment, the proposed method achieved an RMSE of 0.274, which is lower than the values reported in those papers. Since a lower RMSE indicates better depth estimation, this result suggests that the proposed method is more accurate than the compared techniques.
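
The abstract's remark that each attention stage works at a different patch size can be illustrated by tracing feature-map sizes through a hierarchical Swin-style encoder. The sketch below is a plain-Python illustration, not the authors' implementation. It relies on the standard Swin layout (an initial 4x4 patch embedding, then a patch-merging step between stages that halves the resolution and doubles the channels). The function name `swin_stage_shapes`, the embedding width of 96, and the 480x640 input size are assumptions chosen for illustration; the summary does not state which model variant or input resolution the paper uses.

```python
def swin_stage_shapes(height, width, embed_dim=96, patch_size=4, num_stages=4):
    """Return (stride, feature_height, feature_width, channels) per stage.

    Assumes the standard hierarchical Swin layout: the first stage sees
    patch_size x patch_size patches, and each later stage merges 2x2
    neighbouring patches, doubling the effective patch size and channels.
    embed_dim=96 is an illustrative default, not a value from the paper.
    """
    shapes = []
    stride = patch_size
    channels = embed_dim
    for _ in range(num_stages):
        shapes.append((stride, height // stride, width // stride, channels))
        stride *= 2    # patch merging halves the spatial resolution
        channels *= 2  # and doubles the channel width
    return shapes


if __name__ == "__main__":
    # 480x640 is used here only as an example input size.
    for stride, h, w, c in swin_stage_shapes(480, 640):
        print(f"effective patch {stride}x{stride}: {h}x{w} map, {c} channels")
```

Under these assumptions the early stages keep fine spatial detail (local features) while the late stages summarise large image regions (global features). A neck module placed after the encoder receives this set of multi-scale feature maps.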

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10892898/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10892898/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC10892898/full.md

---
Source: https://tomesphere.com/paper/PMC10892898