# WeatherMono: A CNN-Transformer Architecture for Self-Supervised Monocular Depth Estimation in Rainy and Foggy Conditions

**Authors:** Yongsheng Qiu

PMC · DOI: 10.3390/s26051705 · Sensors (Basel, Switzerland) · 2026-03-08

## TL;DR

WeatherMono is a CNN-Transformer model for self-supervised monocular depth estimation that improves accuracy in rain and fog, where low-contrast, blurry images cause models trained on clear weather to generalize poorly.

## Contribution

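A CNN-Transformer architecture with two new modules: a Multi-Scale Deformable Convolution (MDC) module that extracts detailed local features, and a Global-Local Feature Interaction (GLFI) module that adds an efficient multi-head attention mechanism to the Transformer encoder to capture both local and global context.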

## Key findings

- On WeatherKITTI, WeatherMono achieves an AbsRel of 0.097, ahead of WeatherDepth (0.104) and RoboDepth (0.107).
- On DrivingStereo, it achieves an AbsRel of 0.149 in rain and 0.101 in fog.
- The authors report improved accuracy and robustness on the low-contrast, blurry images typical of rain and fog, supported by qualitative and quantitative experiments.
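For readers unfamiliar with the metric, AbsRel (absolute relative error) is conventionally the mean of |predicted depth − ground-truth depth| / ground-truth depth over pixels with valid ground truth, so lower is better. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name `abs_rel` and the `min_depth`/`max_depth` validity bounds are assumptions chosen for illustration (an 80 m cap is a common convention on KITTI-style benchmarks).

```python
def abs_rel(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Mean absolute relative error over pixels with valid ground truth.

    pred, gt: flat sequences of depths in metres, aligned pixel by pixel.
    min_depth, max_depth: assumed validity bounds, not taken from the paper.
    """
    errors = [
        abs(p - g) / g
        for p, g in zip(pred, gt)
        if min_depth < g < max_depth  # skip missing or out-of-range ground truth
    ]
    if not errors:
        raise ValueError("no valid ground-truth depths")
    return sum(errors) / len(errors)


# Toy example: one exact prediction and one that is 5 m short at 25 m.
score = abs_rel([10.0, 20.0], [10.0, 25.0])
print(f"AbsRel = {score:.3f}")
```

Because each error is divided by the true depth, a 1 m mistake on a nearby object costs more than the same mistake on a distant one.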

## Abstract

In rainy and foggy conditions, the scattering of light and the occlusion effects of atmospheric particles distort the reflected light from object surfaces, leading to inconsistent depth information. As a result, depth estimation models trained under clear weather conditions fail to generalize effectively to adverse weather conditions. To address this challenge, we propose a novel CNN-Transformer architecture, WeatherMono, for self-supervised monocular depth estimation under rainy and foggy weather. Rainy and foggy images often contain large regions of low contrast and blurry features. By combining Convolutional Neural Networks (CNNs) with Transformers, WeatherMono effectively captures both local and global contextual information, thus improving depth estimation accuracy. Specifically, we introduce a Multi-Scale Deformable Convolution (MDC) module and a Global-Local Feature Interaction (GLFI) module. The MDC module extracts detailed local features in rainy and foggy environments, while the GLFI module incorporates an efficient multi-head attention mechanism into the Transformer encoder, enabling more effective capture of both local and global information. This enhances the model’s ability to comprehend image features, strengthens its capability to handle low-contrast and blurry images, and ultimately improves the accuracy of depth estimation in adverse weather conditions. Experiments on WeatherKITTI show WeatherMono achieves AbsRel of 0.097, outperforming WeatherDepth (0.104) and RoboDepth (0.107). On DrivingStereo, it achieves AbsRel of 0.149 (rain) and 0.101 (fog). Extensive qualitative and quantitative experiments demonstrate that WeatherMono significantly outperforms existing methods in terms of both accuracy and robustness under rainy and foggy conditions.
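To put the WeatherKITTI figures in the abstract in proportion, the short calculation below converts the reported AbsRel values into relative error reductions. It is an illustrative sketch using only the three numbers quoted above; the helper name `relative_reduction` is an assumption, and the paper itself may report improvements differently.

```python
def relative_reduction(baseline, ours):
    """Fraction by which `ours` lowers the error relative to `baseline`."""
    return (baseline - ours) / baseline


# AbsRel values on WeatherKITTI as quoted in the abstract.
weathermono = 0.097
baselines = {"WeatherDepth": 0.104, "RoboDepth": 0.107}

reductions = {
    name: relative_reduction(value, weathermono)
    for name, value in baselines.items()
}

for name, reduction in reductions.items():
    print(f"vs {name}: {reduction:.1%} lower AbsRel")
```

Under this reading, WeatherMono's AbsRel is roughly 6.7% lower than WeatherDepth's and roughly 9.3% lower than RoboDepth's.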

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12986829/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12986829/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12986829/full.md

---
Source: https://tomesphere.com/paper/PMC12986829