UHD Image Dehazing via anDehazeFormer with Atmospheric-aware KV Cache
Pu Wang, Pengwen Dai, Chen Wu, Yeying Jin, Dianjie Lu, Guijuan Zhang, Youshan Zhang, Zhuoran Zheng

TL;DR
This paper introduces an efficient transformer-based framework for UHD image dehazing that significantly accelerates training and reduces memory use while maintaining high dehazing quality, enabling real-time processing of ultra-high-resolution images.
Contribution
The paper proposes an adaptive normalization mechanism and an atmospheric scattering-aware KV caching mechanism, which together speed up training and reduce memory use for UHD image dehazing while delivering state-of-the-art results.
Findings
5× faster training convergence
Real-time processing of 50 UHD images per second on an RTX4090 GPU
Maintains state-of-the-art dehazing quality
Abstract
In this paper, we propose an efficient visual transformer framework for ultra-high-definition (UHD) image dehazing that addresses two key challenges of existing methods: slow training speed and high memory consumption. Our approach introduces two key innovations: 1) an adaptive normalization mechanism (the "an" in anDehazeFormer), inspired by the nGPT architecture, that enables ultra-fast and stable training of a network whose parameter expressions are restricted in range; and 2) an atmospheric scattering-aware KV caching mechanism that dynamically optimizes feature preservation based on the physical haze formation model. The proposed architecture improves training convergence speed by 5× while reducing memory overhead, enabling real-time processing of 50 high-resolution images per second on an RTX4090 GPU. Experimental results show that our approach maintains…
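The abstract refers to "the physical haze formation model" without stating it. As background, the sketch below uses the standard atmospheric scattering model, I = J·t + A·(1 − t) with transmission t = exp(−β·d), applied to a single RGB pixel. It is not the paper's KV caching mechanism, which the abstract does not detail. The function names, the airlight value 0.9, the scattering coefficient β = 1.0, and the clamp t_min are illustrative assumptions.

```python
import math

def transmission(depth, beta=1.0):
    """Transmission t = exp(-beta * d); beta is an assumed scattering coefficient."""
    return math.exp(-beta * depth)

def add_haze(clear, depth, airlight=0.9, beta=1.0):
    """Synthesize a hazy pixel: I = J * t + A * (1 - t), applied per channel."""
    t = transmission(depth, beta)
    return [c * t + airlight * (1.0 - t) for c in clear]

def remove_haze(hazy, depth, airlight=0.9, beta=1.0, t_min=0.1):
    """Invert the model, assuming depth and airlight are known.

    The transmission is clamped at t_min (an assumed value) so that the
    division stays stable for distant, heavily hazed pixels.
    """
    t = max(transmission(depth, beta), t_min)
    return [(h - airlight) / t + airlight for h in hazy]

# Hypothetical RGB pixel with intensities in [0, 1] at scene depth 1.5.
clear_pixel = [0.2, 0.5, 0.8]
hazy_pixel = add_haze(clear_pixel, depth=1.5)
restored_pixel = remove_haze(hazy_pixel, depth=1.5)
```

In this toy setting the inversion recovers the clear pixel because depth and airlight are given. In real dehazing both are unknown and must be estimated or learned, which is where a physics-aware network design becomes relevant.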
Taxonomy
Topics: Image Enhancement Techniques · Advanced Image Processing Techniques · Digital Media Forensic Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
