Adaptive Window Pruning for Efficient Local Motion Deblurring
Haoying Li, Jixin Zhao, Shangchen Zhou, Huajun Feng, Chongyi Li, Chen Change Loy

TL;DR
This paper introduces an adaptive window pruning Transformer for local motion deblurring that selectively processes blurred regions, significantly reducing computation and improving deblurring quality in high-resolution images.
Contribution
It proposes AdaWPT, a novel adaptive window pruning mechanism within a Transformer for efficient local motion deblurring, guided by a learned confidence predictor.
Findings
Reduces FLOPs by 66%
More than doubles inference speed
Achieves superior perceptual and quantitative deblurring results
Abstract
Local motion blur commonly occurs in real-world photography due to the mixing between moving objects and stationary backgrounds during exposure. Existing image deblurring methods predominantly focus on global deblurring, inadvertently affecting the sharpness of backgrounds in locally blurred images and wasting unnecessary computation on sharp pixels, especially for high-resolution images. This paper aims to adaptively and efficiently restore high-resolution locally blurred images. We propose a local motion deblurring vision Transformer (LMD-ViT) built on adaptive window pruning Transformer blocks (AdaWPT). To focus deblurring on local regions and reduce computation, AdaWPT prunes unnecessary windows, only allowing the active windows to be involved in the deblurring processes. The pruning operation relies on the blurriness confidence predicted by a confidence predictor that is trained…
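The window pruning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' AdaWPT implementation: the function name `prune_and_process`, the mean-confidence rule, the fixed `threshold`, and the placeholder `restore` callback are all assumptions made for illustration, and pruned windows are simply passed through unchanged here.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]


def prune_and_process(
    image: Image,
    confidence: Image,
    window: int,
    threshold: float,
    restore: Callable[[float], float],
) -> Tuple[Image, List[Tuple[int, int]]]:
    """Apply `restore` only inside windows whose mean confidence >= threshold.

    Hypothetical helper: returns the output image and the (row, col) indices
    of the windows that stayed active.
    """
    height, width = len(image), len(image[0])
    assert height % window == 0 and width % window == 0, "pad the image first"
    # Assumption: pruned windows are copied through untouched.
    output = [row[:] for row in image]
    active: List[Tuple[int, int]] = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            cells = [
                (r, c)
                for r in range(top, top + window)
                for c in range(left, left + window)
            ]
            mean_conf = sum(confidence[r][c] for r, c in cells) / len(cells)
            if mean_conf >= threshold:
                active.append((top // window, left // window))
                for r, c in cells:
                    output[r][c] = restore(image[r][c])
    return output, active


# Toy 4x4 input: only the top-left 2x2 window is marked as blurry.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
confidence = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.9, 0.0, 0.1],
    [0.0, 0.1, 0.2, 0.0],
    [0.1, 0.0, 0.0, 0.1],
]
restored, active = prune_and_process(
    image, confidence, window=2, threshold=0.5, restore=lambda v: v + 100.0
)
```

In this toy run only one of the four windows is processed, which is the source of the compute saving: the cost of the restoration step scales with the number of active windows rather than with the full image size.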
Peer Reviews
Decision: ICLR 2024 poster
An adaptive window pruning strategy is adopted to focus the network computation on localized regions affected by blur and speed up the Transformer layers. A carefully annotated local blur mask is proposed for the ReLoBlur dataset to improve the performance of local deblurring methods.
The organization of the paper can be improved. 1) The methodology (Sec. 2) contains too many (unnecessary) acronyms. Moreover, there are some inconsistencies when citing previous works (for example, LBAG (Li et al., 2023), LBFMG (Li et al., 2023), etc.). The submission would strongly benefit from polishing the writing. The settings of the experiments need more explanation. 2) It is not clear why the GoPro dataset is used for training along with the ReLoBlur training
1. This paper addresses the problem of single-image local motion deblurring, which is essential in today’s photography industry. The presented method is the first to apply sparse ViT to single-image deblurring and may inspire the community to enhance image quality locally. 2. The proposed pruning strategy, including the supervised confidence predictor, the differential decision layer and the pruning losses, is reasonable and practical. It combines window pruning strategy with Transformer layers, on
1. The authors did not mention whether the presented method LMD-ViT requires blur mask annotation during inference. This is crucial because if the method does require blur masks before inference, it would be helpful to provide instructions on how to generate them beforehand and assess their practicality. 2. The proposed method uses Gumbel-Softmax as the decision layer in training and Softmax in inference. The equivalence of the two techniques in training and inference is not discussed. 3. In the
Overall, I think this paper has a novel idea and achieves good results in terms of performance and efficiency. I tend to accept this paper for the following reasons. 1. Adaptive window pruning that saves computation on unnecessary attention windows. 2. Blurriness confidence prediction that works well for both local motion prediction and global motion prediction. 3. Annotated local blur masks on ReLoBlur. 4. Well-designed experiments, well-presented figures and a well-written paper.
Below are some concerns and suggestions. 1. Since the confidence predictor only uses MLP layers, how many pixels did you shift? Is the feature shift necessary to enlarge the receptive field of the neighbourhood? 2. What is the mask prediction accuracy on the validation set? 3. How did you decide the border when annotating the masks for blurry moving objects? 4. If a patch is always abandoned, how is it processed? Which layers will it be passed into during inference? 5. It could be better to pro
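One review above notes that the decision layer uses Gumbel-Softmax in training and Softmax in inference. The sketch below shows the general difference between the two on a single window, assuming a two-class logit layout `[prune, keep]` and a 0.5 keep threshold. Both choices, and all function names, are illustrative assumptions rather than details taken from the paper.

```python
import math
import random
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(
    logits: List[float], temperature: float, rng: random.Random
) -> List[float]:
    """Training-time relaxation: add Gumbel(0, 1) noise, then softmax."""
    noisy = []
    for x in logits:
        # Clamp the uniform sample so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append(x - math.log(-math.log(u)))
    return softmax(noisy, temperature)


def keep_window_inference(logits: List[float]) -> bool:
    """Inference-time rule (assumed): keep if P(keep) >= 0.5, no noise."""
    return softmax(logits)[1] >= 0.5


rng = random.Random(0)
train_probs = gumbel_softmax([0.0, 2.0], temperature=1.0, rng=rng)
keep = keep_window_inference([0.0, 2.0])
```

The training-time output is stochastic and differentiable with respect to the logits, while the inference-time rule is deterministic. The two agree only in expectation, which is the gap the reviewer asks the authors to discuss.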
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Image and Signal Denoising Methods
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Absolute Position Encodings · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Adam
