TL;DR
This paper shows that a carefully tuned simple CNN can become highly competitive with complex deep networks for video deblurring, underscoring how much the details of the model and training procedure matter.
Contribution
It shows that close attention to model and training details can substantially improve a simple baseline CNN, calling into question the assumed superiority of more complex architectures.
Findings
Tuning model and training details boosts a simple baseline CNN by 3.15 dB
With these details in place, the simple CNN becomes highly competitive with cutting-edge networks
Model and training details cause extreme differences in quantitative and qualitative performance
Abstract
Video deblurring for hand-held cameras is a challenging task, since the underlying blur is caused by both camera shake and object motion. State-of-the-art deep networks exploit temporal information from neighboring frames, either by means of spatio-temporal transformers or by recurrent architectures. In contrast to these involved models, we found that a simple baseline CNN can perform astonishingly well when particular care is taken w.r.t. the details of model and training procedure. To that end, we conduct a comprehensive study regarding these crucial details, uncovering extreme differences in quantitative and qualitative performance. Exploiting these details allows us to boost the architecture and training procedure of a simple baseline CNN by a staggering 3.15dB, such that it becomes highly competitive w.r.t. cutting-edge networks. This raises the question whether the reported…
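To give a sense of scale for the 3.15 dB figure, the sketch below assumes the gain is measured in PSNR, the usual metric on deblurring benchmarks (the abstract does not name the metric, so this is an assumption). Under the standard definition PSNR = 10 · log10(MAX² / MSE), a fixed gain in dB corresponds to a fixed multiplicative reduction in mean squared error. The function names and the example MSE value are illustrative, not taken from the paper.

```python
import math


def psnr(mse: float, max_value: float = 1.0) -> float:
    """PSNR in dB for a given mean squared error and peak signal value."""
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (gain_db / 10.0)


# Assumption: the reported 3.15 dB improvement is a PSNR gain.
gain_db = 3.15
ratio = mse_ratio_for_gain(gain_db)

# Hypothetical baseline error, chosen only to illustrate the relationship.
baseline_mse = 1e-3
improved_mse = baseline_mse / ratio

print(f"MSE reduction factor: {ratio:.2f}")
print(f"PSNR gain check: {psnr(improved_mse) - psnr(baseline_mse):.2f} dB")
```

Under this assumption, a 3.15 dB gain means the tuned network roughly halves the mean squared error of the untuned baseline.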
