Accelerating the Training of Video Super-Resolution Models
Lijian Lin, Xintao Wang, Zhongang Qi, Ying Shan

TL;DR
This paper introduces a multigrid training strategy for video super-resolution models that gradually increases spatial and temporal sizes, significantly speeding up training without sacrificing accuracy.
Contribution
It proposes a staged training approach that accelerates VSR model training by starting with small spatial and temporal input sizes and gradually increasing them across stages, reducing total training time.
Findings
Achieves up to 6.2x speedup in training time
Maintains comparable performance to traditional training methods
Effective for various VSR models
Abstract
Although convolutional neural networks (CNNs) have recently demonstrated high-quality reconstruction for video super-resolution (VSR), efficiently training competitive VSR models remains a challenging problem. Training usually takes an order of magnitude more time than training their image counterparts, leading to long research cycles. Existing VSR methods typically train models with fixed spatial and temporal sizes from beginning to end. The fixed sizes are usually set to large values for good performance, resulting in slow training. However, is such a rigid training strategy necessary for VSR? In this work, we show that it is possible to gradually train video models from small to large spatial/temporal sizes, i.e., in an easy-to-hard manner. In particular, the whole training is divided into several stages, and earlier stages use smaller spatial training shapes. Inside each stage,…
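The staged, easy-to-hard idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Stage` class, the `STAGES` values, the equal-length split of training into stages, and the cost model (cost proportional to spatial area times frame count) are hypothetical and not taken from the paper. The sketch also does not model what happens inside each stage, since the abstract is truncated at that point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    spatial: int   # patch height/width in pixels (hypothetical value)
    temporal: int  # number of frames per clip (hypothetical value)


# Hypothetical easy-to-hard schedule; these numbers are NOT from the paper.
STAGES = [Stage(32, 7), Stage(48, 11), Stage(64, 15)]


def stage_for_iteration(iteration, total_iterations, stages=STAGES):
    """Return the stage active at `iteration`.

    Assumes training is split into equal-length stages, ordered from
    the smallest to the largest input size.
    """
    if not 0 <= iteration < total_iterations:
        raise ValueError("iteration must be in [0, total_iterations)")
    index = iteration * len(stages) // total_iterations
    return stages[index]


def relative_cost(stages=STAGES):
    """Rough cost of the staged schedule relative to training at the
    final (largest) size throughout.

    Assumes per-sample cost is proportional to spatial**2 * temporal
    and that all stages run for the same number of iterations.
    """
    full = stages[-1].spatial ** 2 * stages[-1].temporal
    staged = sum(s.spatial ** 2 * s.temporal for s in stages)
    return staged / (len(stages) * full)
```

In a training loop, one would call `stage_for_iteration(step, total_steps)` before sampling each batch and crop clips to the returned spatial and temporal sizes. Under the stated cost assumption, `relative_cost()` gives a back-of-the-envelope estimate of the compute saved by the smaller early stages; it is not a prediction of the speedup reported in the paper.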
Taxonomy
Topics: Advanced Image Processing Techniques · Image Processing Techniques and Applications · Photoacoustic and Ultrasonic Imaging
Methods: Convolution
