Affine Transformation-Based Deep Frame Prediction
Hyomin Choi, Ivan V. Bajić

TL;DR
This paper introduces a neural network for deep frame prediction that combines affine transformations with adaptive filters. The network is smaller and more accurate than prior deep frame prediction models, leading to significant bit savings in video coding.
Contribution
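The proposed model combines affine transformation estimation with adaptive spatially-varying filters. Because the estimated affine transformation captures much of the motion, shorter filters suffice, which reduces model complexity while improving prediction accuracy for video coding.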
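To make the combination of affine warping and adaptive filtering concrete, here is a minimal NumPy sketch. It assumes grayscale frames, a single 2×3 affine matrix per reference frame, and one k×k kernel per pixel. In the paper these quantities are estimated by the network, and its exact parameterisation (for example, whether transformations are per frame or per pixel) is not given in this summary. The names `affine_warp`, `adaptive_filter`, and `predict_frame` are hypothetical, and summing the two filtered references is an assumed fusion rule, not necessarily the authors' method.

```python
import numpy as np


def affine_warp(frame: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Warp a grayscale frame with a 2x3 affine matrix using bilinear sampling.

    theta maps each output pixel (x, y, 1) to a source location in `frame`;
    source coordinates outside the frame are clamped to the border.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    src_x, src_y = theta @ coords  # source coordinates, each of shape (H*W,)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    ax, ay = src_x - x0, src_y - y0  # fractional offsets for interpolation
    top = (1 - ax) * frame[y0, x0] + ax * frame[y0, x1]
    bottom = (1 - ax) * frame[y1, x0] + ax * frame[y1, x1]
    return ((1 - ay) * top + ay * bottom).reshape(h, w)


def adaptive_filter(frame: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Apply a different k x k kernel at every pixel (spatially-varying filtering).

    kernels has shape (H, W, k, k) with odd k; the frame is edge-padded so that
    every output pixel sees a full k x k neighbourhood centred on itself.
    """
    h, w, k, _ = kernels.shape
    r = k // 2
    padded = np.pad(frame, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            # Shifted view of the frame aligned with kernel tap (dy, dx).
            out += kernels[:, :, dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def predict_frame(refs, thetas, kernels) -> np.ndarray:
    """Hypothetical predictor: warp each reference with its affine matrix,
    filter it with its own per-pixel kernels, and sum the results (assumed fusion)."""
    pred = np.zeros_like(refs[0], dtype=np.float64)
    for ref, theta, kern in zip(refs, thetas, kernels):
        pred += adaptive_filter(affine_warp(ref, theta), kern)
    return pred


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref0, ref1 = rng.random((8, 8)), rng.random((8, 8))
    identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    half = np.zeros((8, 8, 3, 3))
    half[:, :, 1, 1] = 0.5  # all weight on the centre tap
    pred = predict_frame([ref0, ref1], [identity, identity], [half, half])
    print(np.allclose(pred, (ref0 + ref1) / 2))  # True
```

With identity transforms and kernels that put weight 0.5 on the centre tap, the prediction reduces to the average of the two references. The point the sketch illustrates is the division of labour: once the affine warp has absorbed most of the motion, each per-pixel kernel only needs to correct a small residual displacement, so k can stay small.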
Findings
Achieves approximately 7.3% bit savings in the Low Delay P configuration.
Is smaller and more accurate than the neural networks in prior work on deep frame prediction.
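Integrates with HEVC by supplying the predicted frame as a reference for coding the current frame; no motion information needs to be transmitted for it, which improves compression efficiency.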
Abstract
We propose a neural network model to estimate the current frame from two reference frames, using affine transformation and adaptive spatially-varying filters. The estimated affine transformation allows for using shorter filters compared to existing approaches for deep frame prediction. The predicted frame is used as a reference for coding the current frame. Since the proposed model is available at both encoder and decoder, there is no need to code or transmit motion information for the predicted frame. By making use of dilated convolutions and reduced filter length, our model is significantly smaller, yet more accurate, than any of the neural networks in prior works on this topic. Two versions of the proposed model (one for uni-directional and one for bi-directional prediction) are trained using a combination of Discrete Cosine Transform (DCT)-based l1-loss with various transform…
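The abstract's training objective starts from an l1 penalty on the prediction error measured in the DCT domain. The NumPy sketch below computes one such term: the error frame is split into non-overlapping square tiles, each tile is transformed with an orthonormal 2-D DCT-II, and the absolute coefficients are averaged. The tile size, the mean (rather than sum) reduction, and the names `dct_matrix` and `dct_l1_loss` are assumptions for illustration; how the paper combines terms across its various transform settings is cut off in the abstract and is not reproduced here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # DC row scaling
    c[1:, :] *= np.sqrt(2.0 / n)  # AC row scaling
    return c


def dct_l1_loss(pred: np.ndarray, target: np.ndarray, block: int = 8) -> float:
    """Mean absolute 2-D DCT coefficient of the prediction error (assumed form).

    The error image is split into non-overlapping block x block tiles
    (H and W must be multiples of `block`); each tile E is transformed as C E C^T.
    """
    err = pred.astype(np.float64) - target.astype(np.float64)
    h, w = err.shape
    if h % block or w % block:
        raise ValueError("frame size must be a multiple of the block size")
    c = dct_matrix(block)
    # Reshape to (tile rows, tile cols, block, block).
    tiles = err.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    coeffs = c @ tiles @ c.T  # batched 2-D DCT of every tile
    return float(np.mean(np.abs(coeffs)))


if __name__ == "__main__":
    pred = np.ones((8, 8))
    target = np.zeros((8, 8))
    print(round(dct_l1_loss(pred, target), 6))  # 0.125
```

A constant error of 1 over an 8×8 frame puts all of its energy into the DC coefficient (value 8), so the loss is 8/64 = 0.125. One plausible motivation for a transform-domain loss is that HEVC codes the prediction residual with DCT-like integer transforms, so the loss measures the error in a domain close to the one the encoder actually codes.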
