Adaptation and Attention for Neural Video Coding
Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

TL;DR
This paper introduces an end-to-end learned video codec built around adaptation and attention mechanisms. It outperforms the top learned codec of CLIC 2021 (E2E_T_OL) and, in some settings, the traditional VVC/H.266 codec.
Contribution
The work presents architectural and training innovations, including adaptive motion estimation, a new neural block combining split-attention and DenseNet concepts, and decoder-side parameter overfitting.
Findings
Outperforms the 2021 CLIC top learned codec E2E_T_OL
Outperforms traditional VVC/H.266 in some settings
Demonstrates coding gains through ablation studies
Abstract
Neural image coding now represents the state-of-the-art approach to image compression. However, much work remains to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural and training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits…
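The idea of overfitting decoder-side multiplicative parameters at inference time can be illustrated with a toy sketch. This is not the authors' implementation: the linear decoder, the per-channel placement of the scales, the squared-error loss, and the names decode, mse, and overfit_scales are all assumptions made for illustration. The sketch keeps the decoder weights frozen and fits only one multiplicative scale per output channel to the content being coded.

```python
def decode(latent, weights, scales):
    """Toy frozen linear decoder with one multiplicative scale per output channel (assumed layout)."""
    return [s * sum(w * z for w, z in zip(row, latent))
            for row, s in zip(weights, scales)]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def overfit_scales(latent, weights, target, steps=200, lr=0.1):
    """Fit only the multiplicative scales to one piece of content; weights stay fixed."""
    scales = [1.0] * len(weights)  # start from the identity scaling
    for _ in range(steps):
        for i, row in enumerate(weights):
            base = sum(w * z for w, z in zip(row, latent))  # unscaled decoder output
            err = scales[i] * base - target[i]
            # Gradient of (scale * base - target)^2 with respect to the scale.
            scales[i] -= lr * 2.0 * err * base
    return scales


# Hypothetical toy data: a two-value latent and a two-channel decoder.
latent = [1.0, 0.5]
weights = [[1.0, 0.0], [0.0, 2.0]]
target = [1.2, 0.8]

before = mse(decode(latent, weights, [1.0, 1.0]), target)
scales = overfit_scales(latent, weights, target)
after = mse(decode(latent, weights, scales), target)
print(before, after, scales)
```

In a real codec the fitted parameters would presumably have to reach the decoder at some rate cost; the abstract as shown here does not describe how, so the sketch leaves that part out.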
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Advanced Vision and Imaging
