TL;DR
This paper introduces MRDAC, a novel multi-reference face video compression method using contrastive learning, which reduces reconstruction drift and improves quality in low-bitrate video conferencing.
Contribution
It proposes a contrastive learning-based multi-reference animation framework to enhance face video compression, addressing drift issues and enabling longer sequence compression with fewer references.
Findings
Significant improvements in coding efficiency and reconstruction quality over previous GFVC methods.
Enhanced animation quality under large pose and facial expression changes.
Effective reduction of reconstruction drift in bi-directional prediction mode.
Abstract
Generative face video coding (GFVC) has been demonstrated as a potential approach to low-latency, low-bitrate video conferencing. GFVC frameworks achieve an extreme gain in coding efficiency, with over 70% bitrate savings compared to conventional codecs at bitrates below 10 kbps. In recent MPEG/JVET standardization efforts, all the information required to reconstruct video sequences using GFVC frameworks has been adopted as part of the supplemental enhancement information (SEI) in existing compression pipelines. In light of this development, we aim to address a challenge that has been only weakly addressed in prior GFVC frameworks, i.e., reconstruction drift as the distance between the reference and target frames increases. This challenge creates the need to update the reference buffer more frequently by transmitting more Intra-refresh frames, which are the most expensive element of the GFVC…
