Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors

Sindhu B Hegde; Rudrabha Mukhopadhyay; Vinay P Namboodiri; C. V. Jawahar

arXiv:2208.08118·cs.CV·August 18, 2022

TL;DR

This paper presents an audio-visual upsampling network that turns extremely low-resolution ($8 \times 8$) talking-face videos into high-quality $256 \times 256$ videos, improving both super-resolution quality and talking-face video compression efficiency.

Contribution

The paper introduces an end-to-end multi-stage framework that uses an audio prior and a single high-resolution identity image prior for extreme-scale talking-face video upsampling, reaching a $32 \times$ scale factor with a reported 8x FID improvement over previous super-resolution methods.

Findings

01

Achieves 32x scaling from 8x8 to 256x256 resolution.

02

Improves FID score by 8x over previous super-resolution methods.

03

Provides a 3.5x bit/pixel reduction in talking-face video compression.
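
To make the first finding concrete, the short sketch below works out what a 32x per-side scale factor means in pixel counts. It is only illustrative arithmetic: the function name `upsampling_stats` and its parameters are hypothetical and are not taken from the paper or its repository.

```python
def upsampling_stats(low_res: int = 8, high_res: int = 256) -> dict:
    """Return scale figures for upsampling a square low_res frame to high_res.

    Hypothetical helper for illustration only; not part of the paper's code.
    """
    if low_res <= 0 or high_res % low_res != 0:
        raise ValueError("high_res must be a positive multiple of low_res")
    scale = high_res // low_res  # per-side scale factor
    return {
        "scale_per_side": scale,
        "low_pixels": low_res * low_res,      # pixels in one input frame
        "high_pixels": high_res * high_res,   # pixels in one output frame
        "pixel_ratio": scale * scale,         # output pixels per input pixel
    }


stats = upsampling_stats()
print(stats)
```

Under these numbers each 64-pixel input frame maps to a 65,536-pixel output frame, so every input pixel has to account for 1,024 output pixels, which is why the additional audio and identity-image priors matter.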

Abstract

In this paper, we explore an interesting question of what can be obtained from an $8 \times 8$ pixel video sequence. Surprisingly, it turns out to be quite a lot. We show that when we process this $8 \times 8$ video with the right set of audio and image priors, we can obtain a full-length, $256 \times 256$ video. We achieve this $32 \times$ scaling of an extremely low-resolution input using our novel audio-visual upsampling network. The audio prior helps to recover the elemental facial details and precise lip shapes, and a single high-resolution target identity image prior provides us with rich appearance details. Our approach is an end-to-end multi-stage framework. The first stage produces a coarse intermediate output video that can then be used to animate the single target identity image and generate realistic, accurate, and high-quality outputs. Our approach is simple and performs exceedingly…

Code & Models

Repositories

Sindhu-Hegde/video-super-resolver
Official

Taxonomy

Methods: Low-resolution input