Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Pulkit Tandon; Shubham Chandak; Pat Pataranutaporn; Yimeng Liu; Anesu M. Mapuranga; Pattie Maes; Tsachy Weissman; Misha Sra

arXiv:2106.14014·eess.IV·April 5, 2022

TL;DR

Txt2Vid compresses talking-head (webcam) videos to a text transcript, transmits the text at an ultra-low bitrate, and decodes it into a realistic reconstruction of the video using deep learning. The paper reports large reductions in data rate while maintaining user experience.

Contribution

The paper presents a generative pipeline that compresses talking-head videos into text transcripts and reconstructs realistic video from them, reporting a 100-1000x bitrate reduction compared to standard codecs.

Findings

01 Achieves 2-3 orders of magnitude bitrate reduction.

02 Maintains equivalent user experience in subjective evaluations.

03 Enables video communication in low-bandwidth scenarios.
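
To give a rough sense of why a text transcript can land orders of magnitude below video, the following back-of-the-envelope sketch estimates the raw bitrate of spoken text and compares it with the low end of the video range quoted in the abstract (~100 Kbps). The speaking rate (150 words per minute) and the average of 6 bytes per word are illustrative assumptions, not figures from the paper, and the function names are hypothetical. The estimate ignores text compression and any side information the real pipeline may send, so it is not a reproduction of the paper's measurements.

```python
def text_bitrate_bps(words_per_minute: float, bytes_per_word: float) -> float:
    """Raw bitrate, in bits per second, of an uncompressed transcript."""
    return words_per_minute * bytes_per_word * 8 / 60.0


def reduction_factor(video_bps: float, text_bps: float) -> float:
    """How many times smaller the text stream is than the video stream."""
    return video_bps / text_bps


# Assumptions (illustrative, not from the paper):
#   150 spoken words per minute, 6 bytes per word including the space.
text_bps = text_bitrate_bps(150, 6)

# Low end of the video bitrate range quoted in the abstract (~100 Kbps).
video_bps = 100_000

print(f"text:  {text_bps:.0f} bps")  # text:  120 bps
print(f"ratio: {reduction_factor(video_bps, text_bps):.0f}x")  # ratio: 833x
```

Under these assumptions the ratio falls inside the 2-3 orders of magnitude reported above. Video at the upper end of the quoted range would give a larger ratio, so the number is sensitive to which baseline is chosen.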

Abstract

Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and…

Code & Models

Repositories

tpulkit/txt2vid (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Video Analysis and Summarization