Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text
Pulkit Tandon, Shubham Chandak, Pat Pataranutaporn, Yimeng Liu, Anesu M. Mapuranga, Pattie Maes, Tsachy Weissman, Misha Sra

TL;DR
Txt2Vid compresses talking-head videos to a text transcript, which is transmitted at an ultra-low bitrate and decoded into a realistic reconstruction of the original video using deep learning. This reduces data rates by orders of magnitude while maintaining user experience.
Contribution
The paper presents a novel generative pipeline that compresses videos into text transcripts and reconstructs realistic videos, achieving 100-1000x bitrate reduction compared to standard codecs.
Findings
Achieves 2-3 orders of magnitude bitrate reduction.
Maintains equivalent user experience in subjective evaluations.
Enables video communication in low-bandwidth scenarios.
Abstract
Video represents the majority of internet traffic today, driving a continual race between the generation of higher quality content, transmission of larger file sizes, and the development of network infrastructure. In addition, the recent COVID-19 pandemic fueled a surge in the use of video conferencing tools. Since videos take up considerable bandwidth (~100 Kbps to a few Mbps), improved video compression can have a substantial impact on network performance for live and pre-recorded content, providing broader access to multimedia content worldwide. We present a novel video compression pipeline, called Txt2Vid, which dramatically reduces data transmission rates by compressing webcam videos ("talking-head videos") to a text transcript. The text is transmitted and decoded into a realistic reconstruction of the original video using recent advances in deep learning based voice cloning and…
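To make the scale of the claimed savings concrete, the following back-of-the-envelope sketch estimates the bitrate of an uncompressed text transcript and compares it with the lower end of the video bitrate range quoted in the abstract (~100 Kbps). The speaking rate (150 words per minute) and the average word size (6 bytes, including the trailing space) are illustrative assumptions, not figures from the paper, and the function names are hypothetical.

```python
# Rough estimate of transcript bitrate versus video bitrate.
# ASSUMPTIONS (not from the paper): 150 words/minute speaking rate,
# 6 bytes per word (about 5 characters plus a space), no text compression.

def transcript_bitrate_bps(words_per_minute: float = 150.0,
                           bytes_per_word: float = 6.0) -> float:
    """Bits per second needed to send a plain-text transcript in real time."""
    bits_per_minute = words_per_minute * bytes_per_word * 8
    return bits_per_minute / 60.0


def reduction_factor(video_bitrate_bps: float, text_bitrate_bps: float) -> float:
    """How many times smaller the text stream is than the video stream."""
    return video_bitrate_bps / text_bitrate_bps


if __name__ == "__main__":
    text_bps = transcript_bitrate_bps()
    video_bps = 100_000  # ~100 Kbps, the low end quoted in the abstract
    factor = reduction_factor(video_bps, text_bps)
    print(f"transcript: {text_bps:.0f} bps, reduction: {factor:.0f}x")
```

Under these assumptions the transcript needs on the order of a hundred bits per second, which is in line with the 2-3 orders of magnitude reduction reported above. A real system would also spend bits on any side information the decoder needs, which this sketch ignores.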
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Video Analysis and Summarization
