TalkingMachines: Real-Time Audio-Driven FaceTime-Style Video via Autoregressive Diffusion Models
Chetwin Low, Weimin Wang

TL;DR
TalkingMachines is a real-time, audio-driven video synthesis framework that transforms pretrained models into natural conversational character animators, enabling seamless, high-quality video streaming driven by audio inputs.
Contribution
It adapts a pretrained image-to-video model into an audio-driven avatar generator and introduces efficient inference techniques for real-time performance.
Findings
Achieves real-time, high-quality audio-driven video synthesis
Enables infinite video streaming without error accumulation
Optimizes inference pipeline for low latency and high throughput
Abstract
In this paper, we present TalkingMachines -- an efficient framework that transforms pretrained video generation models into real-time, audio-driven character animators. TalkingMachines enables natural conversational experiences by integrating an audio large language model (LLM) with our video generation foundation model. Our primary contributions include: (1) We adapt a pretrained SOTA image-to-video DiT into an audio-driven avatar generation model of 18 billion parameters; (2) We enable infinite video streaming without error accumulation through asymmetric knowledge distillation from a bidirectional teacher model into a sparse causal, autoregressive student model; (3) We design a high-throughput, low-latency inference pipeline incorporating several key engineering optimizations such as: (a) disaggregation of the DiT and VAE decoder across separate devices, (b) efficient overlap of…
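The abstract contrasts a bidirectional teacher with a sparse causal, autoregressive student but, as quoted here, does not give the student's exact attention pattern. As a rough illustration only, the sketch below builds one plausible pattern: tokens are grouped into chunks, each chunk attends fully to itself, and across chunks attention is causal and limited to a short window. The function names, the chunking, and the `window` parameter are all assumptions made for this example, not details from the paper.

```python
# Illustrative sketch only: the chunk layout and window size are assumptions,
# not the attention pattern used by TalkingMachines.

def bidirectional_mask(num_chunks, tokens_per_chunk):
    """Teacher-style mask: every token may attend to every other token."""
    n = num_chunks * tokens_per_chunk
    return [[True] * n for _ in range(n)]


def sparse_causal_mask(num_chunks, tokens_per_chunk, window):
    """Student-style mask (hypothetical): a query token in chunk i may attend
    to key tokens in chunks i - window + 1 .. i, and never to later chunks."""
    n = num_chunks * tokens_per_chunk
    chunk_of = [t // tokens_per_chunk for t in range(n)]
    return [
        [
            chunk_of[k] <= chunk_of[q] and chunk_of[k] > chunk_of[q] - window
            for k in range(n)
        ]
        for q in range(n)
    ]


if __name__ == "__main__":
    # 3 chunks of 2 tokens, each chunk sees itself and one previous chunk.
    for row in sparse_causal_mask(num_chunks=3, tokens_per_chunk=2, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because a mask of this kind never lets a chunk look at later chunks, and looks back over only a bounded number of earlier ones, generation can proceed chunk by chunk with a fixed-size context, which is the general property a streaming setup relies on.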
Taxonomy
Topics: Face recognition and analysis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
Methods: Knowledge Distillation
