Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models
Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen

TL;DR
Live2Diff introduces a novel video diffusion model with uni-directional attention tailored for live streaming video translation, ensuring temporal consistency and real-time performance.
Contribution
This work is the first to design a uni-directional attention-based video diffusion model specifically for live streaming, addressing the limitations of bi-directional models.
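The difference between the two attention styles can be illustrated with a minimal sketch of single-head temporal self-attention over per-frame feature vectors. This is not the authors' implementation. It assumes raw features serve directly as queries, keys, and values (no learned projections), and `temporal_attention` is a hypothetical helper. With `causal=True`, frame t attends only to frames 0..t (uni-directional). With `causal=False`, every frame also attends to future frames (bi-directional).

```python
import math

def softmax(scores):
    # Numerically stable softmax; -inf entries (masked frames) receive weight 0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(frames, causal):
    """Hypothetical single-head attention over a list of per-frame feature vectors.

    causal=True: frame t attends only to frames 0..t (uni-directional).
    causal=False: frame t attends to all frames, including future ones (bi-directional).
    Queries, keys and values are the raw features (illustrative assumption).
    """
    d = len(frames[0])
    out = []
    for t, q in enumerate(frames):
        scores = []
        for s, k in enumerate(frames):
            if causal and s > t:
                scores.append(float("-inf"))  # mask out future frames
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, frames)) for i in range(d)])
    return out

clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
extended = clip + [[5.0, -3.0]]  # a new ("future") frame arrives

uni_before = temporal_attention(clip, causal=True)
uni_after = temporal_attention(extended, causal=True)
bi_before = temporal_attention(clip, causal=False)
bi_after = temporal_attention(extended, causal=False)
```

With the causal mask, the outputs for frames already seen do not change when a new frame arrives, so they can be emitted immediately. With bi-directional attention, the output for frame 0 changes once frame 3 is appended, so a bi-directional model has to wait for future frames before it can produce results for the current one.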
Findings
Outperforms previous methods in temporal smoothness
Achieves interactive framerates for streaming translation
Demonstrates effectiveness of uni-directional attention in videos
Abstract
Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism, which models correlations between the current token and previous tokens. However, video streaming remains much less explored, despite a growing need for live video processing. State-of-the-art video diffusion models leverage bi-directional temporal attention to model the correlations between the current frame and all the surrounding (i.e. including future) frames, which hinders them from processing streaming videos. To address this problem, we present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation. Compared to previous works, our approach ensures temporal consistency and smoothness by correlating the…
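Because uni-directional attention relates the current element only to earlier ones, a streaming processor can keep the keys and values of frames it has already seen and compute each new frame's output as soon as that frame arrives, much like token-by-token generation in language models. The following is a minimal sketch of that idea under stated assumptions. The cache of past keys and values is a standard streaming technique assumed here for illustration, not a detail confirmed by the text above. `StreamingTemporalAttention` and `offline_causal` are hypothetical names, and raw features again stand in for learned key/value projections.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Scaled dot-product attention of one query over the given keys/values.
    d = len(query)
    scores = [sum(a * b for a, b in zip(query, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)]

class StreamingTemporalAttention:
    """Hypothetical streaming wrapper that caches keys/values of past frames.

    Each call to push() consumes one new frame and returns its output
    immediately, using only that frame and the cached earlier frames.
    """

    def __init__(self):
        self.keys = []    # cached keys of all frames seen so far
        self.values = []  # cached values of all frames seen so far

    def push(self, frame):
        self.keys.append(frame)
        self.values.append(frame)
        return attend(frame, self.keys, self.values)

def offline_causal(frames):
    # Reference: causal attention computed with the whole clip available up front.
    return [attend(q, frames[: t + 1], frames[: t + 1]) for t, q in enumerate(frames)]

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, -3.0]]
stream = StreamingTemporalAttention()
online = [stream.push(f) for f in frames]  # frames arrive one at a time
offline = offline_causal(frames)
```

The online outputs match the offline causal outputs frame by frame, but the streaming version never needs a frame before it arrives. In a real multi-layer network the cache would hold each layer's projected keys and values, so earlier frames are not re-encoded when a new one comes in.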
Taxonomy
Topics: Advanced Data Compression Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
