Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

Zhening Xing; Gereon Fox; Yanhong Zeng; Xingang Pan; Mohamed Elgharib; Christian Theobalt; Kai Chen

arXiv:2407.08701·cs.CV·July 12, 2024

TL;DR

Live2Diff introduces a video diffusion model with uni-directional temporal attention for live-stream video translation, targeting temporal consistency and smoothness at interactive framerates.

Contribution

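This work presents the first attempt at designing a video diffusion model with uni-directional temporal attention specifically for live streaming video translation. It addresses a limitation of bi-directional models, which need future frames and therefore cannot process a stream as it arrives.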

Findings

01 Outperforms previous methods in temporal smoothness
02 Achieves interactive framerates for streaming translation
03 Demonstrates effectiveness of uni-directional attention in videos

Abstract

Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism, which models correlations between the current token and previous tokens. However, video streaming remains much less explored, despite a growing need for live video processing. State-of-the-art video diffusion models leverage bi-directional temporal attention to model the correlations between the current frame and all the surrounding (i.e. including future) frames, which hinders them from processing streaming videos. To address this problem, we present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation. Compared to previous works, our approach ensures temporal consistency and smoothness by correlating the…
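The difference between bi-directional and uni-directional temporal attention described in the abstract can be illustrated with a small sketch. This is not the Live2Diff implementation: the function `temporal_attention`, the toy feature vectors, and the omission of learned query/key/value projections are all assumptions made for illustration. It only shows why restricting each frame to attend to itself and earlier frames makes streaming possible: the output for a frame does not change when later frames arrive.

```python
import math

def temporal_attention(frames, causal):
    """Toy single-head self-attention over the frame axis.

    Hypothetical helper for illustration; no learned projections.
    frames: list of per-frame feature vectors (lists of floats).
    causal: if True, frame t attends only to frames 0..t (uni-directional);
            if False, every frame attends to all frames (bi-directional).
    Returns (outputs, weights), where weights[t][s] is the attention
    that frame t pays to frame s.
    """
    dim = len(frames[0])
    outputs, weights = [], []
    for t, query in enumerate(frames):
        # Uni-directional: keys stop at the current frame.
        keys = frames[: t + 1] if causal else frames
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in keys
        ]
        # Numerically stable softmax over the visible frames.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        # Masked (future) frames receive zero weight.
        row += [0.0] * (len(frames) - len(row))
        weights.append(row)
        outputs.append(
            [sum(w * f[d] for w, f in zip(row, frames)) for d in range(dim)]
        )
    return outputs, weights

# Four toy "frames" with 2-dimensional features (made-up values).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]

out_bi, w_bi = temporal_attention(frames, causal=False)
out_uni, w_uni = temporal_attention(frames, causal=True)

# Streaming check: process only the first three frames, as if the
# fourth had not arrived yet.
out_prefix, _ = temporal_attention(frames[:3], causal=True)
```

In the uni-directional case, `out_uni[:3]` matches `out_prefix`, so results for earlier frames can be emitted immediately. In the bi-directional case, every row of `w_bi` puts weight on later frames, so no output is final until the whole clip is available.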

Code & Models

Models

🤗 Leoxing/Live2Diff · model · ♡ 13

Taxonomy

Topics: Advanced Data Compression Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Image Processing Techniques

Methods: Softmax · Attention Is All You Need · Diffusion