Looking Backward: Streaming Video-to-Video Translation with Feature Banks

Feng Liang; Akio Kodaira; Chenfeng Xu; Masayoshi Tomizuka; Kurt Keutzer; Diana Marculescu

arXiv:2405.15757·cs.CV·February 18, 2025

TL;DR

StreamV2V is a real-time streaming video-to-video translation model. It uses a feature bank to carry information from past frames forward, so it can process an unlimited number of frames continuously while keeping the output temporally consistent.

Contribution

The paper introduces a streaming V2V translation method built around a feature bank that retains information from past frames. The method runs in real time and integrates with existing image diffusion models without fine-tuning.

Findings

01. Runs at 20 FPS on a single A100 GPU.

02. Runs 15x to 158x faster than prior V2V methods.

03. Maintains high temporal consistency in the translated video.

Abstract

This paper introduces StreamV2V, a diffusion model that achieves real-time streaming video-to-video (V2V) translation with user prompts. Unlike prior V2V methods using batches to process limited frames, we opt to process frames in a streaming fashion, to support unlimited frames. At the heart of StreamV2V lies a backward-looking principle that relates the present to the past. This is realized by maintaining a feature bank, which archives information from past frames. For incoming frames, StreamV2V extends self-attention to include banked keys and values and directly fuses similar past features into the output. The feature bank is continually updated by merging stored and new features, making it compact but informative. StreamV2V stands out for its adaptability and efficiency, seamlessly integrating with image diffusion models without fine-tuning. It can run 20 FPS on one A100 GPU, being…
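The two feature-bank operations described in the abstract, extending self-attention with banked keys and values and keeping the bank compact by merging stored and new features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single-head layout, the `max_size` cap, and the merge rule (averaging the most similar pair of keys by cosine similarity) are all assumptions chosen for illustration, and the feature-fusion step mentioned in the abstract is not covered.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def extended_self_attention(q, k, v, bank_k=None, bank_v=None):
    """Single-head attention over current tokens plus banked tokens.

    q, k, v: (tokens, dim) arrays for the current frame.
    bank_k, bank_v: (bank_tokens, dim) arrays from past frames, or None.
    """
    if bank_k is not None and bank_v is not None:
        # Queries from the current frame also attend to past keys/values.
        k = np.concatenate([k, bank_k], axis=0)
        v = np.concatenate([v, bank_v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def merge_bank(bank_k, bank_v, new_k, new_v, max_size):
    """Hypothetical merge rule (an assumption, not the paper's algorithm):
    append new features, then repeatedly average the two most similar keys
    (and their values) until at most max_size entries remain."""
    keys = np.concatenate([bank_k, new_k], axis=0)
    vals = np.concatenate([bank_v, new_v], axis=0)
    while keys.shape[0] > max_size:
        unit = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-8)
        sim = unit @ unit.T
        np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged_k = (keys[i] + keys[j]) / 2.0
        merged_v = (vals[i] + vals[j]) / 2.0
        keep = [n for n in range(keys.shape[0]) if n not in (i, j)]
        keys = np.concatenate([keys[keep], merged_k[None, :]], axis=0)
        vals = np.concatenate([vals[keep], merged_v[None, :]], axis=0)
    return keys, vals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, tokens = 8, 4
    bank_k = np.empty((0, dim))
    bank_v = np.empty((0, dim))
    # Streaming loop: each frame attends to the bank, then updates it.
    for _ in range(5):
        q, k, v = (rng.normal(size=(tokens, dim)) for _ in range(3))
        out = extended_self_attention(q, k, v, bank_k, bank_v)
        bank_k, bank_v = merge_bank(bank_k, bank_v, k, v, max_size=6)
    print(out.shape, bank_k.shape)
```

In this sketch the bank size stays bounded no matter how many frames arrive, which is the property that lets a streaming loop run over an unbounded number of frames with constant memory.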

Code & Models

Repositories

Jeff-LiangF/streamv2v (PyTorch, official)

Videos

Looking Backward: Streaming Video-to-Video Translation with Feature Banks · SlidesLive

Taxonomy

Topics: Multimedia Communication and Technology

Methods: OPT · Diffusion