MEVG: Multi-event Video Generation with Text-to-Video Models

Gyeongrok Oh; Jaehwan Jeong; Sieun Kim; Wonmin Byeon; Jinkyu Kim; Sungwoong Kim; Sangpil Kim

arXiv:2312.04086 · cs.CV · July 17, 2024 · 1 citation

Open Access

TL;DR

This paper presents MEVG, a diffusion-based method for generating multi-event videos from multiple text inputs without fine-tuning, ensuring temporal coherence and semantic accuracy through novel diffusion processes and prompt generation.

Contribution

The paper introduces a new diffusion-based approach that generates multi-event videos from text without requiring large datasets or fine-tuning, using a last frame-aware diffusion process and a prompt generator.

Findings

01. Outperforms existing models in temporal coherence and semantic accuracy

02. Maintains global appearance across frames through iterative latent updates

03. Effective in generating videos with multiple distinct events

Abstract

We introduce a novel diffusion-based video generation method that produces a video showing multiple events from multiple individual sentences supplied by the user. The method does not require a large-scale video dataset because it uses a pre-trained diffusion-based text-to-video generative model without any fine-tuning. Specifically, we propose a last frame-aware diffusion process that preserves visual coherence between consecutive video clips, each depicting a different event, by initializing the latent and simultaneously adjusting the noise in the latent to enhance the motion dynamics of the generated video. Furthermore, we find that iteratively updating the latent vectors with reference to all preceding frames maintains the global appearance across the frames of a video clip. To handle dynamic text input for video generation, we utilize a novel prompt generator that transfers coarse…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Human Motion and Animation

Methods: Diffusion · Attentive Walk-Aggregating Graph Neural Network