Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs

Dabing Cheng; Haosen Zhan; Xingchen Zhao; Guisheng Liu; Zemin Li; Jinghui Xie; Zhao Song; Weiguo Feng; Bingyue Peng

arXiv:2501.05884 · cs.CV · January 13, 2025

Open Access

TL;DR

This paper presents a novel end-to-end framework using multimodal large language models for controllable, text-guided video editing, significantly improving efficiency and accuracy in short-video content creation.

Contribution

It introduces a new text-to-edit mechanism combined with a dense frame rate and slow-fast processing to enhance video understanding and editing control.

Findings

01 Effective in advertising datasets

02 Generalizes well to public datasets

03 Enhances video editing quality and controllability

Abstract

The exponential growth of short-video content has ignited a surge in the necessity for efficient, automated solutions to video editing, with challenges arising from the need to understand videos and tailor the editing according to user requirements. Addressing this need, we propose an innovative end-to-end foundational framework, ultimately actualizing precise control over the final video content editing. Leveraging the flexibility and generalizability of Multimodal Large Language Models (MLLMs), we defined clear input-output mappings for efficient video creation. To bolster the model's capability in processing and comprehending video content, we introduce a strategic combination of a denser frame rate and a slow-fast processing technique, significantly enhancing the extraction and understanding of both temporal and spatial video information. Furthermore, we introduce a text-to-edit…
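The abstract does not say how the denser frame rate and slow-fast processing are implemented. As a rough illustration only, the sketch below assumes one common reading: frames are sampled densely, a sparse "slow" subset keeps a large visual-token budget for spatial detail, and the remaining "fast" frames keep a small budget for temporal coverage. All names and parameters (`sample_dense_indices`, `plan_slow_fast`, `slow_stride`, the token counts) are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FramePlan:
    index: int    # frame index in the source video
    pathway: str  # "slow" (high detail) or "fast" (low detail)
    tokens: int   # visual tokens budgeted for this frame (assumed values)


def sample_dense_indices(num_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Pick frame indices so that roughly target_fps frames per second are kept."""
    if num_frames <= 0 or native_fps <= 0 or target_fps <= 0:
        return []
    # Never sample more densely than the video's own frame rate.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    pos = 0.0
    while int(pos) < num_frames:
        indices.append(int(pos))
        pos += step
    return indices


def plan_slow_fast(
    indices: list[int],
    slow_stride: int = 4,
    slow_tokens: int = 64,
    fast_tokens: int = 4,
) -> list[FramePlan]:
    """Assign every slow_stride-th sampled frame to the slow pathway, the rest to the fast one."""
    plan = []
    for i, idx in enumerate(indices):
        if i % slow_stride == 0:
            plan.append(FramePlan(idx, "slow", slow_tokens))
        else:
            plan.append(FramePlan(idx, "fast", fast_tokens))
    return plan


if __name__ == "__main__":
    # Hypothetical 2-second clip at 30 fps, sampled at 2 fps.
    dense = sample_dense_indices(num_frames=60, native_fps=30.0, target_fps=2.0)
    for item in plan_slow_fast(dense, slow_stride=2):
        print(item)
```

Under these assumptions, raising `target_fps` adds mostly low-budget fast frames, so the total token count grows far more slowly than it would if every frame received the full slow budget.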

Taxonomy

Topics: Natural Language Processing Techniques · Artificial Intelligence in Games · Video Analysis and Summarization