SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing

Rong-Cheng Tu; Wenhao Sun; Zhao Jin; Jingyi Liao; Jiaxing Huang; Dacheng Tao

arXiv:2411.18983 · cs.CV · December 2, 2024

Open Access

TL;DR

SPAgent is a system that automatically coordinates diverse video generation and editing models based on user intent, improving adaptability, efficiency, and quality in video tasks through a novel three-step framework and autonomous model evaluation.

Contribution

The paper introduces SPAgent, a novel system that automates model coordination for video tasks, including a three-step framework and autonomous model evaluation, enhancing versatility and user accessibility.

Findings

01

Effective automatic coordination of models for diverse video tasks.

02

Improved video quality through autonomous model assessment.

03

Versatile performance across multiple video generation and editing tasks.

Abstract

While open-source video generation and editing models have made significant progress, individual models are typically limited to specific tasks, failing to meet the diverse needs of users. Effectively coordinating these models can unlock a wide range of video generation and editing capabilities. However, manual coordination is complex and time-consuming, requiring users to deeply understand task requirements and possess comprehensive knowledge of each model's performance, applicability, and limitations, thereby increasing the barrier to entry. To address these challenges, we propose a novel video generation and editing system powered by our Semantic Planning Agent (SPAgent). SPAgent bridges the gap between diverse user intents and the effective utilization of existing generative models, enhancing the adaptability, efficiency, and overall quality of video generation and editing.…
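The abstract centers on coordinating many task-specific models according to user intent. The sketch below illustrates only that general idea: it maps an already-decomposed request onto a pool of models and picks the highest-scoring model for each subtask. It is not SPAgent's implementation, because this page does not describe the paper's three steps in any detail. The model names, task labels, capability scores, and the functions `select_model` and `plan` are all hypothetical. The decomposition is also written by hand here, whereas SPAgent derives it from the user's intent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str     # hypothetical model identifier
    task: str     # task the model supports, e.g. "text-to-video"
    score: float  # assumed capability score from some offline evaluation


# Hypothetical model pool; names and scores are illustrative only.
MODEL_POOL = [
    Candidate("t2v_model_a", "text-to-video", 0.72),
    Candidate("t2v_model_b", "text-to-video", 0.81),
    Candidate("i2v_model_a", "image-to-video", 0.77),
    Candidate("edit_model_a", "video-editing", 0.69),
]


def select_model(task: str, pool: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate that supports `task`."""
    matches = [c for c in pool if c.task == task]
    if not matches:
        raise ValueError(f"no model supports task {task!r}")
    return max(matches, key=lambda c: c.score)


def plan(subtasks: list[str], pool: list[Candidate]) -> list[tuple[str, str]]:
    """Assign a model to each subtask, preserving execution order."""
    return [(t, select_model(t, pool).name) for t in subtasks]


if __name__ == "__main__":
    # Hand-written decomposition of a request such as
    # "animate this photo, then recolor the sky" (assumed, not from the paper).
    steps = ["image-to-video", "video-editing"]
    for task, model in plan(steps, MODEL_POOL):
        print(f"{task}: {model}")
```

A fixed, score-based lookup keeps the example small. A real coordinator would also need to handle the parts this sketch deliberately leaves out: interpreting free-form requests, ordering dependent steps, and assessing output quality.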

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis

Methods: Lib