From Shots to Stories: LLM-Assisted Video Editing with Unified Language Representations

Yuzhi Li; Haojun Xu; Feng Tian

arXiv:2505.12237 · cs.CV · May 20, 2025

TL;DR

This paper explores the use of Large Language Models in video editing by introducing a structured language representation called L-Storyboard and a reasoning strategy named StoryFlow, improving task accuracy and coherence.

Contribution

It introduces L-Storyboard as a novel intermediate representation and proposes the StoryFlow strategy to enhance the stability and logical consistency of LLM-based video editing tasks.

Findings

01

L-Storyboard improves the mapping between visual information and language descriptions.

02

StoryFlow enhances logical consistency and stability in shot sequence ordering.

03

Experimental results show significant improvements in interpretability and coherence.

Abstract

Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable reasoning and generalization capabilities in video understanding; however, their application in video editing remains largely underexplored. This paper presents the first systematic study of LLMs in the context of video editing. To bridge the gap between visual information and language-based reasoning, we introduce L-Storyboard, an intermediate representation that transforms discrete video shots into structured language descriptions suitable for LLM processing. We categorize video editing tasks into Convergent Tasks and Divergent Tasks, focusing on three core tasks: Shot Attributes Classification, Next Shot Selection, and Shot Sequence Ordering. To address the inherent instability of divergent task outputs, we propose the StoryFlow strategy, which converts the divergent multi-path reasoning…
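
The idea of turning discrete shots into structured language that an LLM can reason over can be illustrated with a minimal sketch. The abstract does not specify the actual L-Storyboard schema, so the `Shot` fields, the line format, and the helper names `render_storyboard` and `build_next_shot_prompt` below are all hypothetical, chosen only to show how a Next Shot Selection query might be posed as text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    """One video shot described in language. All field names are hypothetical."""
    shot_id: int
    shot_size: str      # e.g. "close-up", "wide"
    camera_motion: str  # e.g. "static", "pan"
    description: str    # free-text caption, e.g. written by a VLM or a person


def render_storyboard(shots: List[Shot]) -> str:
    """Serialize shots as one text line each, preserving the given order."""
    return "\n".join(
        f"[Shot {s.shot_id}] size={s.shot_size}; "
        f"motion={s.camera_motion}; content={s.description}"
        for s in shots
    )


def build_next_shot_prompt(context: List[Shot], candidates: List[Shot]) -> str:
    """Build a multiple-choice prompt asking which candidate follows the context."""
    options = "\n".join(
        f"({chr(ord('A') + i)}) {render_storyboard([c])}"
        for i, c in enumerate(candidates)
    )
    return (
        "Storyboard so far:\n"
        f"{render_storyboard(context)}\n\n"
        "Candidate next shots:\n"
        f"{options}\n\n"
        "Answer with the letter of the shot that best continues the sequence."
    )


context = [
    Shot(1, "wide", "static", "A cyclist waits at a red light."),
    Shot(2, "close-up", "static", "The light turns green."),
]
candidates = [
    Shot(3, "medium", "tracking", "The cyclist pushes off and rides away."),
    Shot(4, "wide", "pan", "An empty beach at sunset."),
]
prompt = build_next_shot_prompt(context, candidates)
print(prompt)
```

The resulting string could be sent to any LLM. Constraining the answer to a single letter is one simple way to make the output easy to score, and is an assumption of this sketch rather than a detail taken from the paper.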

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization