TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation

Min-Jung Kim; Dongjin Kim; Seokju Yun; Jaegul Choo

arXiv:2506.07205 · cs.CV · June 10, 2025

TL;DR

TV-LiVE introduces a training-free, text-guided video editing method that exploits layer vitality in diffusion models to enable complex edits like object addition and non-rigid transformations.

Contribution

The paper identifies vital layers in video diffusion models, links them to Rotary Position Embeddings (RoPE), and leverages them for training-free video editing guided by text prompts.
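For readers unfamiliar with Rotary Position Embeddings, the following is a minimal NumPy sketch of the standard one-dimensional "rotate-half" formulation. It is an illustration only: the function name `rope`, the `base` value, and the 1D layout are assumptions made here, and this summary does not say how the paper's video model arranges RoPE across time and space.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even.

    Illustrative 1D version; names and defaults are assumptions, not the paper's code.
    """
    seq_len, dim = x.shape
    half = dim // 2
    # One rotation frequency per feature pair: base^(-2i/dim) for i in [0, half)
    freqs = base ** (-np.arange(half) * 2.0 / dim)
    angles = positions[:, None] * freqs[None, :]  # shape (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Rotate each (x1_i, x2_i) pair by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because each feature pair is rotated by an angle proportional to its position, the dot product between a rotated query and a rotated key depends only on their relative offset, which is the property that makes RoPE a position signal inside attention.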

Findings

01. Outperforms existing methods in object addition and non-rigid editing

02. Effectively identifies mask regions for new objects using prominent layers

03. Enables complex video edits without additional training or fine-tuning

Abstract

Video editing has garnered increasing attention alongside the rapid progress of diffusion-based video generation models. As part of these advancements, there is a growing demand for more accessible and controllable forms of video editing, such as prompt-based editing. Previous studies have primarily focused on tasks such as style transfer, background replacement, object substitution, and attribute modification, while maintaining the content structure of the source video. However, more complex tasks, including the addition of novel objects and non-rigid transformations, remain relatively unexplored. In this paper, we present TV-LiVE, a Training-free and text-guided Video editing framework via Layer-informed Vitality Exploitation. We empirically identify vital layers within the video generation model that significantly influence the quality of generated outputs. Notably, these layers are…
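The abstract says vital layers are identified empirically, but this excerpt does not describe the procedure. As a rough illustration of what such a probe can look like, the sketch below uses a generic ablation test on a toy residual stack: bypass one layer at a time and measure how far the output moves from the full-model output. The names `run` and `vitality_scores`, the toy layers, and the relative L2 score are all hypothetical choices made here, not the authors' method.

```python
import numpy as np

def run(layers, x, skip=None):
    """Run a toy residual stack, optionally bypassing the layer at index `skip`."""
    for i, layer in enumerate(layers):
        if i == skip:
            continue  # bypassed: the residual stream passes through unchanged
        x = x + layer(x)
    return x

def vitality_scores(layers, x):
    """Score each layer by the relative output change caused by bypassing it.

    Hypothetical metric for illustration; a larger score means a more influential layer.
    """
    ref = run(layers, x)
    scores = []
    for i in range(len(layers)):
        out = run(layers, x, skip=i)
        scores.append(float(np.linalg.norm(ref - out) / np.linalg.norm(ref)))
    return scores

# Toy usage: the first layer contributes nothing, the second scales its input by 0.5.
toy_layers = [lambda h: 0.0 * h, lambda h: 0.5 * h]
toy_scores = vitality_scores(toy_layers, np.ones(4))
```

In a real video diffusion transformer the comparison would be made on generated frames or latents rather than on a toy vector, and the choice of distance measure would matter considerably.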

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Multimodal Machine Learning Applications