Beyond Generation: Unlocking Universal Editing via Self-Supervised Fine-Tuning
Harold Haodong Chen, Harry Yang, Ser-Nam Lim

TL;DR
This paper introduces UES, a self-supervised fine-tuning method that transforms video generation models into versatile editing systems, reducing training costs and enabling universal editing capabilities across diverse tasks.
Contribution
The paper presents a lightweight self-supervised fine-tuning approach that unifies video generation and editing, significantly reducing training complexity and enabling broad applicability.
Findings
Enables models to perform universal editing without additional supervision.
Reduces tunable parameters by over 92%.
Achieves state-of-the-art editing performance across diverse tasks.
Abstract
Recent advances in video generation have outpaced progress in video editing, which remains constrained by several limiting factors, namely: (a) reliance on supervision, which severely limits generality, (b) an unnecessary artificial separation between the generation and editing tasks, and (c) the high computational cost of training video models. In this work, we propose UES (Unlocking Universal Editing via Self-Supervision), a lightweight self-supervised fine-tuning strategy that transforms generation models into unified generation-editing systems through self-supervised semantic alignment. Our approach establishes a dual-conditioning mechanism in which the original video-text pairs jointly provide visual and textual semantics, enabling structured learning of intrinsic spatiotemporal correspondences. Key advantages include: (i) Universality through supervision-free adaptation to diverse…
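The dual-conditioning idea in the abstract can be illustrated with a small sketch of how one self-supervised training example might be assembled from a single video-text pair. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the latent shapes, the choice to concatenate visual and text tokens into one condition sequence, and the DDPM-style noise-prediction objective are all hypothetical stand-ins for whatever UES actually uses.

```python
import numpy as np


def build_self_supervised_example(video_latents, text_emb, alpha_bar, rng):
    """Hypothetical sketch: one dual-conditioned example from a single video-text pair.

    video_latents: (T, N, D) latent tokens of the original video
                   (T frames, N tokens per frame, D channels).
    text_emb:      (L, D) embedding of that video's own caption.
    alpha_bar:     cumulative noise-schedule value in (0, 1) for the sampled step.
    """
    T, N, D = video_latents.shape
    # Visual condition (assumption): the clean original video as a flat token sequence.
    visual_cond = video_latents.reshape(T * N, D)
    # Joint condition (assumption): visual tokens followed by textual tokens.
    cond = np.concatenate([visual_cond, text_emb], axis=0)
    # Target (assumption): the same video, noised by a standard DDPM forward process.
    noise = rng.standard_normal(video_latents.shape)
    noisy = np.sqrt(alpha_bar) * video_latents + np.sqrt(1.0 - alpha_bar) * noise
    return cond, noisy, noise


def denoising_loss(pred_noise, noise):
    # Mean squared error between the predicted and the true noise.
    return float(np.mean((pred_noise - noise) ** 2))


# Usage with random stand-in data (no real model involved).
rng = np.random.default_rng(0)
video = rng.standard_normal((4, 16, 8))  # 4 frames, 16 tokens per frame, 8 channels
text = rng.standard_normal((6, 8))       # 6 caption tokens
cond, noisy, noise = build_self_supervised_example(video, text, alpha_bar=0.5, rng=rng)
print(cond.shape)  # (70, 8): 4 * 16 visual tokens + 6 text tokens
```

Because the condition and the target come from the same pair, no edited ground-truth video is needed, which is the sense in which such a scheme is self-supervised. A generator would then be fine-tuned to predict `noise` from `noisy` given `cond`.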
