First Frame Is the Place to Go for Video Content Customization

Jingxi Chen; Zongxia Li; Zhichao Liu; Guangyao Shi; Xiyang Wu; Fuxiao Liu; Cornelia Fermuller; Brandon Y. Feng; Yiannis Aloimonos

arXiv:2511.15700 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper redefines the role of the first frame in video generation models, showing it acts as a memory buffer that enables effective content customization with minimal training data.

Contribution

The paper introduces a new perspective on the first frame's role and demonstrates robust video customization from only a few examples, without changing the model architecture.

Findings

01. Effective video customization with 20-50 examples

02. No need for architectural modifications or large-scale finetuning

03. First frame as a conceptual memory buffer

Abstract

What role does the first frame play in video generation models? Traditionally, it's viewed as the spatial-temporal starting point of a video, merely a seed for subsequent animation. In this work, we reveal a fundamentally different perspective: video models implicitly treat the first frame as a conceptual memory buffer that stores visual entities for later reuse during generation. Leveraging this insight, we show that it's possible to achieve robust and generalized video content customization in diverse scenarios, using only 20-50 training examples without architectural changes or large-scale finetuning. This unveils a powerful, overlooked capability of video generation models for reference-based video customization.

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games