Temporal In-Context Fine-Tuning with Temporal Reasoning for Versatile Control of Video Diffusion Models
Kinam Kim, Junha Hyung, Jaegul Choo

TL;DR
This paper introduces Temporal In-Context Fine-Tuning (TIC-FT), a versatile method for adapting pretrained video diffusion models to various conditional generation tasks with minimal data and no architectural changes.
Contribution
TIC-FT enables efficient fine-tuning of video diffusion models using buffer frames for smooth temporal transitions, requiring only 10-30 samples and no model modifications.
Findings
Outperforms existing methods in condition fidelity and visual quality.
Requires only 10-30 training samples for effective adaptation.
Works across diverse tasks like image-to-video and video-to-video generation.
Abstract
Recent advances in text-to-video diffusion models have enabled high-quality video synthesis, but controllable generation remains challenging, particularly under limited data and compute. Existing fine-tuning methods for conditional generation often rely on external encoders or architectural modifications, which demand large datasets and are typically restricted to spatially aligned conditioning, limiting flexibility and scalability. In this work, we introduce Temporal In-Context Fine-Tuning (TIC-FT), an efficient and versatile approach for adapting pretrained video diffusion models to diverse conditional generation tasks. Our key idea is to concatenate condition and target frames along the temporal axis and insert intermediate buffer frames with progressively increasing noise levels. These buffer frames enable smooth transitions, aligning the fine-tuning process with the pretrained…
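The key idea in the abstract (condition and target frames concatenated along the temporal axis, with buffer frames of progressively increasing noise in between) can be illustrated with a small per-frame noise schedule. This is a minimal sketch, not the authors' implementation: the function name `frame_noise_levels`, its parameters, and the linear ramp across the buffer frames are assumptions made for illustration, since the abstract states only that the buffer noise levels increase progressively.

```python
from typing import List


def frame_noise_levels(num_condition: int, num_buffer: int,
                       num_target: int, target_level: float = 1.0) -> List[float]:
    """Return one noise level per frame for a [condition | buffer | target] sequence.

    Assumption (not from the paper): buffer levels rise linearly and lie
    strictly between 0 (clean condition frames) and target_level.
    """
    if min(num_condition, num_buffer, num_target) < 0:
        raise ValueError("frame counts must be non-negative")
    # Condition frames stay clean.
    condition = [0.0] * num_condition
    # Buffer frames bridge the gap with progressively increasing noise.
    buffer = [target_level * (i + 1) / (num_buffer + 1) for i in range(num_buffer)]
    # Target frames sit at the current denoising noise level.
    target = [target_level] * num_target
    # Concatenate along the temporal axis.
    return condition + buffer + target


levels = frame_noise_levels(num_condition=2, num_buffer=3, num_target=2)
print(levels)
```

With two condition frames, three buffer frames, and two target frames, the schedule rises in even steps from clean to fully noised, so there is no abrupt jump in noise level between the condition and the target.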
Taxonomy
Topics: Image and Signal Denoising Methods · Advanced Vision and Imaging · Advanced Image Processing Techniques
