SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

YaoYang Liu; Yuechen Zhang; Wenbo Li; Yufei Zhao; Rui Liu; Long Chen

arXiv:2605.06356·cs.CV·May 12, 2026

TL;DR

SwiftI2V is an efficient framework for high-resolution image-to-video (I2V) generation that balances fidelity against computational cost through segment-wise synthesis and conditioning on the input image.

Contribution

The paper introduces Conditional Segment-wise Generation (CSG) for scalable, high-fidelity 2K I2V synthesis at significantly reduced GPU time.

Findings

1. Achieves comparable performance to end-to-end models at 2K resolution.
2. Reduces GPU time by 202x on VBench-I2V.
3. Enables practical high-resolution I2V on consumer GPUs.

Abstract

High-resolution image-to-video (I2V) generation aims to synthesize realistic temporal dynamics while preserving the fine-grained appearance details of the input image. At 2K resolution this becomes extremely challenging, and existing solutions each have a weakness: 1) end-to-end models are often prohibitively expensive in memory and latency; 2) cascading low-resolution generation with a generic video super-resolution model tends to hallucinate details and drift from input-specific local structures, since the super-resolution stage is not explicitly conditioned on the input image. To address this, we propose SwiftI2V, an efficient framework tailored for high-resolution I2V. Following the widely used two-stage design, it addresses the efficiency-fidelity dilemma by first generating a low-resolution motion reference to reduce token costs and ease the modeling burden, then performing a strongly…
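The abstract's point about token cost can be made concrete with some rough arithmetic. The sketch below is illustrative only: the resolutions (2560x1440 and 640x360), the 81-frame clip length, the 8x spatial and 4x temporal compression factors, and the 7-frame segment length are all assumptions chosen for round numbers, not values taken from the paper. The segmentation shown is a generic illustration of why attending within chunks is cheaper than attending over the full clip, and should not be read as the paper's CSG procedure.

```python
def latent_frames(frames, temporal_down=4):
    """Latent frame count under an assumed temporal compression factor."""
    return 1 + (frames - 1) // temporal_down


def tokens_per_frame(width, height, spatial_down=8):
    """Tokens in one latent frame under an assumed spatial compression factor."""
    return (width // spatial_down) * (height // spatial_down)


def attention_pairs(tokens):
    """Full self-attention compares every token with every token."""
    return tokens * tokens


def segmented_attention_pairs(per_frame, n_latent_frames, segment_frames):
    """Attention cost when each segment of latent frames attends only to itself."""
    full, rest = divmod(n_latent_frames, segment_frames)
    sizes = [segment_frames] * full + ([rest] if rest else [])
    return sum(attention_pairs(per_frame * s) for s in sizes)


# Hypothetical clip: 81 frames, i.e. 21 latent frames under the assumed factors.
n_lat = latent_frames(81)

low_tokens = tokens_per_frame(640, 360) * n_lat      # assumed low-res stage
high_tokens = tokens_per_frame(2560, 1440) * n_lat   # assumed 2K stage

full_pairs = attention_pairs(high_tokens)
seg_pairs = segmented_attention_pairs(tokens_per_frame(2560, 1440), n_lat, 7)

print(high_tokens // low_tokens)                                  # token ratio
print(attention_pairs(high_tokens) // attention_pairs(low_tokens))  # attention ratio
print(full_pairs // seg_pairs)                                    # full vs segmented
```

Under these assumed numbers, the 2K clip has 16 times as many tokens as the low-resolution one and 256 times the pairwise attention cost, while splitting the 2K clip into three equal segments cuts the attention cost by a factor of 3. This is the kind of gap that motivates generating motion at low resolution first.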

Code & Models

Repositories

hkust-longgroup/SwiftI2V (GitHub)
