HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks
Ryan Szeto, Mostafa El-Khamy, Jungwon Lee, Jason J. Corso

TL;DR
HyperCon converts a pre-trained image model into a temporally consistent video model for video-to-video translation tasks, without fine-tuning the model on video.
Contribution
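The paper introduces HyperCon, an image-to-video model transfer technique that turns a well-trained image model into a temporally consistent video model without fine-tuning. Because it handles both masked and unmasked inputs, it applies to a broader set of video-to-video translation tasks.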
Findings
Outperforms prior methods in video style transfer and inpainting
Supports masked and unmasked inputs for diverse tasks
Achieves state-of-the-art results without additional training
Abstract
Video-to-video translation is more difficult than image-to-image translation due to the temporal consistency problem that, if unaddressed, leads to distracting flickering effects. Although video models designed from scratch produce temporally consistent results, training them to match the vast visual knowledge captured by image models requires an intractable number of videos. To combine the benefits of image and video models, we propose an image-to-video model transfer method called Hyperconsistency (HyperCon) that transforms any well-trained image model into a temporally consistent video model without fine-tuning. HyperCon works by translating a temporally interpolated video frame-wise and then aggregating over temporally localized windows on the interpolated video. It handles both masked and unmasked inputs, enabling support for even more video-to-video translation tasks than prior…
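The two steps named in the abstract (frame-wise translation of a temporally interpolated video, then aggregation over temporally localized windows) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: linear blending stands in for the frame interpolation, a plain mean stands in for the aggregation, and the names `interpolate_frames`, `hypercon_sketch`, `factor`, and `radius` are hypothetical.

```python
import numpy as np


def interpolate_frames(video, factor):
    """Densify a video of shape (T, H, W) or (T, H, W, C) in time.

    Assumption: linear blending between neighbouring frames is used here
    only as a simple placeholder for a frame interpolation method.
    The result has (T - 1) * factor + 1 frames.
    """
    frames = []
    for t in range(video.shape[0] - 1):
        for k in range(factor):
            alpha = k / factor
            frames.append((1 - alpha) * video[t] + alpha * video[t + 1])
    frames.append(video[-1])
    return np.stack(frames)


def hypercon_sketch(video, image_model, factor=2, radius=1):
    """Apply an image model to a video with windowed temporal aggregation.

    image_model: any callable mapping one frame to one frame of the same
    shape (hypothetical stand-in for a pre-trained image model).
    """
    video = np.asarray(video, dtype=np.float64)

    # Step 1: temporally interpolate, then translate every frame independently.
    dense = interpolate_frames(video, factor)
    translated = np.stack([image_model(frame) for frame in dense])

    # Step 2: for each original time step, aggregate the translated frames
    # in a local window centred on it (assumption: simple mean pooling).
    n = translated.shape[0]
    outputs = []
    for t in range(video.shape[0]):
        centre = t * factor
        lo, hi = max(0, centre - radius), min(n, centre + radius + 1)
        outputs.append(translated[lo:hi].mean(axis=0))
    return np.stack(outputs)


if __name__ == "__main__":
    # Three 2x2 frames with constant values 0, 1, 2 and an identity "model".
    demo = np.stack([np.full((2, 2), v, dtype=np.float64) for v in (0, 1, 2)])
    result = hypercon_sketch(demo, image_model=lambda f: f)
    print(result.shape)
```

The output has the same number of frames as the input, because one aggregated frame is produced per original time step. Averaging over a window of independently translated frames is what dampens frame-to-frame variation in this sketch.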
