SneakPeek: Future-Guided Instructional Streaming Video Generation

Cheeun Hong; German Barquero; Fadime Sener; Markos Georgopoulos; Edgar Schönfeld; Stefan Popov; Yuming Du; Oscar Mañas; Albert Pumarola

arXiv:2512.13019 · cs.CV · December 16, 2025

Open Access

TL;DR

SneakPeek is a diffusion-based autoregressive framework that generates coherent, controllable instructional videos from an initial image and structured text prompts, anticipating future keyframes to keep multi-step sequences temporally consistent.

Contribution

The paper introduces a novel pipeline with predictive causal adaptation, future-guided self-forcing, and multi-prompt conditioning for improved instructional video generation.

Findings

1. Produces temporally coherent instructional videos
2. Maintains semantic fidelity to multi-step instructions
3. Enables dynamic prompt updates during generation
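
To make the idea of updating prompts during generation concrete, here is a minimal Python sketch of chunk-wise autoregressive streaming with a prompt schedule. It is a toy illustration under stated assumptions, not the paper's implementation: `Frame`, `generate_chunk`, `stream_video`, and the example prompts are all hypothetical names, and `generate_chunk` is a placeholder that only records what a real causal generator would be conditioned on.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Frame:
    index: int    # position in the output stream
    prompt: str   # instruction step active when the frame was produced
    context: int  # number of earlier frames visible to the generator


def generate_chunk(history: List[Frame], prompt: str, chunk_size: int) -> List[Frame]:
    """Hypothetical stand-in for a causal video generator (not a real model)."""
    chunk: List[Frame] = []
    for _ in range(chunk_size):
        # Each new frame sees every frame produced before it, never later ones.
        seen = len(history) + len(chunk)
        chunk.append(Frame(index=seen, prompt=prompt, context=seen))
    return chunk


def stream_video(prompt_schedule: Dict[int, str], num_chunks: int, chunk_size: int) -> List[Frame]:
    """Generate chunk by chunk, switching prompts at scheduled chunk indices."""
    frames: List[Frame] = []
    active_prompt = prompt_schedule[0]  # a prompt for chunk 0 is required
    for chunk_idx in range(num_chunks):
        # Keep the current prompt unless the schedule supplies a new one.
        active_prompt = prompt_schedule.get(chunk_idx, active_prompt)
        frames.extend(generate_chunk(frames, active_prompt, chunk_size))
    return frames


# Hypothetical two-step instruction: the prompt changes at chunk 2.
schedule = {0: "crack the egg", 2: "whisk the egg"}
video = stream_video(schedule, num_chunks=4, chunk_size=3)
```

In this sketch the prompt switch takes effect at a chunk boundary, and frames generated after the switch still see the full history from earlier steps, which is the property a streaming generator would need to keep a multi-step demonstration consistent.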

Abstract

Instructional video generation is an emerging task that aims to synthesize coherent demonstrations of procedural activities from textual descriptions. Such capability has broad implications for content creation, education, and human-AI interaction, yet existing video diffusion models struggle to maintain temporal consistency and controllability across long sequences of multiple action steps. We introduce a pipeline for future-driven streaming instructional video generation, dubbed SneakPeek, a diffusion-based autoregressive framework designed to generate precise, stepwise instructional videos conditioned on an initial image and structured textual prompts. Our approach introduces three key innovations to enhance consistency and controllability: (1) predictive causal adaptation, where a causal model learns to perform next-frame prediction and anticipate future keyframes; (2) future-guided…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Video Analysis and Summarization