ShoulderShot: Generating Over-the-Shoulder Dialogue Videos

Yuang Zhang; Junqi Cheng; Haoyu Zhao; Jiaxi Gu; Fangyuan Zou; Zenghui Lu; Peng Shu

arXiv:2508.07597 · cs.CV · August 18, 2025

TL;DR

ShoulderShot is a novel framework for generating over-the-shoulder dialogue videos that maintains character consistency and spatial continuity, enabling extended multi-turn dialogues with improved realism and flexibility.

Contribution

We introduce ShoulderShot, a method that combines dual-shot generation with looping video to produce longer, more consistent dialogue scenes.

Findings

01 Outperforms existing methods in shot-reverse-shot layout

02 Enhances spatial continuity in generated videos

03 Supports flexible, multi-turn dialogues

Abstract

Over-the-shoulder dialogue videos are essential in films, short dramas, and advertisements, providing visual variety and enhancing viewers' emotional connection. Despite their importance, such dialogue scenes remain largely underexplored in video generation research. The main challenges include maintaining character consistency across different shots, creating a sense of spatial continuity, and generating long, multi-turn dialogues within limited computational budgets. Here, we present ShoulderShot, a framework that combines dual-shot generation with looping video, enabling extended dialogues while preserving character consistency. Our results demonstrate capabilities that surpass existing methods in terms of shot-reverse-shot layout, spatial continuity, and flexibility in dialogue length, thereby opening up new possibilities for practical dialogue video generation. Videos and…
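
To illustrate why looping video can decouple dialogue length from generation cost, the sketch below assembles a multi-turn timeline from two short looping clips, one per shot. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Turn` class, the `build_timeline` function, the speaker labels "A" and "B", and the loop lengths are all hypothetical, and each clip is assumed to loop seamlessly (its last frame flows back into its first). The sketch also assumes every cut restarts the loop at frame 0.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Turn:
    speaker: str     # hypothetical label of the shot to show, e.g. "A" or "B"
    num_frames: int  # how many frames this dialogue turn lasts


def build_timeline(turns: List[Turn], loop_len: Dict[str, int]) -> List[Tuple[str, int]]:
    """Map each output frame to a (shot, frame index within that shot's loop) pair.

    Because each clip is assumed to loop seamlessly, a turn of any length
    can be covered by wrapping the frame index with a modulo.
    """
    timeline: List[Tuple[str, int]] = []
    for turn in turns:
        n = loop_len[turn.speaker]
        if n <= 0:
            raise ValueError("loop length must be positive")
        for t in range(turn.num_frames):
            # Assumption: every cut restarts the loop at frame 0.
            timeline.append((turn.speaker, t % n))
    return timeline


# Two generated loops (hypothetical lengths) reused across three turns.
turns = [Turn("A", 5), Turn("B", 3), Turn("A", 2)]
timeline = build_timeline(turns, {"A": 4, "B": 2})
```

In this sketch only the two loops are ever generated; adding more turns extends the timeline without producing new frames, which is the budget argument the abstract makes for looping video.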
