TL;DR
Soap2Soap is a multi-agent framework for remaking long cinematic videos. It keeps narrative, motion, and character identity consistent across hundreds of shots by pairing a JSON screenplay with visual reference anchors, batch keyframe generation, and a verification agent.
Contribution
The paper presents a multi-agent system that combines a Dual-Bridge Consistency mechanism with batch keyframe generation to improve fidelity in long-horizon video editing.
Findings
Outperforms commercial APIs in long-term consistency and narrative fidelity.
Enforces semantic and visual consistency over hundreds of shots.
Uses a verification agent to maintain identity and alignment.
Abstract
We study series-level cinematic remaking, a long-horizon video-to-video generation problem that localizes full episodes or films via stylization or actor replacement while strictly preserving narrative structure, motion choreography, and character identity across hundreds of shots. Existing video generation and editing pipelines often break down in this regime due to compounding identity drift, background mutation, and semantic erosion under large camera motions and viewpoint changes. We propose Soap2Soap, a multi-agent framework that enforces long-term language-visual consistency through a Dual-Bridge Consistency mechanism: a scene-aware JSON screenplay serving as a persistent semantic backbone, and dynamically allocated visual reference anchors at both scene and shot levels. To suppress drift before video synthesis, we introduce batch keyframe consistency, jointly generating multiple…
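To make the Dual-Bridge idea concrete, here is a minimal Python sketch of how a scene-aware JSON screenplay could carry reference anchors at both scene and shot levels. The abstract does not give a schema, so every class, field, and function name below (`Scene`, `Shot`, `anchor_ids`, `anchors_for_shot`) is a hypothetical illustration, not the authors' format. The merge rule (scene anchors first, then shot anchors, de-duplicated) is likewise an assumption.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Shot:
    # All fields are hypothetical; the paper's real schema is not shown in the abstract.
    shot_id: str
    description: str
    anchor_ids: List[str] = field(default_factory=list)  # shot-level visual anchors


@dataclass
class Scene:
    scene_id: str
    location: str
    characters: List[str]
    anchor_ids: List[str] = field(default_factory=list)  # scene-level visual anchors
    shots: List[Shot] = field(default_factory=list)


def anchors_for_shot(scene: Scene, shot: Shot) -> List[str]:
    """Merge scene-level and shot-level anchors, keeping order and dropping repeats."""
    merged: List[str] = []
    for anchor in scene.anchor_ids + shot.anchor_ids:
        if anchor not in merged:
            merged.append(anchor)
    return merged


def to_json(scenes: List[Scene]) -> str:
    """Serialize the screenplay so it can persist across agents as a shared record."""
    return json.dumps([asdict(scene) for scene in scenes], indent=2)


def from_json(text: str) -> List[Scene]:
    """Rebuild Scene and Shot objects from the JSON screenplay."""
    scenes = []
    for raw in json.loads(text):
        shots = [Shot(**shot) for shot in raw.pop("shots")]
        scenes.append(Scene(shots=shots, **raw))
    return scenes
```

In this sketch the JSON text plays the role of the persistent semantic record, while the anchor identifiers point at reference images stored elsewhere; how anchors are actually allocated in Soap2Soap is not specified in the text above.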
