TV-Dialogue: Crafting Theme-Aware Video Dialogues with Immersive Interaction
Sai Wang, Fan Ma, Xinyi Li, Hehe Fan, Yu Wu

TL;DR
This paper introduces TV-Dialogue, a training-free multi-modal agent framework that generates new dialogue for a video, aligned with both the video content and a user-specified theme, by letting the video's characters interact with one another in real time.
Contribution
The paper presents a novel multi-modal agent framework for theme-aware video dialogue generation, along with a new evaluation benchmark and zero-shot capabilities.
Findings
Effective dialogue generation for videos of any length and theme
Outperforms existing LLMs in content alignment and visual consistency
Enables applications like video re-creation and film dubbing
Abstract
Recent advancements in LLMs have accelerated the development of dialogue generation across text and images, yet video-based dialogue generation remains underexplored and presents unique challenges. In this paper, we introduce Theme-aware Video Dialogue Crafting (TVDC), a novel task aimed at generating new dialogues that align with video content and adhere to user-specified themes. We propose TV-Dialogue, a novel multi-modal agent framework that ensures both theme alignment (i.e., the dialogue revolves around the theme) and visual consistency (i.e., the dialogue matches the emotions and behaviors of characters in the video) by enabling real-time immersive interactions among video characters, thereby accurately understanding the video content and generating new dialogue that aligns with the given themes. To assess the generated dialogues, we present a multi-granularity evaluation…
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology · Speech and dialogue systems
Methods: ALIGN
