PersonaVlog: Personalized Multimodal Vlog Generation with Multi-Agent Collaboration and Iterative Self-Correction

Xiaolu Hou; Bing Ma; Jiaxiang Cheng; Xuhua Ren; Kai Yu; Wenyue Li; Tianxiang Zheng; Qinglin Lu

arXiv:2508.13602 · cs.CV · September 3, 2025

TL;DR

PersonaVlog introduces a novel multimodal Vlog generation framework that leverages multi-agent collaboration and iterative self-correction to produce personalized, high-quality short videos with minimal predefined scripting.

Contribution

The paper presents a multi-agent collaboration framework built on Multimodal Large Language Models (MLLMs), together with a feedback mechanism for iterative self-correction, advancing automated personalized Vlog creation.
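To make the idea of iterative self-correction concrete, the following is a minimal sketch of a generate, critique, and regenerate loop. It is not the authors' implementation: the `Draft` fields, the `self_correct` function, the `max_rounds` budget, and the two toy agents are all hypothetical names chosen for illustration, and the stubs stand in for what would be MLLM calls in a real system. The reference image that PersonaVlog also takes as input is omitted here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Draft:
    """Prompts for the three output modalities named in the abstract."""
    video_prompt: str
    music_prompt: str
    monologue: str


# Hypothetical agent signatures (assumptions, not taken from the paper).
Generator = Callable[[str, List[str]], Draft]  # (theme, feedback so far) -> draft
Critic = Callable[[str, Draft], List[str]]     # (theme, draft) -> list of issues


def self_correct(theme: str, generate: Generator, critique: Critic,
                 max_rounds: int = 3) -> Tuple[Draft, int]:
    """Regenerate until the critic finds no issues or the budget is spent.

    Returns the last draft and the number of critique rounds that ran.
    """
    feedback: List[str] = []
    rounds = 0
    draft = generate(theme, feedback)
    while rounds < max_rounds:
        rounds += 1
        issues = critique(theme, draft)
        if not issues:
            break
        # Accumulate feedback so later drafts see every earlier complaint.
        feedback.extend(issues)
        draft = generate(theme, feedback)
    return draft, rounds


def toy_generate(theme: str, feedback: List[str]) -> Draft:
    """Stub generator: fixes a complaint only after it appears in feedback."""
    monologue = f"Today is all about {theme}."
    if "monologue lacks a personal voice" in feedback:
        monologue = f"I have waited all week for this: {theme}."
    return Draft(
        video_prompt=f"handheld footage, {theme}",
        music_prompt="light acoustic background",
        monologue=monologue,
    )


def toy_critique(theme: str, draft: Draft) -> List[str]:
    """Stub critic: flags monologues that are not written in the first person."""
    issues: List[str] = []
    if not draft.monologue.startswith("I "):
        issues.append("monologue lacks a personal voice")
    return issues


if __name__ == "__main__":
    final_draft, used = self_correct("a rainy day in a bookshop",
                                     toy_generate, toy_critique)
    print(used, final_draft.monologue)
```

Passing the generator and critic in as plain callables keeps the loop independent of any particular model backend. The round budget guarantees termination even if the critic never approves a draft.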

Findings

01. Outperforms several baselines in quality and personalization.

02. Effective in generating diverse multimodal content.

03. Provides a standardized benchmarking framework for evaluation.

Abstract

With the growing demand for short videos and personalized content, automated Video Log (Vlog) generation has become a key direction in multimodal content creation. Existing methods mostly rely on predefined scripts, lacking dynamism and personal expression. Therefore, there is an urgent need for an automated Vlog generation approach that enables effective multimodal collaboration and high personalization. To this end, we propose PersonaVlog, an automated multimodal stylized Vlog generation framework that can produce personalized Vlogs featuring videos, background music, and inner monologue speech based on a given theme and reference image. Specifically, we propose a multi-agent collaboration framework based on Multimodal Large Language Models (MLLMs). This framework efficiently generates high-quality prompts for multimodal content creation based on user input, thereby improving the…
