BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation
Zihao Zhu, Ruotong Wang, Siwei Lyu, Min Zhang, Baoyuan Wu

TL;DR
BrandFusion is a multi-agent framework that enables seamless and natural integration of brands into text-to-video content, improving recognizability and semantic fidelity for commercial applications.
Contribution
BrandFusion introduces what the authors describe as the first framework for automatically embedding brands in T2V-generated videos, combining offline knowledge base construction with online prompt refinement.
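To make the two-phase split concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the names `BrandEntry`, `BrandKnowledgeBase`, `refine_prompt`, and `is_acceptable`, the keyword matching, and the string-based rewrite are all assumptions made for illustration. The paper's agents presumably rely on language models and a T2V backend, which this sketch replaces with simple string rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BrandEntry:
    """Hypothetical knowledge-base record for one advertiser brand."""
    name: str
    visual_cues: list[str]        # e.g. packaging colour, product shape
    suitable_contexts: list[str]  # scene keywords where the brand fits naturally


@dataclass
class BrandKnowledgeBase:
    """Offline phase (assumed shape): brands are registered ahead of time."""
    entries: dict[str, BrandEntry] = field(default_factory=dict)

    def register(self, entry: BrandEntry) -> None:
        self.entries[entry.name] = entry

    def match(self, prompt: str) -> Optional[BrandEntry]:
        """Return the first brand whose context keywords appear in the prompt."""
        lowered = prompt.lower()
        for entry in self.entries.values():
            if any(ctx in lowered for ctx in entry.suitable_contexts):
                return entry
        return None


def is_acceptable(candidate: str, prompt: str, brand: BrandEntry) -> bool:
    """Toy critic: user intent kept, brand named, at least one visual cue shown."""
    keeps_intent = candidate.startswith(prompt)
    names_brand = brand.name in candidate
    shows_cue = any(cue in candidate for cue in brand.visual_cues)
    return keeps_intent and names_brand and shows_cue


def refine_prompt(prompt: str, kb: BrandKnowledgeBase, max_rounds: int = 3) -> str:
    """Online phase (toy stand-in): rewrite and re-check until the critic passes."""
    brand = kb.match(prompt)
    if brand is None:
        return prompt  # no suitable brand: leave the user's prompt untouched
    candidate = f"{prompt}, featuring {brand.name}"
    for round_idx in range(max_rounds):
        if is_acceptable(candidate, prompt, brand):
            break
        # Add one more visual cue per round to improve recognizability.
        cues = brand.visual_cues[: round_idx + 1]
        candidate = f"{prompt}, featuring {brand.name} ({', '.join(cues)})"
    return candidate


if __name__ == "__main__":
    kb = BrandKnowledgeBase()
    kb.register(BrandEntry("Acme Cola", ["red can"], ["picnic"]))  # made-up brand
    print(refine_prompt("A family picnic in the park", kb))
```

The sketch keeps the user's prompt as an unchanged prefix, which is one simple way to model the prompt-fidelity constraint. A real system would need a semantic check rather than a string comparison.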
Findings
Outperforms baselines in semantic preservation and brand recognizability
Demonstrates effectiveness across 20 brands and multiple T2V models
Enhances user satisfaction and supports T2V monetization
Abstract
The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task confronts three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents jointly refine user prompts through iterative refinement, leveraging the shared…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Innovative Human-Technology Interaction
