MLLM-VADStory: Domain Knowledge-Driven Multimodal LLMs for Video Ad Storyline Insights

Jasmine Yang; Poppy Zhang; Shawndra Hill

arXiv:2601.07850 · cs.MM · January 14, 2026

Open Access

TL;DR

MLLM-VADStory introduces a domain knowledge-guided multimodal LLM framework that segments, classifies, and analyzes video ad storylines to enhance understanding and creative strategies across diverse social media ads.

Contribution

The paper presents a novel framework that leverages a domain-specific functional role taxonomy to systematically analyze video ad storylines and generate insights at scale.

Findings

01. Story-based creatives increase video retention.

02. Top-performing story arcs can guide advertising strategies.

03. The framework effectively recovers data-driven storyline structures.

Abstract

We propose MLLM-VADStory, a novel domain knowledge-guided multimodal large language model (MLLM) framework to systematically quantify and generate insights for video ad storyline understanding at scale. The framework is centered on the core idea that ad narratives are structured by functional intent: each scene unit performs a distinct communicative function, delivering product- and brand-oriented information within seconds. MLLM-VADStory segments ads into functional units, classifies each unit's functionality using a novel advertising-specific functional role taxonomy, and then aggregates functional sequences across ads to recover data-driven storyline structures. Applying the framework to 50k social media video ads across four industry subverticals, we find that story-based creatives improve video retention, and we recommend top-performing story arcs to guide advertisers in…
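The aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes the segmentation and classification have already been done and that each ad arrives as a list of per-segment role labels plus a retention score. The role names (`hook`, `problem`, `product_demo`, `call_to_action`), the retention values, and the helper names `to_arc` and `summarize_arcs` are all hypothetical placeholders; the paper's actual taxonomy and aggregation method are not given in this summary.

```python
from collections import defaultdict
from itertools import groupby
from statistics import mean

# Hypothetical input: per-segment role labels and a retention score per ad.
# The role names and numbers are placeholders, not the paper's taxonomy or data.
ADS = [
    {"roles": ["hook", "hook", "problem", "product_demo", "call_to_action"], "retention": 0.42},
    {"roles": ["hook", "problem", "product_demo", "call_to_action"], "retention": 0.38},
    {"roles": ["product_demo", "call_to_action"], "retention": 0.21},
    {"roles": ["product_demo", "product_demo", "call_to_action"], "retention": 0.25},
]


def to_arc(roles):
    """Collapse runs of identical adjacent roles into a single arc step."""
    return tuple(role for role, _ in groupby(roles))


def summarize_arcs(ads):
    """Group ads by arc and return (arc, ad_count, mean_retention) rows,
    sorted from highest to lowest mean retention."""
    by_arc = defaultdict(list)
    for ad in ads:
        by_arc[to_arc(ad["roles"])].append(ad["retention"])
    rows = [(arc, len(vals), mean(vals)) for arc, vals in by_arc.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for arc, count, avg in summarize_arcs(ADS):
        print(" -> ".join(arc), count, round(avg, 3))
```

Collapsing adjacent repeats is one possible design choice: it treats two consecutive segments with the same role as a single narrative step, so ads of different lengths can share an arc. A real analysis would likely also need to control for confounders such as ad length or vertical before comparing retention across arcs.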

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games