MM-AU: Towards Multimodal Understanding of Advertisement Videos
Digbalay Bose, Rajat Hebbar, Tiantian Feng, Krishna Somandepalli, Anfeng Xu, Shrikanth Narayanan

TL;DR
This paper introduces MM-AU, a multimodal benchmark for understanding advertisement videos along three dimensions (topic, tone transition, and social message), and shows that multimodal transformers outperform unimodal methods.
Contribution
The paper presents MM-AU, a large multilingual multimodal benchmark for ad understanding, and shows that multimodal transformer models outperform unimodal baselines in key tasks.
Findings
Multimodal models outperform unimodal approaches in ad understanding.
Zero-shot reasoning with large language models is explored as a baseline.
MM-AU comprises over 8.4K videos (147 hours) curated from multiple web sources.
Abstract
Advertisement videos (ads) play an integral part in the domain of Internet e-commerce as they amplify the reach of particular products to a broad audience or can serve as a medium to raise awareness about specific issues through concise narrative structures. The narrative structures of advertisements involve several elements like reasoning about the broad content (topic and the underlying message) and examining fine-grained details involving the transition of perceived tone due to the specific sequence of events and interaction among characters. In this work, to facilitate the understanding of advertisements along the three important dimensions of topic categorization, perceived tone transition, and social message detection, we introduce a multimodal multilingual benchmark called MM-AU composed of over 8.4K videos (147 hours) curated from multiple web sources. We explore multiple…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Topic Modeling · Text and Document Classification Technologies
