E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs
Xianjie Liu, Yiman Hu, Liang Wu, Ping Hu, Yixiong Zou, Jian Xu, Bo Zheng

TL;DR
This paper introduces E-VAds, a benchmark for understanding e-commerce short videos that targets two difficulties of the domain: dense multi-modal signals and commercial intent reasoning. It also proposes an RL-based model, E-VAds-R1, that substantially improves performance on these tasks.
Contribution
It presents the first dedicated benchmark for e-commerce video understanding and develops a specialized RL-based reasoning model with a multi-grained reward system.
Findings
E-VAds exhibits higher information density than general datasets.
E-VAds-R1 achieves over 109% performance gain in commercial intent reasoning.
The benchmark covers diverse product categories and reasoning tasks.
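To put the reported gain in perspective: if "109% performance gain" is read as a relative improvement over a baseline score (an assumption, since this summary does not define the metric), a gain above 100% means the score more than doubled. The sketch below uses hypothetical scores purely for illustration; they are not results from the paper.

```python
def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical scores for illustration only (not taken from the paper).
baseline_score = 20.0
improved_score = 45.0

gain = relative_gain_percent(baseline_score, improved_score)
print(f"relative gain: {gain:.1f}%")

# A relative gain above 100% is equivalent to more than doubling the baseline.
more_than_doubled = improved_score > 2 * baseline_score
print(f"more than doubled: {more_than_doubled}")
```

Under this reading, a 109% gain corresponds to a final score of roughly 2.09 times the baseline.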
Abstract
E-commerce short videos represent a high-revenue segment of the online video industry characterized by a goal-driven format and dense multi-modal signals. Current models often struggle with these videos because existing benchmarks focus primarily on general-purpose tasks and neglect the reasoning of commercial intent. In this work, we first propose a multi-modal information density assessment framework to quantify the complexity of this domain. Our evaluation reveals that e-commerce content exhibits substantially higher density across visual, audio, and textual modalities compared to mainstream datasets, establishing a more challenging frontier for video understanding. To address this gap, we introduce E-commerce Video Ads Benchmark (E-VAds), which is the first benchmark specifically designed for e-commerce short video understanding. We curated 3,961 high-quality videos from Taobao…
Taxonomy
Topics: Multimodal Machine Learning Applications · Sentiment Analysis and Opinion Mining · Recommender Systems and Techniques
