Event-Enriched Image Analysis Grand Challenge at ACM Multimedia 2025
Thien-Phuc Tran, Minh-Quang Nguyen, Minh-Triet Tran, Tam V. Nguyen, Trong-Le Do, Duy-Nam Ly, Viet-Tham Huynh, Khanh-Duy Le, Mai-Khiem Tran, Trung-Nghia Le

TL;DR
The EVENTA Grand Challenge at ACM Multimedia 2025 introduces a large-scale benchmark for event-level multimodal understanding, emphasizing contextual and semantic analysis of images to capture comprehensive event information.
Contribution
The paper presents the first large-scale benchmark dataset and challenge focused on event-enriched image understanding, integrating contextual, temporal, and semantic information.
Findings
A total of 45 teams from six countries participated.
Top teams demonstrated advanced multimodal understanding.
The benchmark lays a foundation for future context-aware multimedia AI.
Abstract
The Event-Enriched Image Analysis (EVENTA) Grand Challenge, hosted at ACM Multimedia 2025, introduces the first large-scale benchmark for event-level multimodal understanding. Traditional captioning and retrieval tasks largely focus on surface-level recognition of people, objects, and scenes, often overlooking the contextual and semantic dimensions that define real-world events. EVENTA addresses this gap by integrating contextual, temporal, and semantic information to capture the who, when, where, what, and why behind an image. Built upon the OpenEvents V1 dataset, the challenge features two tracks: Event-Enriched Image Retrieval and Captioning, and Event-Based Image Retrieval. A total of 45 teams from six countries participated, with evaluation conducted through Public and Private Test phases to ensure fairness and reproducibility. The top three teams were invited to present their…
