OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding

Hieu Nguyen; Phuc-Tan Nguyen; Thien-Phuc Tran; Minh-Quang Nguyen; Tam V. Nguyen; Minh-Triet Tran; Trung-Nghia Le

arXiv:2506.18372 · cs.CV · August 27, 2025


TL;DR

OpenEvents V1 is a large-scale multimodal dataset designed to improve event-centric vision-language understanding through three key tasks (event-aware image captioning, news article retrieval from image queries, and image retrieval from textual queries), supporting advanced reasoning over real-world events.

Contribution

The paper introduces OpenEvents V1, a comprehensive dataset with over 200,000 news articles and 400,000 images, focusing on contextual and temporal event grounding beyond surface-level descriptions.

Findings

1. Baseline results are established for all three tasks.
2. Standardized evaluation protocols are provided.
3. The dataset enables deep reasoning over complex real-world events.

Abstract

We introduce OpenEvents V1, a large-scale benchmark dataset designed to advance event-centric vision-language understanding. Unlike conventional image captioning and retrieval datasets that focus on surface-level descriptions, the OpenEvents V1 dataset emphasizes contextual and temporal grounding through three primary tasks: (1) generating rich, event-aware image captions, (2) retrieving event-relevant news articles from image queries, and (3) retrieving event-relevant images from narrative-style textual queries. The dataset comprises over 200,000 news articles and 400,000 associated images sourced from CNN and The Guardian, spanning diverse domains and time periods. We provide extensive baseline results and standardized evaluation protocols for all tasks. OpenEvents V1 establishes a robust foundation for developing multimodal AI systems capable of deep reasoning over complex real-world…
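Tasks (2) and (3) are ranking problems: each query (an image, or a narrative text) is compared against a pool of candidates (articles, or images), and the system returns them in order of relevance. The sketch below shows one common way such a retrieval task is set up and scored. It assumes that queries and candidates have already been embedded into a shared vector space (for example by a CLIP-style encoder) and that results are measured with Recall@K and mean reciprocal rank (MRR). These metrics, the function names, and the toy vectors are illustrative assumptions, not the paper's stated protocol.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_candidates(query: Sequence[float],
                    candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices sorted by descending similarity to the query."""
    scores = [cosine(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


def recall_at_k(rankings: List[List[int]], gold: List[int], k: int) -> float:
    """Fraction of queries whose gold candidate appears in the top-k results."""
    hits = sum(1 for r, g in zip(rankings, gold) if g in r[:k])
    return hits / len(gold)


def mean_reciprocal_rank(rankings: List[List[int]], gold: List[int]) -> float:
    """Average of 1 / (1-based rank of the gold candidate) over all queries."""
    total = sum(1.0 / (r.index(g) + 1) for r, g in zip(rankings, gold))
    return total / len(gold)


# Hypothetical toy embeddings: two image queries, three candidate articles.
queries = [[1.0, 0.0], [0.0, 1.0]]
articles = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.7]]
gold = [0, 2]  # index of the correct article for each query

rankings = [rank_candidates(q, articles) for q in queries]
print(rankings)                               # [[0, 2, 1], [1, 2, 0]]
print(recall_at_k(rankings, gold, 1))         # 0.5
print(recall_at_k(rankings, gold, 2))         # 1.0
print(mean_reciprocal_rank(rankings, gold))   # 0.75
```

Here the second query ranks its gold article second, so it counts toward Recall@2 but not Recall@1 and contributes a reciprocal rank of 0.5. Text-to-image retrieval (task 3) uses the same scoring with the roles of the modalities swapped.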
