Sekai: A Video Dataset towards World Exploration

Zhen Li; Chuanhao Li; Xiaofeng Mao; Shaoheng Lin; Ming Li; Shitian Zhao; Zhaopan Xu; Xinyue Li; Yukang Feng; Jianwen Sun; Zizhen Li; Fanrui Zhang; Jiaxin Ai; Zhixiang Wang; Yuwei Wu; Tong He; Jiangmiao Pang; Yu Qiao; Yunde Jia; Kaipeng Zhang

arXiv:2506.15675 · cs.CV · November 11, 2025



TL;DR

Sekai is a large, diverse, and richly annotated first-person video dataset from around the world, designed to advance video generation and world exploration research.

Contribution

The paper introduces Sekai, a comprehensive worldwide video dataset with extensive annotations, addressing limitations of existing datasets for world exploration training.

Findings

01 Demonstrates the dataset's scale and diversity.

02 Shows effectiveness in training video generation models.

03 Provides high-quality annotations for various exploration aspects.

Abstract

Video generation techniques have made remarkable progress and promise to become the foundation of interactive world exploration. However, existing video generation datasets are poorly suited to world exploration training because of four limitations: limited locations, short duration, static scenes, and a lack of annotations about exploration and the world. In this paper, we introduce Sekai (meaning "world" in Japanese), a high-quality, first-person-view, worldwide video dataset with rich annotations for world exploration. It consists of over 5,000 hours of walking or drone-view (FPV and UAV) videos from over 100 countries and regions across 750 cities. We develop an efficient and effective toolbox to collect, pre-process, and annotate videos with location, scene, weather, crowd density, captions, and camera trajectories. Comprehensive analyses and experiments demonstrate the…


Code & Models

Datasets

Lixsp11/Sekai-Project
dataset · 1.1k downloads


Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis