Waymo-3DSkelMo: A Multi-Agent 3D Skeletal Motion Dataset for Pedestrian Interaction Modeling in Autonomous Driving

Guangxun Zhu; Shiyu Fan; Hang Dai; Edmond S. L. Ho

arXiv:2508.09404 · cs.CV · August 14, 2025


TL;DR

Waymo-3DSkelMo is a large-scale, high-quality 3D skeletal motion dataset derived from LiDAR data, enabling better pedestrian interaction modeling for autonomous driving in urban environments.

Contribution

It introduces the first large-scale dataset with temporally coherent 3D skeletal motions and explicit interaction semantics derived from real-world LiDAR data, improving on prior datasets whose 3D poses are estimated from monocular RGB video.

Findings

1. Established 3D pose forecasting benchmarks in urban scenarios.
2. Demonstrated the dataset's value for fine-grained human behavior understanding.
3. Provided over 14,000 seconds of annotated multi-agent interactions.
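This page does not say which metrics the forecasting benchmarks use. As a hedged illustration only, the sketch below shows mean per-joint position error (MPJPE), a metric widely used for 3D pose forecasting, together with a zero-velocity baseline that simply repeats the last observed pose. The array layout (frames, joints, xyz in metres) and the function names `mpjpe` and `zero_velocity_forecast` are assumptions for this example, not the dataset's API or the authors' evaluation protocol.

```python
import numpy as np


def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint position error (hypothetical helper for illustration).

    pred, gt: arrays of shape (T, J, 3) with T frames, J joints, xyz coordinates.
    Returns the Euclidean joint error averaged over all frames and joints.
    """
    if pred.shape != gt.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {gt.shape}")
    # Per-joint Euclidean distance, then mean over frames and joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def zero_velocity_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Trivial baseline (assumed): repeat the last observed pose `horizon` times.

    history: (T_obs, J, 3) observed poses; returns (horizon, J, 3).
    """
    return np.repeat(history[-1:], horizon, axis=0)
```

A learned forecaster would take the place of `zero_velocity_forecast`; keeping the trivial baseline around gives a floor that any model evaluated on the same split should beat.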

Abstract

Large-scale, high-quality 3D motion datasets with multi-person interactions are crucial for data-driven models in autonomous driving to achieve fine-grained pedestrian interaction understanding in dynamic urban environments. However, existing datasets mostly rely on estimating 3D poses from monocular RGB video frames, which suffer from occlusion and lack of temporal continuity, resulting in unrealistic, low-quality human motion. In this paper, we introduce Waymo-3DSkelMo, the first large-scale dataset providing high-quality, temporally coherent 3D skeletal motions with explicit interaction semantics, derived from the Waymo Perception dataset. Our key insight is to use 3D human body shape and motion priors to enhance the quality of the 3D pose sequences extracted from the raw LiDAR point clouds. The dataset covers over 14,000 seconds across more than 800 real driving…
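The abstract attributes the low quality of monocular-derived motion partly to a lack of temporal continuity. One simple way to quantify that kind of jitter, assumed here purely for illustration and not taken from the paper, is the mean per-joint acceleration computed with second-order finite differences: smooth motion scores near zero, while frame-to-frame noise raises the value. The function name `mean_joint_acceleration` and the (T, J, 3) array layout are hypothetical.

```python
import numpy as np


def mean_joint_acceleration(seq: np.ndarray, fps: float) -> float:
    """Average per-joint acceleration magnitude (illustrative jitter proxy).

    seq: (T, J, 3) joint positions; fps: frame rate in Hz.
    Uses the central second difference x[t+1] - 2*x[t] + x[t-1], scaled by fps**2.
    Larger values indicate jittery, less temporally coherent motion.
    """
    if seq.shape[0] < 3:
        raise ValueError("need at least 3 frames")
    acc = (seq[2:] - 2.0 * seq[1:-1] + seq[:-2]) * fps ** 2
    # Magnitude of each joint's acceleration vector, averaged over frames and joints.
    return float(np.linalg.norm(acc, axis=-1).mean())
```

Comparing this value for a raw per-frame pose estimate and for a prior-refined sequence of the same pedestrian is one way to check whether refinement actually removed jitter; it does not measure pose accuracy, so it complements rather than replaces a position-error metric.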
