Web-Scale Collection of Video Data for 4D Animal Reconstruction
Brian Nlong Zhao, Jiajun Wu, Shangzhe Wu

TL;DR
This paper introduces a large-scale automated pipeline for mining and processing YouTube videos to create a comprehensive animal video dataset, enabling advancements in 4D animal reconstruction and related computer vision tasks.
Contribution
It presents a novel automated pipeline for large-scale animal video collection, a new benchmark dataset for 4D animal reconstruction, and a baseline method that improves on current state-of-the-art approaches.
Findings
State-of-the-art models achieve better scores on 2D metrics despite producing unrealistic shapes.
Model-free methods produce more natural reconstructions but score lower on these metrics.
Sequence-level optimization enhances 4D animal reconstruction results.
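The summary does not state the sequence-level objective. As a generic illustration only, and not the authors' formulation, the idea of optimizing a whole sequence rather than each frame independently can be sketched as fitting per-frame values x_t to per-frame estimates y_t while penalizing frame-to-frame changes. Setting the gradient of sum_t (x_t - y_t)^2 + lam * sum_t (x_{t+1} - x_t)^2 to zero gives the linear system (I + lam * D^T D) x = y, where D is the first-difference operator. The function name `smooth_sequence` and the weight `lam` are hypothetical.

```python
import numpy as np


def smooth_sequence(y, lam=10.0):
    """Jointly solve min_x sum_t (x_t - y_t)^2 + lam * sum_t (x_{t+1} - x_t)^2.

    Hypothetical illustration of sequence-level optimization; `y` may be
    shape (T,) or (T, d), e.g. one parameter vector per frame.
    """
    y = np.asarray(y, dtype=float)
    T = y.shape[0]
    # Rows of D are e_{t+1} - e_t, so D @ x gives the frame-to-frame differences.
    D = np.diff(np.eye(T), axis=0)  # shape (T-1, T)
    # Normal equations of the quadratic objective: (I + lam * D^T D) x = y.
    A = np.eye(T) + lam * D.T @ D
    return np.linalg.solve(A, y)


# Usage: noisy per-frame estimates of a smoothly varying quantity.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 50)
noisy = np.sin(t) + 0.3 * rng.standard_normal(50)
smoothed = smooth_sequence(noisy, lam=10.0)
```

Because the difference operator annihilates constant sequences, the penalty never changes the sequence mean; larger `lam` trades fidelity to the per-frame estimates for temporal coherence.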
Abstract
Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-driven approaches show the potential of single-view, non-invasive analysis, yet current animal video datasets are limited: they offer as few as 2.4K 15-frame clips and lack key processing for animal-centric 3D/4D tasks. We introduce an automated pipeline that mines YouTube videos and processes them into object-centric clips, along with auxiliary annotations valuable for downstream tasks like pose estimation, tracking, and 3D/4D reconstruction. Using this pipeline, we amass 30K videos (2M frames), an order of magnitude more than prior works. To demonstrate its utility, we focus on the 4D quadruped animal reconstruction task. To support this task, we present Animal-in-Motion (AiM), a benchmark of 230…
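The abstract does not describe how clips are made object-centric. One common approach, shown here only as a hypothetical sketch and not as the authors' pipeline, is to crop every frame to a square window around the tracked animal's bounding box, enlarged by a margin and shifted to stay inside the frame. The function name `square_crop_box`, the `(x0, y0, x1, y1)` box convention, and the default `margin=1.2` are all assumptions.

```python
def square_crop_box(box, img_w, img_h, margin=1.2):
    """Return an integer square crop (x0, y0, x1, y1) around `box`.

    Hypothetical helper: the crop is centred on the box, its side is the
    longer box side times `margin` (capped at the frame size), and it is
    shifted so that it lies fully inside the image.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Side length: enlarged longer edge, never larger than the frame itself.
    side = int(round(min(max(x1 - x0, y1 - y0) * margin, img_w, img_h)))
    left = int(round(cx - side / 2))
    top = int(round(cy - side / 2))
    # Shift the window back inside the image instead of padding it.
    left = min(max(left, 0), img_w - side)
    top = min(max(top, 0), img_h - side)
    return (left, top, left + side, top + side)


def crop_track(boxes, img_w, img_h, margin=1.2):
    """Apply `square_crop_box` to every per-frame box of one animal track."""
    return [square_crop_box(b, img_w, img_h, margin) for b in boxes]


# Usage: a 640x480 video in which one animal drifts toward the left edge.
track = [(100, 50, 200, 150), (40, 60, 140, 160), (0, 0, 50, 100)]
crops = crop_track(track, 640, 480)
```

Shifting rather than padding keeps every crop filled with real pixels; a production pipeline might instead pad or smooth crop centres over time to reduce jitter.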
Taxonomy
Topics: Wildlife Ecology and Conservation · Human Pose and Action Recognition · Advanced Neural Network Applications
