Web-Scale Collection of Video Data for 4D Animal Reconstruction

Brian Nlong Zhao; Jiajun Wu; Shangzhe Wu

arXiv:2511.01169 · cs.CV · November 4, 2025

Open Access

TL;DR

This paper introduces an automated pipeline that mines and processes YouTube videos at scale to build a large animal video dataset, supporting 4D animal reconstruction and related computer vision tasks.

Contribution

The paper presents an automated pipeline for large-scale animal video collection, a new benchmark dataset for 4D animal reconstruction, and a baseline method that improves on current state-of-the-art approaches.

Findings

01

State-of-the-art models score higher on 2D metrics despite producing unrealistic shapes.

02

Model-free methods produce more natural reconstructions but score lower on those metrics.

03

Sequence-level optimization enhances 4D animal reconstruction results.
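
One way to read the first two findings is that a 2D metric only sees the image-plane projection of a reconstruction, so it cannot penalize errors along the depth axis. The following is a minimal sketch of that effect, assuming an orthographic camera and mask IoU as the 2D metric. Both are assumptions made for illustration (the summary above does not say which 2D metrics are used), and the function names are hypothetical rather than taken from the paper.

```python
# Illustrative only: hypothetical helpers, not the paper's evaluation code.

def project_to_mask(points, size=8):
    """Orthographically project 3D points (x, y, z) to a set of x-y pixels by dropping z."""
    return {
        (int(x), int(y))
        for x, y, z in points
        if 0 <= int(x) < size and 0 <= int(y) < size
    }

def mask_iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Reference shape: a flat 4x4 patch at constant depth.
reference = [(x, y, 5.0) for x in range(2, 6) for y in range(2, 6)]

# Distorted shape: same x-y footprint, but the depth alternates between 5 and 8.
distorted = [(x, y, 5.0 + 3.0 * ((x + y) % 2)) for x in range(2, 6) for y in range(2, 6)]

# The 2D score is perfect even though the 3D shapes differ.
iou = mask_iou(project_to_mask(reference), project_to_mask(distorted))
depth_error = max(abs(r[2] - d[2]) for r, d in zip(reference, distorted))
print(iou, depth_error)
```

In this toy setup the two shapes receive the same 2D score while differing by three units in depth, which is the kind of gap between 2D scores and 3D plausibility that the findings describe.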

Abstract

Computer vision for animals holds great promise for wildlife research but often depends on large-scale data, while existing collection methods rely on controlled capture setups. Recent data-driven approaches show the potential of single-view, non-invasive analysis, yet current animal video datasets are limited--offering as few as 2.4K 15-frame clips and lacking key processing for animal-centric 3D/4D tasks. We introduce an automated pipeline that mines YouTube videos and processes them into object-centric clips, along with auxiliary annotations valuable for downstream tasks like pose estimation, tracking, and 3D/4D reconstruction. Using this pipeline, we amass 30K videos (2M frames)--an order of magnitude more than prior works. To demonstrate its utility, we focus on the 4D quadruped animal reconstruction task. To support this task, we present Animal-in-Motion (AiM), a benchmark of 230…

Taxonomy

Topics: Wildlife Ecology and Conservation · Human Pose and Action Recognition · Advanced Neural Network Applications