WildLIFT: Lifting monocular drone video to 3D for species-agnostic wildlife monitoring

Vandita Shukla; Fabio Remondino; Blair Costelloe; Benjamin Risse

arXiv:2604.24718 · cs.CV · April 28, 2026

TL;DR

WildLIFT is a framework that recovers 3D geometric and semantic information from monocular drone video, enabling species-agnostic 3D detection and tracking of wildlife and producing structured metadata for ecological analysis.

Contribution

WildLIFT integrates 3D scene geometry recovered from monocular drone video with open-vocabulary 2D instance segmentation, reducing manual 3D annotation effort and supporting ecological analysis of drone footage.
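To make the idea of combining scene geometry with 2D segmentation concrete, here is a minimal sketch of one common way to lift a 2D instance mask into 3D: pinhole back-projection of the mask pixels using a per-pixel depth map. This is an illustrative assumption, not the paper's stated method; the function names (`backproject`, `mask_centroid_3d`), the depth-map input, and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) at a given depth
    into camera coordinates (x right, y down, z forward)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def mask_centroid_3d(mask_pixels, depth_map, fx, fy, cx, cy):
    """Average the 3D points of all mask pixels that have valid depth.

    mask_pixels: iterable of (u, v) pixel coordinates inside the 2D mask.
    depth_map:   row-major nested list, depth_map[v][u] in metres (assumed).
    Returns None when no mask pixel has a positive depth.
    """
    points = [
        backproject(u, v, depth_map[v][u], fx, fy, cx, cy)
        for u, v in mask_pixels
        if depth_map[v][u] > 0  # skip pixels with missing depth
    ]
    if not points:
        return None
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Usage: a 2x2 depth map at constant depth 2.0 and a two-pixel mask.
depth = [[2.0, 2.0], [2.0, 2.0]]
centroid = mask_centroid_3d([(0, 0), (1, 1)], depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

A centroid is only a crude 3D anchor for an animal; a real pipeline would presumably fit a full oriented box and handle depth noise, which this sketch does not attempt.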

Findings

01 Validated on 2,581 manually curated frames with over 6,700 3D detections across four large mammal species

02 Maintains high identity consistency in multi-animal scenes

03 Reduces manual 3D annotation effort through keyframe refinement

Abstract

Monocular RGB cameras mounted on drones are widely used for wildlife monitoring, yet most analytical pipelines remain confined to two-dimensional image space, leaving geometric information in video underexploited. We present WildLIFT, a computational framework that integrates three-dimensional scene geometry from monocular drone video with open-vocabulary 2D instance segmentation to enable species-agnostic 3D detection and tracking. Oriented 3D bounding box labels with semantic face information enable quantitative assessment of viewpoint coverage and inter-animal occlusion, producing structured metadata for downstream ecological analyses. We validate the framework on 2,581 manually curated frames comprising over 6,700 3D detections across four large mammal species. WildLIFT maintains high identity consistency in multi-animal scenes and substantially reduces manual 3D annotation effort…
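The abstract's notion of viewpoint coverage from oriented 3D boxes with semantic faces can be illustrated with a small sketch. Assuming a box described by its centre, size, and yaw about a vertical z axis (a parameterisation chosen here for illustration, not taken from the paper), a face counts as facing the camera when its outward normal has a positive dot product with the vector from the face centre to the camera. The function name `visible_faces` and the face labels are hypothetical, and occlusion by other animals is not modelled.

```python
import math


def visible_faces(center, size, yaw, camera):
    """Return the semantic faces of an oriented 3D box that face the camera.

    center: (x, y, z) box centre; size: (length, width, height);
    yaw: heading about the z (up) axis in radians; camera: (x, y, z).
    Convention (assumed): x forward, y left, z up at yaw = 0.
    """
    cx, cy, cz = center
    length, width, height = size
    c, s = math.cos(yaw), math.sin(yaw)
    # Outward unit normal and half-extent along that normal for each face.
    faces = {
        "front": ((c, s, 0.0), length / 2),
        "back": ((-c, -s, 0.0), length / 2),
        "left": ((-s, c, 0.0), width / 2),
        "right": ((s, -c, 0.0), width / 2),
        "top": ((0.0, 0.0, 1.0), height / 2),
        "bottom": ((0.0, 0.0, -1.0), height / 2),
    }
    visible = []
    for name, (n, half) in faces.items():
        # Face centre = box centre shifted by the half-extent along the normal.
        fx, fy, fz = cx + n[0] * half, cy + n[1] * half, cz + n[2] * half
        to_cam = (camera[0] - fx, camera[1] - fy, camera[2] - fz)
        # The face is front-facing when its normal points toward the camera.
        if n[0] * to_cam[0] + n[1] * to_cam[1] + n[2] * to_cam[2] > 0:
            visible.append(name)
    return visible


# Usage: a drone ahead of and above an animal heading along +x.
faces_seen = visible_faces(center=(0, 0, 1), size=(2, 1, 2), yaw=0.0, camera=(10, 0, 20))
```

Accumulating the returned labels over the frames of a track would give a simple per-animal record of which faces were ever observed, one plausible form of viewpoint-coverage metadata.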
