Adding Another Dimension to Image-based Animal Detection

Vandita Shukla; Fabio Remondino; Benjamin Risse

arXiv:2604.09210·cs.CV·April 13, 2026

TL;DR

This paper introduces a pipeline that estimates 3D animal bounding boxes from 2D images using SMAL models, aiding the development of monocular 3D animal detection algorithms.

Contribution

It presents a novel method for generating 3D labels from 2D images without requiring 3D input streams, facilitating future research in monocular 3D animal detection.

Findings

01

Accurately estimates 3D bounding boxes across multiple species.

02

Provides robust labels for benchmarking 3D animal detection algorithms.

03

Demonstrates effective performance on the Animal3D dataset.

Abstract

Monocular imaging of animals inherently reduces 3D structures to 2D projections. Detection algorithms produce 2D bounding boxes that lack information about the animal's orientation relative to the camera. Building 3D detection methods for RGB animal images is hindered by a lack of labeled datasets, because such labeling processes require 3D input streams along with RGB data. We present a pipeline that utilises Skinned Multi-Animal Linear (SMAL) models to estimate 3D bounding boxes and to project them as robust labels into 2D image space using a dedicated camera pose refinement algorithm. To assess which sides of the animal are captured, cuboid face visibility metrics are computed. These 3D bounding boxes and metrics form a crucial step toward developing and benchmarking future monocular 3D animal detection algorithms. We evaluate our method on the Animal3D dataset, demonstrating accurate performance across…
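The geometric idea in the abstract (a cuboid around a fitted animal mesh, projected into the image, with a check of which cuboid faces point at the camera) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the mesh vertices are already available as an (N, 3) array, uses an axis-aligned cuboid in the model frame, a plain pinhole camera with intrinsics `K` and pose `R`, `t`, and a simple front-facing test in place of the paper's visibility metrics. The function names are hypothetical, and the camera pose refinement step is not reproduced.

```python
import numpy as np


def cuboid_from_vertices(vertices):
    """Return the 8 corners of the axis-aligned cuboid enclosing (N, 3) vertices."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    # Corner i takes the upper bound on axis k when bit k of i is set.
    return np.array(
        [[hi[k] if (i >> k) & 1 else lo[k] for k in range(3)] for i in range(8)]
    )


def project(points, K, R, t):
    """Pinhole projection of (N, 3) model-frame points to (N, 2) pixel coordinates."""
    cam = points @ R.T + t          # model frame -> camera frame
    uvw = cam @ K.T                 # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide


def visible_faces(corners, R, t):
    """Flag each cuboid face as front-facing (True) or not (False).

    Assumption: a face counts as visible when its outward normal points
    toward the camera centre. Occlusion and image bounds are ignored.
    """
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    center, half = (lo + hi) / 2.0, (hi - lo) / 2.0
    result = {}
    for axis, name in enumerate("xyz"):
        for sign in (-1.0, 1.0):
            normal = np.zeros(3)
            normal[axis] = sign
            face_center = center + normal * half[axis]
            n_cam = R @ normal
            c_cam = R @ face_center + t
            # The camera sits at the origin of its own frame, so the direction
            # from the face to the camera is -c_cam.
            result[("+" if sign > 0 else "-") + name] = bool(n_cam @ c_cam < 0)
    return result
```

For a cuboid placed directly in front of a camera with identity rotation, only the face nearest the camera is flagged, which matches the intuition that a head-on view shows a single side of the animal.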
