CLHOP: Combined Audio-Video Learning for Horse 3D Pose and Shape Estimation

Ci Li; Elin Hernlund; Hedvig Kjellström; Silvia Zuffi

arXiv:2407.01244·cs.CV·July 2, 2024

Open Access

TL;DR

This paper introduces CLHOP, a novel method that combines audio and visual data to improve 3D horse pose and shape estimation from monocular videos, demonstrating enhanced accuracy and robustness.

Contribution

This is the first study to explore the use of audio in 3D animal motion recovery. It introduces a new outdoor dataset of diverse horse movements and reports more accurate and robust results than visual-only methods.

Findings

01

Audio-visual integration improves 3D pose accuracy.

02

New outdoor horse movement dataset introduced.

03

Enhanced robustness in motion estimation.

Abstract

In the monocular setting, predicting 3D pose and shape of animals typically relies solely on visual information, which is highly under-constrained. In this work, we explore using audio to enhance 3D shape and motion recovery of horses from monocular video. We test our approach on two datasets: an indoor treadmill dataset for 3D evaluation and an outdoor dataset capturing diverse horse movements, the latter being a contribution to this study. Our results show that incorporating sound with visual data leads to more accurate and robust motion regression. This study is the first to investigate audio's role in 3D animal motion recovery.

Taxonomy

Topics: Human Motion and Animation · Music and Audio Processing