3D Human Pose Estimation in Multi-View Operating Room Videos Using Differentiable Camera Projections

Beerend G.A. Gerats; Jelmer M. Wolterink; Ivo A.M.J. Broeders

arXiv:2210.11826·cs.CV·August 31, 2023

TL;DR

This paper introduces an end-to-end trainable method for 3D human pose estimation in multi-view operating room videos. It optimizes 3D localization directly by backpropagating a 3D loss through differentiable camera projections, and reports improved accuracy over methods that detect joints in 2D and then fuse them in 3D.

Contribution

It presents a novel end-to-end training approach that uses a 3D loss with differentiable camera projections, enhancing 3D pose accuracy in challenging surgical environments.

Findings

01 Outperforms traditional 2D detection-based methods in 3D pose accuracy

02 Demonstrates effectiveness on MVOR dataset videos

03 Shows improved localization in occluded and cluttered OR scenes

Abstract

3D human pose estimation in multi-view operating room (OR) videos is a relevant asset for person tracking and action recognition. However, the surgical environment makes it challenging to find poses due to sterile clothing, frequent occlusions, and limited public data. Methods specifically designed for the OR are generally based on the fusion of detected poses in multiple camera views. Typically, a 2D pose estimator such as a convolutional neural network (CNN) detects joint locations. Then, the detected joint locations are projected to 3D and fused over all camera views. However, accurate detection in 2D does not guarantee accurate localisation in 3D space. In this work, we propose to directly optimise for localisation in 3D by training 2D CNNs end-to-end based on a 3D loss that is backpropagated through each camera's projection parameters. Using videos from the MVOR dataset, we show…
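
The abstract's point that accurate 2D detection does not guarantee accurate 3D localisation can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes ideal pinhole cameras, made-up intrinsics and camera positions, and linear (DLT) triangulation as the fusion step, and the names `project`, `triangulate`, and `loss_3d` are hypothetical. It uses NumPy, which has no automatic differentiation; in an autodiff framework the same projection and fusion operations would let a 3D loss of this kind be backpropagated to the 2D detector.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X (3,) to pixel coordinates (2,) with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, uvs):
    """Fuse one joint's 2D detections from several views by linear (DLT) triangulation."""
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The homogeneous 3D point is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

def loss_3d(Ps, uvs, X_gt):
    """Euclidean distance between the fused 3D joint and the ground-truth 3D joint."""
    return float(np.linalg.norm(triangulate(Ps, uvs) - X_gt))

# Assumed setup: three cameras sharing hypothetical intrinsics K, offset from each other.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
translations = [np.zeros(3), np.array([-1.0, 0.0, 0.0]), np.array([0.0, -1.0, 0.0])]
Ps = [K @ np.hstack([np.eye(3), t.reshape(3, 1)]) for t in translations]

X_gt = np.array([0.2, -0.1, 3.0])  # hypothetical ground-truth joint position

# Perfect 2D detections in every view, then a 2-pixel error in the first view only.
uvs_clean = [project(P, X_gt) for P in Ps]
uvs_noisy = [uv.copy() for uv in uvs_clean]
uvs_noisy[0] = uvs_noisy[0] + np.array([2.0, 0.0])

loss_clean = loss_3d(Ps, uvs_clean, X_gt)
loss_noisy = loss_3d(Ps, uvs_noisy, X_gt)
print("3D error with exact 2D detections:", loss_clean)
print("3D error with a 2 px error in one view:", loss_noisy)
```

In this sketch a small pixel error in a single view shifts the fused 3D position, which is the kind of error a loss defined in 3D penalises directly and a purely 2D loss only penalises indirectly.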

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Surgical Simulation and Training