Human Pose Estimation in Monocular Omnidirectional Top-View Images
Jingrui Yu, Tobias Scheck, Roman Seidel, Yukti Adya, Dipankar Nandi, Gangolf Hirtz

TL;DR
This paper introduces a new synthetic dataset and evaluation framework for human pose estimation in omnidirectional top-view images, demonstrating improved accuracy over baseline models in indoor monitoring scenarios.
Contribution
The work presents THEODORE+, a large synthetic dataset for training CNNs on omnidirectional images, and evaluates four training paradigms on real-world data, advancing indoor human pose estimation.
Findings
Significant improvement over COCO baseline in top-view scenes
Effective training paradigms for CNNs on omnidirectional data
New synthetic dataset enhances pose estimation accuracy
Abstract
Human pose estimation (HPE) with convolutional neural networks (CNNs) for indoor monitoring is one of the major challenges in computer vision. In contrast to HPE in perspective views, an indoor monitoring system can use an omnidirectional camera with a 180° field of view to detect a person's pose with only one sensor per room. Keypoint detection is an essential upstream step for recognizing human pose. In this work, we propose a new dataset for training and evaluating CNNs for the task of keypoint detection in omnidirectional images. The training dataset, THEODORE+, consists of 50,000 images and is created with a 3D rendering engine in which humans walk randomly through an indoor environment. In a dynamically created 3D scene, persons move randomly while an omnidirectional camera moves simultaneously, generating synthetic RGB images together with 2D and 3D ground truth.…
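The abstract notes that the rendering pipeline produces 2D ground truth alongside the 3D scene. As an illustration of how 3D keypoints can map to pixel coordinates in a 180° omnidirectional top-view image, the sketch below assumes an ideal equidistant fisheye model (r = f·θ, with θ the angle from the optical axis). The paper's actual camera model and calibration are not stated here, and the names `project_equidistant` and `project_keypoints`, as well as the example coordinates, are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_equidistant(
    point_cam: Point3D,
    f: float,
    cx: float,
    cy: float,
    max_theta: float = math.pi / 2,  # half of an assumed 180-degree field of view
) -> Optional[Pixel]:
    """Project a 3D point in camera coordinates to pixel coordinates.

    Assumes an ideal equidistant fisheye model, r = f * theta, where theta is
    the angle between the ray and the optical axis (+z, assumed to point toward
    the floor for a ceiling-mounted top-view camera). Returns None for points
    outside the field of view.
    """
    x, y, z = point_cam
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        return None  # the camera center itself has no defined viewing direction
    # Clamp to guard against tiny floating-point overshoot before acos.
    cos_theta = max(-1.0, min(1.0, z / norm))
    theta = math.acos(cos_theta)
    if theta > max_theta:
        return None  # behind the hemisphere covered by the lens
    phi = math.atan2(y, x)  # azimuth around the optical axis
    r = f * theta  # equidistant mapping: radius grows linearly with angle
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))


def project_keypoints(
    keypoints_cam: Sequence[Point3D], f: float, cx: float, cy: float
) -> List[Optional[Pixel]]:
    """Project a person's 3D keypoints; keypoints outside the view map to None."""
    return [project_equidistant(p, f, cx, cy) for p in keypoints_cam]


if __name__ == "__main__":
    # Hypothetical 1024x1024 image whose 180-degree image circle has radius 512 px.
    size = 1024
    f = (size / 2) / (math.pi / 2)
    # Hypothetical head and feet positions (metres) below a ceiling camera.
    head_and_feet = [(0.3, 0.2, 1.2), (0.4, 0.1, 2.7), (0.5, 0.2, 2.7)]
    for uv in project_keypoints(head_and_feet, f, size / 2, size / 2):
        print(uv)
```

Because r grows linearly with θ in this model, a point at the edge of the 180° image circle (θ = 90°) lands exactly f·π/2 pixels from the image center, which is how `f` is chosen in the usage block. Real fisheye lenses deviate from this ideal and are usually described by a calibrated distortion model instead.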
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging
Methods: Deep Layer Aggregation · Convolution · Center Pooling · Batch Normalization · Cascade Corner Pooling · CenterNet
