Human Pose Estimation in Monocular Omnidirectional Top-View Images

Jingrui Yu; Tobias Scheck; Roman Seidel; Yukti Adya; Dipankar Nandi; Gangolf Hirtz

arXiv:2304.08186 · cs.CV · April 18, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a new synthetic dataset and evaluation framework for human pose estimation in omnidirectional top-view images, demonstrating improved accuracy over baseline models in indoor monitoring scenarios.

Contribution

The work presents THEODORE+, a large synthetic dataset for training CNNs on omnidirectional images, and evaluates four training paradigms on real-world data, advancing indoor human pose estimation.

Findings

1. Significant improvement over the COCO baseline in top-view scenes
2. Effective training paradigms for CNNs on omnidirectional data
3. New synthetic dataset enhances pose estimation accuracy

Abstract

Human pose estimation (HPE) with convolutional neural networks (CNNs) for indoor monitoring is one of the major challenges in computer vision. In contrast to HPE in perspective views, an indoor monitoring system can consist of an omnidirectional camera with a field of view of 180° to detect the pose of a person with only one sensor per room. To recognize human pose, the detection of keypoints is an essential upstream step. In our work, we propose a new dataset for training and evaluation of CNNs for the task of keypoint detection in omnidirectional images. The training dataset, THEODORE+, consists of 50,000 images and is created by a 3D rendering engine, where humans are randomly walking through an indoor environment. In a dynamically created 3D scene, persons move randomly with a simultaneously moving omnidirectional camera to generate synthetic RGB images and 2D and 3D ground truth.…
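As a rough illustration of how 2D keypoint ground truth can be derived from 3D joint positions for a 180° top-view camera, the sketch below projects a 3D point into a square omnidirectional image. The abstract does not specify the camera model, so the equidistant fisheye model (image radius proportional to the angle from the optical axis), the image size, and the function name `project_equidistant` are all assumptions made for illustration, not the authors' rendering pipeline.

```python
import math


def project_equidistant(point_xyz, image_size=1024, fov_deg=180.0):
    """Project a 3D point (camera coordinates) into a square fisheye image.

    Assumptions (not taken from the paper): equidistant model r = f * theta,
    optical axis along +z, principal point at the image center, and the
    image circle touching the image border at the edge of the field of view.
    Returns (u, v) in pixels, or None if the point is outside the field of view.
    """
    x, y, z = point_xyz
    # Angle between the viewing ray and the optical axis.
    theta = math.atan2(math.hypot(x, y), z)
    # Direction of the ray around the optical axis.
    phi = math.atan2(y, x)

    max_theta = math.radians(fov_deg) / 2.0
    if theta > max_theta:
        return None  # outside the assumed field of view

    # Equidistant mapping: radius grows linearly with theta.
    radius_px = (image_size / 2.0) * theta / max_theta
    center = image_size / 2.0
    return (center + radius_px * math.cos(phi),
            center + radius_px * math.sin(phi))


if __name__ == "__main__":
    # A hypothetical joint 1 m off-axis and 1 m in front of the camera.
    print(project_equidistant((1.0, 0.0, 1.0)))
```

Under this assumed model, a point directly on the optical axis lands at the image center, and points farther off-axis move toward the image border, which is why people near the edge of a top-view omnidirectional image appear strongly rotated and foreshortened.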

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: Deep Layer Aggregation · Convolution · Center Pooling · Batch Normalization · Cascade Corner Pooling · CenterNet