OmniPD: One-Step Person Detection in Top-View Omnidirectional Indoor Scenes

Jingrui Yu; Roman Seidel; Gangolf Hirtz

arXiv:2204.06846 · cs.CV · April 15, 2022

TL;DR

This paper introduces a real-time, one-step CNN-based person detection method for top-view omnidirectional indoor scenes that directly predicts bounding boxes without perspective transformation, improving efficiency and accuracy.

Contribution

The paper presents a novel approach using transfer learning with SSD variants fine-tuned on omnidirectional images, achieving high accuracy and real-time performance without complex pre- or post-processing.

Findings

01. Achieved 83.2% AP with moSSD and 86.3% AP with resSSD on omnidirectional data.

02. Real-time detection at 28-38 ms per image on an NVIDIA Quadro P6000.

03. The method generalizes to other CNN-based detectors and to other objects in omnidirectional images.
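To put the reported latency in more familiar terms, the short sketch below converts the 28-38 ms per-image range into frames per second. It assumes images are processed one at a time with no batching or pipelining, which the summary does not state; the helper name `latency_ms_to_fps` is purely illustrative.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latency range quoted in the findings (per image, in milliseconds).
fastest = latency_ms_to_fps(28.0)
slowest = latency_ms_to_fps(38.0)

print(f"{slowest:.1f} to {fastest:.1f} frames per second")
# prints "26.3 to 35.7 frames per second"
```

Under that assumption, the reported range corresponds to roughly 26 to 36 frames per second.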

Abstract

We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of CNN architectures and of training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces the overhead of pre- and post-processing and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images, together with data augmentation techniques, for detection in omnidirectional images. We fine-tune two variants of Single Shot MultiBox Detectors (SSDs). The first one uses MobileNet v1 FPN as feature extractor (moSSD). The…
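The abstract mentions data augmentation but, as truncated here, does not say which techniques are used. As one plausible illustration only, the sketch below rotates a top-view image and its axis-aligned bounding boxes in 90-degree steps, since a ceiling-mounted view has no preferred "up" direction. The choice of rotation, the box convention `(xmin, ymin, xmax, ymax)` in pixel-edge coordinates, and the function names `rotate90_ccw` and `augment_with_rotations` are all assumptions, not details taken from the paper.

```python
from typing import List, Tuple

# Assumed box convention: (xmin, ymin, xmax, ymax) in pixel-edge coordinates,
# so a box covering columns 2..4 and rows 1..2 is (2, 1, 5, 3).
Box = Tuple[float, float, float, float]
Image = List[List[int]]


def rotate90_ccw(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Rotate an image and its boxes by 90 degrees counterclockwise."""
    height, width = len(image), len(image[0])
    # New pixel (row i, col j) takes its value from old pixel (row j, col width-1-i).
    rotated = [[image[j][width - 1 - i] for j in range(height)]
               for i in range(width)]
    # A point (x, y) maps to (y, width - x), so axis-aligned boxes stay axis-aligned.
    new_boxes = [(ymin, width - xmax, ymax, width - xmin)
                 for xmin, ymin, xmax, ymax in boxes]
    return rotated, new_boxes


def augment_with_rotations(image: Image, boxes: List[Box]):
    """Return the original sample plus its 90, 180, and 270 degree rotations."""
    samples = [(image, list(boxes))]
    for _ in range(3):
        image, boxes = rotate90_ccw(image, boxes)
        samples.append((image, boxes))
    return samples
```

Restricting the sketch to multiples of 90 degrees keeps the transformed boxes exact; arbitrary angles would require re-fitting an axis-aligned box around the rotated corners, which loosens the annotation.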

Taxonomy

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature Pyramid Network · Non Maximum Suppression · 1x1 Convolution · Convolution · SSD