OmniPD: One-Step Person Detection in Top-View Omnidirectional Indoor Scenes
Jingrui Yu, Roman Seidel, Gangolf Hirtz

TL;DR
This paper introduces a real-time, one-step CNN-based person detection method for top-view omnidirectional indoor scenes that directly predicts bounding boxes without perspective transformation, improving efficiency and accuracy.
Contribution
The paper presents a novel approach using transfer learning with SSD variants fine-tuned on omnidirectional images, achieving high accuracy and real-time performance without complex pre- or post-processing.
Findings
Achieved 83.2% AP with moSSD and 86.3% with resSSD on omnidirectional data.
Real-time detection at 28-38 ms per image on an NVIDIA Quadro P6000.
Method generalizes to other CNN-based detectors and objects in omnidirectional images.
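To put the reported latency in more familiar terms, the per-image times can be converted to throughput. The snippet below is a minimal sketch that assumes images are processed one at a time with no batching or pipelining; the helper name `latency_ms_to_fps` is hypothetical and not taken from the paper.

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to images per second.

    Assumes strictly sequential processing (one image at a time).
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# The two ends of the latency range reported in the findings.
for ms in (28.0, 38.0):
    print(f"{ms:.0f} ms per image -> {latency_ms_to_fps(ms):.1f} images/s")
# prints:
# 28 ms per image -> 35.7 images/s
# 38 ms per image -> 26.3 images/s
```

Under that assumption, the reported range corresponds to roughly 26 to 36 images per second.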
Abstract
We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of CNN architectures and training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces the overhead of pre- and post-processing and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images with data augmentation techniques for detection in omnidirectional images. We fine-tune two variants of Single Shot MultiBox detectors (SSDs). The first one uses MobileNet v1 FPN as feature extractor (moSSD). The…
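The abstract mentions data augmentation but this excerpt does not name the specific transforms. As one illustrative possibility (an assumption, not necessarily what the authors used), a top-view omnidirectional image can be rotated about its center by quarter turns, since a ceiling-mounted camera has no preferred "up" direction. The sketch below uses only NumPy; the function name `rot90_with_boxes` and the `[x_min, y_min, x_max, y_max]` pixel-edge box format are hypothetical choices made for this example.

```python
import numpy as np


def rot90_with_boxes(image, boxes, k=1):
    """Rotate an image counterclockwise by k quarter turns and remap its boxes.

    image: array of shape (H, W) or (H, W, C).
    boxes: array-like of shape (N, 4) holding [x_min, y_min, x_max, y_max],
           with x along columns and y along rows (assumed format).
    Returns the rotated image and the remapped (N, 4) float array of boxes.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4).copy()
    for _ in range(k % 4):
        width = image.shape[1]   # width before this quarter turn
        image = np.rot90(image)  # one counterclockwise quarter turn
        x_min, y_min, x_max, y_max = boxes.T.copy()
        # A point (x, y) moves to (y, width - x); min and max swap along the new y axis.
        boxes = np.stack([y_min, width - x_max, y_max, width - x_min], axis=1)
    return image, boxes


# Usage: a 4x6 image with one marked 2x2 region and its bounding box.
img = np.zeros((4, 6), dtype=np.uint8)
img[0:2, 1:3] = 1
rotated, new_boxes = rot90_with_boxes(img, [[1, 0, 3, 2]], k=1)
print(rotated.shape, new_boxes.tolist())
```

Quarter turns avoid interpolation and keep the boxes exactly axis-aligned. Arbitrary rotation angles would need resampling and an enclosing-box computation, which this sketch does not attempt.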
Taxonomy
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature Pyramid Network · Non Maximum Suppression · 1x1 Convolution · Convolution · SSD
