TL;DR
This paper proposes a method for interpreting pointing gestures in robot navigation using an omnidirectional camera. It repeatedly extracts regions of interest from the equirectangular image and projects them onto perspective images, which removes the user/object position constraint and the left/right constraint on the pointing arm.
Contribution
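The approach combines repeated region-of-interest extraction from the equirectangular image, projection of those regions onto perspective images, and learning the likelihood of the target object, so that estimation stays accurate despite the heavy distortion of omnidirectional images.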
Findings
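Estimation is highly accurate even though skeleton and object detection perform poorly on distorted equirectangular images
Repeatedly extracting regions of interest and projecting them onto perspective images is what enables this accuracy
Training the likelihood of the target object further improves estimation accuracy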
Abstract
One of the intuitive instruction methods in robot navigation is a pointing gesture. In this study, we propose a method using an omnidirectional camera to eliminate the user/object position constraint and the left/right constraint of the pointing arm. Although the accuracy of skeleton and object detection is low due to the high distortion of equirectangular images, the proposed method enables highly accurate estimation by repeatedly extracting regions of interest from the equirectangular image and projecting them onto perspective images. Furthermore, we found that training the likelihood of the target object in machine learning further improves the estimation accuracy.
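The projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it is a generic equirectangular-to-perspective resampling under assumed conventions (pinhole model, nearest-neighbour sampling, yaw positive to the right, pitch positive upward). The function name `equirect_to_perspective` and all of its parameters are hypothetical.

```python
import numpy as np

def equirect_to_perspective(equi, yaw_deg, pitch_deg, fov_deg, out_w, out_h):
    """Sample a pinhole (perspective) view from an equirectangular image.

    Assumed conventions (not from the paper): image x points right, y points
    down, z points forward; yaw > 0 turns right, pitch > 0 looks up.
    """
    H, W = equi.shape[:2]

    # Focal length in pixels from the horizontal field of view.
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)

    # One viewing ray per output pixel, centred on the optical axis.
    u, v = np.meshgrid(np.arange(out_w), np.arange(out_h))
    x = u - (out_w - 1) / 2.0
    y = v - (out_h - 1) / 2.0
    z = np.full_like(x, f, dtype=float)
    norm = np.sqrt(x**2 + y**2 + z**2)
    x, y, z = x / norm, y / norm, z / norm

    # Rotate the rays: pitch about the x-axis, then yaw about the y-axis.
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    y, z = y * np.cos(p) - z * np.sin(p), y * np.sin(p) + z * np.cos(p)
    x, z = x * np.cos(t) + z * np.sin(t), -x * np.sin(t) + z * np.cos(t)

    # Ray direction -> longitude/latitude -> equirectangular pixel indices.
    lon = np.arctan2(x, z)                     # range [-pi, pi]
    lat = np.arcsin(np.clip(y, -1.0, 1.0))     # range [-pi/2, pi/2], down is positive
    col = np.floor((lon / (2 * np.pi) + 0.5) * W).astype(int) % W
    row = np.clip(np.floor((lat / np.pi + 0.5) * H).astype(int), 0, H - 1)

    # Nearest-neighbour lookup; works for grayscale or multi-channel images.
    return equi[row, col]
```

Under these assumptions, the "repeated extraction" could be approximated by calling the function several times with the yaw and pitch re-centred on the current region of interest and a progressively narrower `fov_deg`, running a detector on each perspective crop. How the paper actually selects and refines regions is not specified in the abstract.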
