An Embedded Monocular Vision Approach for Ground-Aware Objects Detection and Position Estimation
João G. Melo, Edna Barros

TL;DR
This paper presents an embedded monocular vision system that runs SSD MobileNet v2 on an NVIDIA Jetson Nano for real-time detection and position estimation of soccer objects, outperforming the currently used SSL vision system for ball positions closer than 1 meter to the onboard camera.
Contribution
It introduces a ground-aware, monocular vision approach optimized for embedded systems, enabling accurate, real-time soccer object detection and localization.
Findings
Achieves 30 fps processing speed.
Outperforms the currently used SSL vision system for ball positions closer than 1 meter to the onboard camera.
Root Mean Square Error of 14.37 mm for ball localization.
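For reference, a Root Mean Square Error over 2D positions can be computed as in the minimal sketch below. It assumes the error for each sample is the Euclidean distance between an estimated and a reference position on the field plane; the summary does not state how the paper defines its error, and the function name `rmse` is hypothetical.

```python
import math


def rmse(estimates, references):
    """Root mean square of Euclidean errors between paired 2D positions.

    Assumption: each sample's error is the straight-line distance between
    the estimated (x, y) and the reference (x, y), in the same unit.
    """
    if not estimates or len(estimates) != len(references):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [
        (ex - rx) ** 2 + (ey - ry) ** 2
        for (ex, ey), (rx, ry) in zip(estimates, references)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

The result carries the unit of the inputs, so positions given in millimeters yield an error in millimeters.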
Abstract
In the RoboCup Small Size League (SSL), teams are encouraged to propose solutions for executing basic soccer tasks inside the SSL field using only embedded sensing information. Thus, this work proposes an embedded monocular vision approach for detecting objects and estimating relative positions inside the soccer field. Prior knowledge from the environment is exploited by assuming objects lie on the ground, and the onboard camera has its position fixed on the robot. We implemented the proposed method on an NVIDIA Jetson Nano and employed SSD MobileNet v2 for 2D Object Detection with TensorRT optimization, detecting balls, robots, and goals at distances up to 3.5 meters. Ball localization evaluation shows that the proposed solution outperforms the currently used SSL vision system for positions closer than 1 meter to the onboard camera with a Root Mean Square Error of 14.37 millimeters. In…
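The ground assumption described in the abstract can be illustrated with a short sketch: if an object touches the ground and the camera pose on the robot is fixed, a single pixel determines a position on the field by intersecting its viewing ray with the ground plane. The code below is not the authors' implementation. It assumes an ideal pinhole camera without lens distortion, a camera that is only pitched downward (no roll or yaw), and that the bottom center of a detection's bounding box is the ground contact point. All names and parameters (`pixel_to_ground`, `bbox_ground_pixel`, `cam_height`, `pitch`) are hypothetical.

```python
import math


def bbox_ground_pixel(bbox):
    """Pick the pixel assumed to touch the ground: bottom center of the box.

    bbox is (x_min, y_min, x_max, y_max) in pixels, with y growing downward.
    """
    x_min, _y_min, x_max, y_max = bbox
    return 0.5 * (x_min + x_max), y_max


def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height, pitch):
    """Intersect the viewing ray of pixel (u, v) with the ground plane.

    Assumptions (not taken from the paper): pinhole model with focal
    lengths fx, fy and principal point cx, cy in pixels; camera mounted
    cam_height above the ground and tilted down by pitch radians.
    Returns (forward, left) in the unit of cam_height, relative to the
    point on the ground directly below the camera, or None when the ray
    does not hit the ground in front of the camera.
    """
    # Ray direction in the camera frame (x right, y down, z forward).
    dx = (u - cx) / fx
    dy = (v - cy) / fy

    # Downward component of the ray after applying the camera pitch.
    down = dy * math.cos(pitch) + math.sin(pitch)
    if down <= 1e-9:
        return None  # ray is at or above the horizon

    # Scale factor at which the ray reaches the ground plane.
    t = cam_height / down
    forward = t * (math.cos(pitch) - dy * math.sin(pitch))
    left = -t * dx
    return forward, left
```

As a usage example under the same assumptions, `pixel_to_ground(*bbox_ground_pixel(box), fx, fy, cx, cy, cam_height, pitch)` turns a detector's bounding box into a relative position. Because the scale factor grows quickly as the ray approaches the horizon, small pixel errors translate into large position errors for distant objects, which is consistent with the method performing best at short range.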
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · IoT and Edge/Fog Computing
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Convolution · 1x1 Convolution · Non Maximum Suppression · SSD
