Benchmarking Edge Computing Devices for Grape Bunches and Trunks Detection using Accelerated Object Detection Single Shot MultiBox Deep Learning Models
Sandro Costa Magalhães, Filipe Neves Santos, Pedro Machado, António Paulo Moreira, and Jorge Dias

TL;DR
This paper benchmarks various edge computing devices, including GPUs, TPUs, and FPGAs, for real-time grape bunch and trunk detection using accelerated deep learning models, highlighting speed and power efficiency differences.
Contribution
It provides a comparative analysis of heterogeneous hardware platforms for real-time object detection in agricultural robotics.
Findings
FPGAs achieved the highest inference speeds (14-25 FPS).
GPUs were the slowest, with 3-5 FPS.
TPUs and GPUs were the most power-efficient, consuming about 5 W.
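The speed and power figures above can be combined into per-frame latency and energy per frame, which are often the more useful numbers for a battery-powered robot. The sketch below is illustrative arithmetic only: it assumes the roughly 5 W figure is the average power drawn during inference and that it holds across the whole 3-5 FPS range, neither of which is stated in this summary. The function names are hypothetical. No power figure is given here for the FPGA boards, so only their latency is computed.

```python
def latency_ms(fps: float) -> float:
    """Average time spent per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def energy_per_frame_joules(power_watts: float, fps: float) -> float:
    """Energy per processed frame: watts / (frames per second) = joules per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return power_watts / fps


# Figures taken from the findings above.
GPU_FPS_RANGE = (3.0, 5.0)
FPGA_FPS_RANGE = (14.0, 25.0)

# Assumption: "around 5 W" is treated as a constant average inference power.
GPU_POWER_W = 5.0

if __name__ == "__main__":
    for fps in GPU_FPS_RANGE:
        print(
            f"GPU  @ {fps:4.1f} FPS: {latency_ms(fps):6.1f} ms/frame, "
            f"{energy_per_frame_joules(GPU_POWER_W, fps):.2f} J/frame"
        )
    for fps in FPGA_FPS_RANGE:
        # Energy is omitted: this summary reports no power figure for the FPGAs.
        print(f"FPGA @ {fps:4.1f} FPS: {latency_ms(fps):6.1f} ms/frame")
```

Under these assumptions, a GPU spends roughly 1.0 to 1.7 J per frame, while the FPGA boards process a frame in about 40 to 71 ms compared with 200 to 333 ms on the GPUs.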
Abstract
Purpose: Visual perception enables robots to perceive the environment. Visual data is processed using computer vision algorithms that are usually time-expensive and require powerful devices to run in real-time, which is unfeasible for open-field robots with limited energy. This work benchmarks the performance of different heterogeneous platforms for real-time object detection across three architectures: embedded GPU -- Graphics Processing Unit (such as NVIDIA Jetson Nano 2 GB and 4 GB, and NVIDIA Jetson TX2), TPU -- Tensor Processing Unit (such as Coral Dev Board TPU), and DPU -- Deep Learning Processor Unit (such as in AMD-Xilinx ZCU104 Development Board, and AMD-Xilinx Kria KV260 Starter Kit). Method: The authors used the RetinaNet ResNet-50 fine-tuned using the natural VineSet dataset. After the trained model was converted and compiled for…
Taxonomy
Methods: Correlation Alignment for Deep Domain Adaptation · 1x1 Convolution · Convolution · Feature Pyramid Network · Focal Loss · RetinaNet
