Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation
Tushar Prasanna Swaminathan, Christopher Silver, Thangarajah Akilan

TL;DR
This paper empirically evaluates optimized deep learning models on the NVIDIA Jetson Nano, reporting an average 16.11% inference speed improvement over their non-optimized counterparts and highlighting the importance of hardware-aware optimization for resource-constrained AI deployment.
Contribution
It provides a comprehensive analysis of model optimization effects on embedded devices, emphasizing hardware-specific tuning for improved inference speed and energy efficiency.
Findings
Optimized models are, on average, 16.11% faster than their non-optimized counterparts.
Hardware-aware optimization enhances deployment efficiency.
Model optimization reduces energy consumption and carbon footprint.
Abstract
The proliferation of complex deep learning (DL) models has revolutionized various applications, including computer vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models poses challenges for deployment on devices with limited compute and memory, such as embedded and edge devices. This work empirically investigates the optimization of such complex DL models and analyzes how they perform on an embedded device, specifically the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models exhibit a 16.11% speed improvement over their non-optimized counterparts. This not only emphasizes the critical need to consider hardware constraints and…
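The abstract reports an average speed improvement but, as excerpted here, does not say how it is computed. The following is a minimal sketch of one plausible way to measure it, assuming that "speed improvement" means the relative reduction in mean per-inference latency, averaged over models. The function names, the warm-up and run counts, and the latency values in the usage note are all hypothetical and are not taken from the paper.

```python
import statistics
import time
from typing import Callable, Dict, Tuple


def mean_latency(infer: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the mean wall-clock seconds per call of `infer`.

    `infer` is a stand-in for one forward pass of a model (hypothetical);
    warm-up calls are executed first and are not timed.
    """
    for _ in range(warmup):
        infer()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        infer()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples)


def speed_improvement_pct(baseline_s: float, optimized_s: float) -> float:
    """Assumed definition: relative latency reduction, in percent."""
    if baseline_s <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline_s - optimized_s) / baseline_s * 100.0


def average_improvement(latencies: Dict[str, Tuple[float, float]]) -> float:
    """Average the per-model improvement over (baseline, optimized) latency pairs."""
    return statistics.mean(
        speed_improvement_pct(base, opt) for base, opt in latencies.values()
    )
```

As a usage illustration with made-up numbers, `average_improvement({"model_a": (0.100, 0.080), "model_b": (0.200, 0.180)})` averages a 20% and a 10% per-model improvement. Averaging per-model percentages is itself a design choice; averaging raw latencies first and then taking the ratio would generally give a different figure.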
Taxonomy
Topics: Quantum-Dot Cellular Automata · Molecular Communication and Nanonetworks · Advanced Data and IoT Technologies
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
