Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended
Abhinaba Chakraborty, Wouter Tavernier, Akis Kourtis, Mario Pickavet, Andreas Oikonomakis, Didier Colle

TL;DR
This paper analyzes resource utilization in concurrent vision inference workloads on NVIDIA Jetson devices, revealing underutilized GPU components and CPU bottlenecks, and offers insights for hardware-aware optimizations.
Contribution
It provides a comprehensive profiling methodology and detailed analysis of GPU and CPU resource sharing in edge vision inference workloads, highlighting bottlenecks and optimization opportunities.
Findings
GPU utilization can reach 100% with optimizations
SMs and tensor cores often operate at 15-30% utilization
CPU-side events frequently cause performance bottlenecks
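The first two findings can look contradictory: how can the GPU read 100% while SMs sit at 15-30%? One common explanation is that coarse GPU utilisation is usually time-based (was any kernel running?), whereas SM utilisation also accounts for how much of the device each kernel occupies. The sketch below illustrates the difference under stated assumptions. It is not the paper's profiling method; the `KernelInterval` record, the 8-SM device, and all timings are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class KernelInterval:
    """One kernel execution from a hypothetical trace (not the paper's format)."""
    start_ms: float
    end_ms: float
    active_sms: int  # assumed number of SMs the kernel occupies


def coarse_gpu_utilization(kernels, window_ms):
    """Fraction of the window in which at least one kernel is running."""
    busy = 0.0
    cur_start = cur_end = None
    # Merge overlapping or touching intervals so busy time is not double counted.
    for k in sorted(kernels, key=lambda k: k.start_ms):
        if cur_end is None or k.start_ms > cur_end:
            if cur_end is not None:
                busy += cur_end - cur_start
            cur_start, cur_end = k.start_ms, k.end_ms
        else:
            cur_end = max(cur_end, k.end_ms)
    if cur_end is not None:
        busy += cur_end - cur_start
    return busy / window_ms


def mean_sm_utilization(kernels, window_ms, total_sms):
    """Time-weighted share of SMs in use; assumes kernels do not overlap."""
    sm_time = sum((k.end_ms - k.start_ms) * k.active_sms for k in kernels)
    return sm_time / (window_ms * total_sms)


# Hypothetical trace: back-to-back kernels that each occupy 2 of 8 SMs.
trace = [KernelInterval(0.0, 40.0, 2), KernelInterval(40.0, 100.0, 2)]
gpu_util = coarse_gpu_utilization(trace, window_ms=100.0)
sm_util = mean_sm_utilization(trace, window_ms=100.0, total_sms=8)
print(f"coarse GPU utilisation: {gpu_util:.0%}, mean SM utilisation: {sm_util:.0%}")
```

In this made-up trace the GPU is never idle, so the coarse metric saturates, yet only a quarter of the SM capacity is in use. This is why the low-level metrics are needed alongside the headline utilisation figure.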
Abstract
The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. To address these demands, low-power AI accelerators, particularly GPUs, are increasingly deployed for inference tasks, enabling efficient computation while mitigating cloud-based systems' latency and bandwidth limitations. Despite their growing deployment, GPUs remain underutilised even in computationally intensive workloads. This underutilisation stems from the limited understanding of GPU resource sharing, particularly in edge computing scenarios. In this work, we conduct a detailed analysis of both high- and low-level metrics, including GPU utilisation, memory usage, streaming multiprocessor (SM) utilisation, and tensor core usage, to identify bottlenecks and guide hardware-aware optimisations. By integrating traces from multiple…
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
