Profiling Concurrent Vision Inference Workloads on NVIDIA Jetson -- Extended

Abhinaba Chakraborty; Wouter Tavernier; Akis Kourtis; Mario Pickavet; Andreas Oikonomakis; Didier Colle

arXiv:2508.08430 · cs.DC · August 13, 2025

Open Access

TL;DR

This paper analyzes resource utilization in concurrent vision inference workloads on NVIDIA Jetson devices, revealing underutilized GPU components and CPU bottlenecks, and offers insights for hardware-aware optimizations.

Contribution

The paper presents a comprehensive profiling methodology and a detailed analysis of GPU and CPU resource sharing in edge vision inference workloads, highlighting bottlenecks and optimization opportunities.

Findings

01. GPU utilization can reach 100% with optimizations.

02. Streaming multiprocessors (SMs) and tensor cores often operate at only 15-30% utilization.

03. CPU-side events frequently cause performance bottlenecks.
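
The first two findings are not contradictory, because a time-based utilization figure and an SM-level figure measure different things. The Python sketch that follows is illustrative only and is not the authors' profiling method. It assumes "GPU utilization" means the fraction of a sampling window in which at least one kernel is executing (the definition nvidia-smi documents for its utilization.gpu field; Jetson's tegrastats may compute its GR3D figure differently), and that SM utilization means the fraction of available SM-time occupied by kernels. The Kernel record, its active_sms field, the 16-SM device size, and both traces are hypothetical synthetic values, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    """One kernel execution in a hypothetical GPU trace, times in ms."""
    start_ms: float
    end_ms: float
    active_sms: int  # assumed number of SMs the kernel occupies


def busy_intervals(kernels):
    """Merge overlapping or touching kernel intervals into disjoint busy periods."""
    merged = []
    for k in sorted(kernels, key=lambda k: k.start_ms):
        if merged and k.start_ms <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], k.end_ms)
        else:
            merged.append([k.start_ms, k.end_ms])
    return merged


def time_utilization(kernels, window_ms):
    """Fraction of the window [0, window_ms] in which at least one kernel runs."""
    busy = sum(end - start for start, end in busy_intervals(kernels))
    return busy / window_ms


def sm_utilization(kernels, window_ms, num_sms):
    """Fraction of available SM-time occupied, clamping concurrency at num_sms."""
    events = []
    for k in kernels:
        events.append((k.start_ms, k.active_sms))
        events.append((k.end_ms, -k.active_sms))
    events.sort()
    used, active, prev_t = 0.0, 0, 0.0
    for t, delta in events:
        used += min(active, num_sms) * (t - prev_t)  # SM-ms since the last event
        active += delta
        prev_t = t
    return used / (window_ms * num_sms)


def largest_idle_gap(kernels, window_ms):
    """Longest stretch of the window during which no kernel is running."""
    gaps, cursor = [], 0.0
    for start, end in busy_intervals(kernels):
        gaps.append(start - cursor)
        cursor = end
    gaps.append(window_ms - cursor)
    return max(gaps)


if __name__ == "__main__":
    NUM_SMS = 16  # hypothetical device size
    WINDOW_MS = 100.0

    # Trace A: kernels run back to back, but each occupies only a few SMs.
    trace_a = [Kernel(0.0, 50.0, 3), Kernel(50.0, 100.0, 4)]
    print(time_utilization(trace_a, WINDOW_MS))         # 1.0
    print(sm_utilization(trace_a, WINDOW_MS, NUM_SMS))  # 0.21875

    # Trace B: full-width kernels separated by a 40 ms gap with no GPU work.
    trace_b = [Kernel(0.0, 30.0, 16), Kernel(70.0, 100.0, 16)]
    print(time_utilization(trace_b, WINDOW_MS))         # 0.6
    print(largest_idle_gap(trace_b, WINDOW_MS))         # 40.0
```

With these made-up numbers, trace A keeps the GPU busy for the whole window (time utilization 1.0) while occupying only about 22% of SM-time. Trace B shows how a gap between kernels, such as one caused by CPU-side work delaying the next launch, lowers utilization without any change in per-kernel efficiency.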

Abstract

The proliferation of IoT devices and advancements in network technologies have intensified the demand for real-time data processing at the network edge. To address these demands, low-power AI accelerators, particularly GPUs, are increasingly deployed for inference tasks, enabling efficient computation while mitigating cloud-based systems' latency and bandwidth limitations. Despite their growing deployment, GPUs remain underutilised even in computationally intensive workloads. This underutilisation stems from the limited understanding of GPU resource sharing, particularly in edge computing scenarios. In this work, we conduct a detailed analysis of both high- and low-level metrics, including GPU utilisation, memory usage, streaming multiprocessor (SM) utilisation, and tensor core usage, to identify bottlenecks and guide hardware-aware optimisations. By integrating traces from multiple…

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy