Exploring the Dynamic Scheduling Space of Real-Time Generative AI Applications on Emerging Heterogeneous Systems

Rachid Karami; Rajeev Patwari; Hyoukjun Kwon; Ashish Sirasao

arXiv:2507.14715 · cs.LG · July 22, 2025

TL;DR

This paper characterizes real-time generative AI workloads on heterogeneous edge systems, analyzing scheduling impacts on performance and proposing strategies for workload-aware, dynamic scheduling to meet real-time constraints.

Contribution

The paper provides a comprehensive analysis of real-time generative AI (RTGen) workloads on the AMD Ryzen AI heterogeneous SoC, evaluates scheduling policies on that platform, and highlights the importance of workload-aware scheduling strategies for heterogeneous systems.

Findings

01

Scheduling decisions significantly affect deadline violation rates, with differences of up to 41.7%.

02

Workload dynamics and hardware heterogeneity require adaptive scheduling.

03

Performance metrics are highly sensitive to scheduling decisions.
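
To make the link between scheduling policy and deadline violations concrete, here is a minimal sketch. It is not the paper's methodology or benchmark: the job names, durations, and deadlines are hypothetical, and the model is a single non-preemptive device rather than a CPU/GPU/NPU system. It compares first-come-first-served (FCFS) with earliest-deadline-first (EDF) on the same job set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: float   # arrival time in ms
    duration: float  # execution time in ms
    deadline: float  # absolute deadline in ms


def fcfs(job):
    # First-come-first-served: earliest arrival runs first.
    # Ties fall back to submission order, since min() keeps the first minimum.
    return job.arrival


def edf(job):
    # Earliest-deadline-first: most urgent job runs first.
    return job.deadline


def violation_rate(jobs, policy):
    """Run jobs non-preemptively on one device and return the
    fraction of jobs that finish after their deadline."""
    pending = sorted(jobs, key=lambda j: j.arrival)  # stable sort
    ready = []
    now = 0.0
    missed = 0
    while pending or ready:
        # Move every job that has arrived into the ready queue.
        while pending and pending[0].arrival <= now:
            ready.append(pending.pop(0))
        if not ready:
            # Device is idle: jump ahead to the next arrival.
            now = pending[0].arrival
            continue
        job = min(ready, key=policy)
        ready.remove(job)
        now += job.duration
        if now > job.deadline:
            missed += 1
    return missed / len(jobs)


# Hypothetical job set: one long generative step and two short,
# tight-deadline perception tasks, all submitted at time 0.
jobs = [
    Job("llm_decode", arrival=0.0, duration=30.0, deadline=100.0),
    Job("pose", arrival=0.0, duration=10.0, deadline=15.0),
    Job("segment", arrival=0.0, duration=10.0, deadline=25.0),
]

fcfs_rate = violation_rate(jobs, fcfs)
edf_rate = violation_rate(jobs, edf)
print(f"FCFS violation rate: {fcfs_rate:.2f}")
print(f"EDF violation rate:  {edf_rate:.2f}")
```

In this toy job set, FCFS runs the long job first and both short jobs miss their deadlines, while EDF meets all three. The gap here is an artifact of the chosen numbers and says nothing about the magnitudes reported in the paper.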

Abstract

The integration of generative AI models, particularly large language models (LLMs), into real-time multi-model AI applications such as video conferencing and gaming is giving rise to a new class of workloads: real-time generative AI (RTGen). These workloads combine the compute intensity and dynamic execution patterns of generative models with the stringent latency and concurrency constraints of real-time inference. To meet the diverse demands of RTGen workloads, modern edge platforms increasingly adopt heterogeneous system-on-chip (SoC) architectures that integrate CPUs, GPUs, and NPUs. Despite the potential of heterogeneous SoCs, the scheduling space complexity and performance implications of RTGen workloads on such platforms remain underexplored. In this work, we perform a comprehensive characterization of RTGen workloads on AMD's latest heterogeneous SoC, Ryzen AI. We construct…
