Deadline-Driven Hierarchical Agentic Resource Sharing for AI Services and RAN Functions in AI-RAN
Haiyuan Li, Yulei Wu, Dimitra Simeonidou

TL;DR
This paper introduces a hierarchical agentic framework (HAF) for compute sharing in AI-RAN. It combines slow-timescale placement with fast-timescale GPU/CPU allocation and a predictive critic that filters migrations, improving service request fulfillment and SLO adherence over baselines.
Contribution
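HAF integrates LLM-based slow-timescale placement of AI services and RAN functions with a closed-form, deadline-aware convex algorithm for fast-timescale GPU/CPU allocation. A predictive critic filters out migrations whose service interruption would outweigh the expected SLO benefit.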
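To make the idea of deadline-aware allocation concrete, the following is a minimal sketch and not the paper's closed-form algorithm, which this summary does not state. It assumes a simple minimum-rate rule: each job asks for the smallest compute share that would finish its remaining work by its deadline, and all shares are scaled down proportionally if the node is oversubscribed. The `Job` fields, the `min_rate_allocation` name, and the unit of work are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    remaining_work: float    # hypothetical unit: seconds of work at full node capacity
    time_to_deadline: float  # seconds until the deadline; assumed > 0


def min_rate_allocation(jobs, capacity=1.0):
    """Illustrative minimum-rate rule (an assumption, not the paper's method).

    Each job requests remaining_work / time_to_deadline of the node.
    If total demand exceeds capacity, every share is scaled by the same factor.
    """
    demand = {j.name: j.remaining_work / j.time_to_deadline for j in jobs}
    total = sum(demand.values())
    if total == 0:
        return {name: 0.0 for name in demand}
    scale = min(1.0, capacity / total)
    return {name: d * scale for name, d in demand.items()}


if __name__ == "__main__":
    jobs = [Job("ran_l1", 0.2, 1.0), Job("llm_inference", 1.0, 2.0)]
    print(min_rate_allocation(jobs))
```

Under this rule, spare capacity is left unassigned when every deadline can be met; a real scheduler would likely redistribute it.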
Findings
HAF achieves 90.0% SLO fulfillment, a 20.5% improvement over baselines.
Service request fulfillment increases from 51% to 85.3%.
HAF maintains performance under diverse load conditions.
Abstract
AI-RAN consolidates AI services and Radio Access Network (RAN) functions onto a unified, GPU-accelerated infrastructure at the network edge. However, compute sharing between real-time RAN functions and highly heterogeneous AI services requires coordination of scheduling decisions at mismatched timescales, and placement adaptation may require service migration across nodes with non-negligible interruptions. This paper proposes a hierarchical agentic framework (HAF) for compute sharing in AI-RAN that combines a large language model (LLM)-based agent for slow-timescale placement of AI services and RAN functions with a closed-form, deadline-aware convex algorithm for fast-timescale GPU/CPU allocation. The LLM agent is further equipped with a predictive critic that filters out migrations when the induced service interruption outweighs the expected service-level objective (SLO) benefit.…
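The migration filter described in the abstract can be illustrated with a small sketch. It assumes the critic outputs a predicted SLO gain and a predicted interruption cost on the same scale for each candidate migration; the field names, the `margin` parameter, and the function names are hypothetical and are not taken from the paper.

```python
def should_migrate(predicted_slo_gain, interruption_cost, margin=0.0):
    """Accept a migration only if the predicted gain exceeds the cost by more than margin.

    Both inputs are assumed to be on a common scale (an assumption of this sketch).
    """
    return predicted_slo_gain - interruption_cost > margin


def filter_migrations(candidates, margin=0.0):
    """Keep only the candidate migrations that pass the critic check."""
    return [
        c for c in candidates
        if should_migrate(c["gain"], c["cost"], margin)
    ]


if __name__ == "__main__":
    proposals = [
        {"service": "asr", "gain": 0.30, "cost": 0.10},  # benefit outweighs interruption
        {"service": "vlm", "gain": 0.05, "cost": 0.20},  # interruption outweighs benefit
    ]
    for kept in filter_migrations(proposals):
        print(kept["service"])
```

A positive `margin` makes the filter more conservative, which may be useful when the critic's predictions are noisy.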
