Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Yixian Shen; Chaoyao Shen; Jan Deen; George Floros; Andy Pimentel; Anuj Pathania

arXiv:2604.11948·cs.LG·April 15, 2026

TL;DR

This paper introduces AILFM, an active imitation learning framework that optimizes thermal-aware scheduling for 3D S-NUCA many-core systems running large foundation models, improving performance and thermal safety.

Contribution

The paper presents an Active Imitation Learning (AIL)-based scheduling method that learns from Oracle demonstrations to balance thermal safety against performance on heterogeneous many-core systems.

Findings

01. AILFM outperforms existing thermal management baselines.

02. It generalizes effectively across diverse LFM workloads.

03. It achieves a near-optimal balance between thermal safety and performance.

Abstract

Large Foundation Model (LFM) inference is both memory- and compute-intensive, traditionally relying on GPUs. However, the limited availability and high cost of GPUs have motivated the adoption of high-performance general-purpose CPUs, especially emerging 3D-stacked Static Non-Uniform Cache Architecture (3D S-NUCA) systems. These architectures offer enhanced bandwidth and locality but suffer from severe thermal challenges and uneven cache latencies due to 3D Networks-on-Chip (NoCs). Optimal management of thread migration and V/f scaling is non-trivial due to LFM kernel diversity and system heterogeneity. Existing thermal management approaches often rely on oversimplified analytical models and lack adaptability. We propose AILFM, an Active Imitation Learning (AIL)-based scheduling framework that learns near-optimal thermal-aware scheduling policies from Oracle demonstrations with minimal run-time…
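The general idea of active imitation learning, where a learner copies an expert but only asks for a demonstration when it is unsure, can be illustrated with a toy sketch. This is not the AILFM implementation: the thermal limit, V/f levels, heating model, nearest-neighbor learner, and all function names below are hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

T_LIMIT = 80.0            # hypothetical thermal limit (degrees C)
FREQS = [1.0, 2.0, 3.0]   # hypothetical V/f levels (GHz)


def oracle(temp, compute_bound):
    """Toy expert: highest frequency whose predicted temperature stays under the limit."""
    heat_per_ghz = 6.0 if compute_bound else 3.0  # assumed heating model
    safe = [f for f in FREQS if temp + heat_per_ghz * f <= T_LIMIT]
    return max(safe) if safe else min(FREQS)


def distance(a, b):
    """Temperature gap, plus a large penalty when the kernel types differ."""
    return abs(a[0] - b[0]) + (0.0 if a[1] == b[1] else 1000.0)


class NearestNeighborPolicy:
    """Learner that imitates the label of the closest demonstrated state."""

    def __init__(self):
        self.examples = []  # list of (state, action) demonstrations

    def add(self, state, action):
        self.examples.append((state, action))

    def predict(self, state):
        """Return (action, distance to the nearest demonstrated state)."""
        if not self.examples:
            return FREQS[0], float("inf")
        near_state, action = min(self.examples, key=lambda ex: distance(ex[0], state))
        return action, distance(near_state, state)


def sample_state(rng):
    # State = (core temperature, whether the running kernel is compute-bound)
    return (rng.uniform(50.0, 80.0), rng.random() < 0.5)


def train(num_steps=2000, query_radius=1.0, seed=0):
    """Active loop: query the oracle only for states far from every demonstration."""
    rng = random.Random(seed)
    policy = NearestNeighborPolicy()
    queries = 0
    for _ in range(num_steps):
        state = sample_state(rng)
        _, gap = policy.predict(state)
        if gap > query_radius:  # learner is uncertain here, so ask the expert
            policy.add(state, oracle(*state))
            queries += 1
    return policy, queries


def agreement(policy, num_tests=1000, seed=1):
    """Fraction of fresh states where the learner matches the oracle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tests):
        state = sample_state(rng)
        action, _ = policy.predict(state)
        hits += action == oracle(*state)
    return hits / num_tests


if __name__ == "__main__":
    policy, queries = train()
    print(f"oracle queries: {queries}, agreement: {agreement(policy):.3f}")
```

In this sketch the uncertainty test is simply the distance to the nearest demonstrated state, so the oracle is consulted on only a small fraction of the visited states; a real scheduler would need a richer state, a learned policy, and a more principled query criterion.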
