QEIL v2: Heterogeneous Computing for Edge Intelligence via Roofline-Derived Pareto-Optimal Energy Modeling and Multi-Objective Orchestration
Satyam Kumar, Saurabh Jha

TL;DR
QEIL v2 introduces physics-grounded, runtime-adaptive models and multi-objective optimization to deploy large language models efficiently on heterogeneous edge devices, significantly improving energy efficiency and inference quality.
Contribution
QEIL v2 replaces static heuristics with physics-based, adaptive models and develops a new multi-objective optimization framework for edge inference.
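According to the abstract, the optimizer minimizes energy, latency, and device underutilization simultaneously. Any Pareto-guided search relies on the idea of dominance, and the sketch below illustrates only that idea. It is a generic example, not PGSAM: it includes no simulated annealing or momentum, and the helper names (`dominates`, `pareto_front`), the units, and the candidate values are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical objective vector: (energy in J, latency in s, underutilization in [0, 1]).
# All three objectives are minimized.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates: List[Objectives]) -> List[Objectives]:
    """Keep the candidates that no other candidate dominates, preserving input order."""
    # A point never dominates itself, so it is safe to compare against the full list.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical device-assignment candidates scored on the three objectives.
candidates: List[Objectives] = [
    (10.0, 2.0, 0.30),
    (8.0, 3.0, 0.20),
    (12.0, 2.5, 0.40),
    (9.0, 1.5, 0.25),
]
print(pareto_front(candidates))
```

Here (9.0, 1.5, 0.25) is at least as good as (10.0, 2.0, 0.30) and (12.0, 2.5, 0.40) on every objective, so both of those are discarded. The two survivors trade energy against latency, and neither can be preferred without further weighting.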
Findings
Achieves a 2.86x energy-efficiency improvement over standard inference.
Surpasses the empirical IPW = 1.0 mark through physics-grounded workload adaptation.
Reduces total energy by 75.6% while also improving latency and supporting fault recovery.
Abstract
Deploying large language models (LLMs) on heterogeneous edge devices demands frameworks that jointly optimize energy efficiency, inference quality, and reliability. Our prior QEIL v1 (Kumar & Jha, 2026) achieved a 4.82x IPW improvement but relied on static efficiency factors, greedy optimization, and unverified candidate selection. QEIL v2 replaces every static heuristic with physics-grounded, runtime-adaptive models. We introduce three device-workload metrics: DASI (roofline-derived compute utilization), CPQ (memory pressure from allocation theory), and Phi (thermal yield from CMOS leakage physics), which together form a unified energy equation in which every coefficient is traceable to semiconductor physics. For optimization, PGSAM (Pareto-Guided Simulated Annealing with Momentum) simultaneously minimizes energy, latency, and device underutilization. At inference time, the EAC/ARDE selection cascade with…
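The abstract describes DASI as a roofline-derived compute-utilization metric but does not give its formula. As a rough illustration of the underlying idea only, the textbook roofline model caps attainable throughput at min(peak compute, memory bandwidth × arithmetic intensity), and the ratio of that bound to peak compute is one plausible utilization measure. The function `roofline_utilization` and the device numbers below are hypothetical and are not the paper's DASI definition.

```python
def roofline_utilization(flops: float, bytes_moved: float,
                         peak_flops: float, peak_bandwidth: float) -> float:
    """Fraction of peak compute attainable under the classic roofline model.

    Hypothetical helper for illustration only; NOT the paper's DASI formula.
    flops:          floating-point operations performed by the workload
    bytes_moved:    bytes transferred to/from memory by the workload
    peak_flops:     device peak compute, FLOP/s
    peak_bandwidth: device peak memory bandwidth, bytes/s
    """
    if min(flops, bytes_moved, peak_flops, peak_bandwidth) <= 0:
        raise ValueError("all inputs must be positive")
    intensity = flops / bytes_moved                            # arithmetic intensity, FLOP/byte
    attainable = min(peak_flops, peak_bandwidth * intensity)   # roofline bound, FLOP/s
    return attainable / peak_flops                             # value in (0, 1]


# Hypothetical edge device: 10 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS = 10e12
PEAK_BW = 100e9

low = roofline_utilization(2e9, 1e9, PEAK_FLOPS, PEAK_BW)    # 2 FLOP/byte: memory-bound
high = roofline_utilization(5e11, 1e9, PEAK_FLOPS, PEAK_BW)  # 500 FLOP/byte: compute-bound
print(f"low-intensity: {low:.2%}, high-intensity: {high:.2%}")
```

On this hypothetical device the ridge point sits at 100 FLOP/byte, so the low-intensity workload reaches only about 2% of peak while the high-intensity one saturates it. A roofline-derived metric captures exactly this kind of workload-dependent gap that a static efficiency factor cannot.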
