Inference Energy and Latency in AI-Mediated Education: A Learning-per-Watt Analysis of Edge and Cloud Models
Kushal Khemani

TL;DR
This paper compares the energy and latency trade-offs of two on-device inference configurations of Phi-3 Mini (FP16 and 4-bit NF4) for AI tutoring, introduces Learning-per-Watt (LpW), a metric for pedagogical value per unit of energy, and shows that the efficiency gains from quantization depend on the hardware and inference regime.
Contribution
It introduces Learning-per-Watt (LpW), a novel metric for pedagogical value relative to energy consumption, and empirically compares FP16 and NF4-quantized inference of Phi-3 Mini on an NVIDIA T4 GPU for educational tutoring.
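As a rough illustration of the idea behind LpW, the sketch below divides a rubric score by the energy drawn while the learner waits, with that energy taken as mean power times waiting time. This ratio is an assumed reading of "pedagogical value per unit of energy over the learner's waiting window", not the paper's exact formula, which this summary does not give. The names InferenceRun, energy_joules, and learning_per_watt are hypothetical, and the numbers are placeholders rather than measurements from the study.

```python
from dataclasses import dataclass


@dataclass
class InferenceRun:
    """One tutoring response. All field names are hypothetical."""
    quality: float       # rubric score for the response (any consistent scale)
    mean_power_w: float  # mean power draw while the learner waits, in watts
    latency_s: float     # time the learner waits for the response, in seconds


def energy_joules(run: InferenceRun) -> float:
    # Energy over the waiting window = mean power x waiting time.
    return run.mean_power_w * run.latency_s


def learning_per_watt(run: InferenceRun) -> float:
    # Assumed form of the metric: quality per joule over the waiting window.
    energy = energy_joules(run)
    if energy <= 0:
        raise ValueError("energy must be positive")
    return run.quality / energy


# Placeholder values, not results from the paper.
example = InferenceRun(quality=3.0, mean_power_w=50.0, latency_s=4.0)
print(energy_joules(example))      # 200.0
print(learning_per_watt(example))  # 0.015
```

Under this assumed form, a configuration that draws less power can still use more energy per response if the learner waits long enough, which is why power and latency have to be measured together.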
Findings
NF4 quantization reduces energy per inference relative to FP16 but increases latency.
FP16 inference consumes more energy but responds faster, which gives it a better LpW in realistic settings.
Quantization efficiency varies with hardware and inference regime, affecting deployment in low-resource environments.
Abstract
Immediate feedback is a foundational requirement of effective AI-mediated learning, yet the energy and latency costs of delivering it remain largely unexamined. This study investigates the latency-energy-learning trade-off in AI tutoring through an empirical comparison of two on-device inference configurations of Microsoft Phi-3 Mini (4k-instruct) on an NVIDIA T4 GPU: full-precision FP16 and 4-bit NormalFloat (NF4) quantisation. Both were evaluated under KV-cache-enabled inference across 500 educational prompts spanning five secondary school subject domains. Pedagogical quality was assessed for each of the 1000 generated responses by a hybrid panel of 10 Cambridge International teachers and three frontier AI systems using a four-dimension rubric. We introduce Learning-per-Watt (LpW), a novel metric quantifying pedagogical value per unit of energy over the learner's waiting window. Under…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Teaching and Learning Programming · Online Learning and Analytics
