Latency and Cost of Multi-Agent Intelligent Tutoring at Scale
Iizalaarab Elhaimeur, Nikos Chrisochoides

TL;DR
This study measures the latency and cost of a multi-agent LLM tutoring system at scale, showing how three deployment tiers affect response times and expenses at concurrency levels up to 50 simultaneous users.
Contribution
It provides empirical insights and practical guidance on tier selection for deploying multi-agent tutoring systems efficiently and cost-effectively at different scales.
Findings
Priority PayGo maintains sub-4-second responses up to 50 users.
Standard PayGo response times degrade substantially under classroom-scale concurrency.
Provisioned Throughput offers the lowest latency at low concurrency but saturates its reserved capacity above approximately 20 concurrent users.
Abstract
Multi-agent LLM tutoring systems improve response quality through agent specialization, but each student query triggers several concurrent API calls whose latencies compound through a parallel-phase maximum effect that single-agent systems do not face. We instrument ITAS, a four-agent tutoring system built on Gemini 2.5 Flash and Google Vertex AI, across three throughput tiers (Standard PayGo, Priority PayGo, and Provisioned Throughput) and eleven concurrency levels up to 50 simultaneous users, producing over 3,000 requests drawn from a live graduate STEM deployment. Priority PayGo maintains flat sub-4-second response times across the full load range; Standard PayGo degrades substantially under classroom-scale concurrency; and Provisioned Throughput delivers the lowest latency at low concurrency but saturates its reserved capacity above approximately 20 concurrent users. Cost analysis…
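The parallel-phase maximum effect mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch and not the authors' instrumentation: it assumes each agent call's latency is an independent log-normal draw, and that all four agents run in one parallel phase, neither of which is stated in the abstract. The function names and the distribution parameters are hypothetical. It only shows why waiting on the slowest of several concurrent calls takes longer, on average, than waiting on a single call.

```python
import random
import statistics


def phase_latency(call_latencies):
    """A parallel phase finishes only when its slowest call returns."""
    return max(call_latencies)


def mean_phase_latency(n_agents, trials=10000, seed=0, mu=0.0, sigma=0.5):
    """Average phase latency over many simulated queries.

    Assumption (not from the paper): each call latency is an independent
    log-normal sample with hypothetical parameters mu and sigma.
    """
    rng = random.Random(seed)
    samples = (
        phase_latency([rng.lognormvariate(mu, sigma) for _ in range(n_agents)])
        for _ in range(trials)
    )
    return statistics.fmean(samples)


single_agent = mean_phase_latency(1)
four_agents = mean_phase_latency(4)  # four concurrent calls is an assumption
print(f"mean latency, 1 call:  {single_agent:.2f} (arbitrary units)")
print(f"mean latency, 4 calls: {four_agents:.2f} (arbitrary units)")
```

Under these assumptions the four-call mean is always at least the single-call mean, because the maximum of several draws cannot be smaller than any one of them. The size of the gap depends on the spread of the latency distribution, which is why tail latency under load matters more for a multi-agent system than for a single-agent one.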
