CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference
Nan Xue, Shengkang Chen, Zhiyong Chen, Jiangchao Yao, Yaping Sun, Zixia Hu, and Meixia Tao

TL;DR
CR^2 is a cost-aware routing framework for mobile edge LLM inference that balances latency, energy, and accuracy by making device-edge deferral decisions with explicit risk control.
Contribution
It introduces a two-stage device-edge routing framework calibrated with conformal risk control, which gives explicit marginal risk control in wireless edge LLM deployment.
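To make the calibration idea concrete, here is a minimal sketch of the generic conformal risk control recipe applied to a gate threshold. It is not taken from the paper. The function name `calibrate_threshold`, the score convention (a higher score means the query is kept on the device), and the 0/1 loss "kept on device and answered wrongly" are all assumptions made for illustration. The sketch picks the smallest threshold whose finite-sample-corrected empirical risk on a calibration set stays at or below a target level `alpha`.

```python
import numpy as np


def calibrate_threshold(scores, device_wrong, alpha):
    """Pick a deferral threshold with a conformal risk control style bound.

    scores       : hypothetical gate scores on a calibration set
                   (assumption: score >= threshold keeps the query on device)
    device_wrong : 1 if the on-device model answers that query wrongly, else 0
    alpha        : target marginal risk level in (0, 1)

    The loss for one query is 1 when it is kept on device AND answered wrongly.
    That loss can only shrink as the threshold grows, which is the monotonicity
    the standard conformal risk control argument relies on.
    """
    scores = np.asarray(scores, dtype=float)
    device_wrong = np.asarray(device_wrong, dtype=float)
    n = len(scores)

    # Candidate thresholds in increasing order; +inf means "defer everything".
    candidates = np.append(np.unique(scores), np.inf)
    for lam in candidates:
        kept = scores >= lam
        total_loss = float(np.sum(kept * device_wrong))
        # Finite-sample correction with loss upper bound B = 1:
        # (n * empirical_risk + B) / (n + 1) <= alpha
        if (total_loss + 1.0) / (n + 1) <= alpha:
            return float(lam)
    # No threshold satisfies the bound (calibration set too small for alpha).
    return float("inf")
```

Under these assumptions, the smallest feasible threshold keeps as many queries on the device as the risk budget allows. A result of `inf` signals that the calibration set is too small to certify the requested `alpha`.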
Findings
CR^2 closely matches full-information routing using only device signals.
CR^2 improves accuracy-cost trade-offs over baseline methods.
CR^2 reduces deployment costs by up to 16.9% at the same accuracy.
Abstract
As large language models (LLMs) move from centralized clouds to mobile edge environments, efficient serving must balance latency, energy consumption, and accuracy under constrained device-edge resources. Query-level routing between lightweight on-device models and stronger edge models provides a flexible mechanism to navigate this trade-off. However, existing routers are designed for centralized cloud settings and optimize token-level costs, failing to capture the dynamic latency and energy overheads in wireless edge deployments. In this paper, we formulate mobile edge LLM routing as a deployment-constrained, cost-aware decision problem, and propose CR^2, a two-stage device-edge routing framework. CR^2 decouples a lightweight on-device margin gate from an edge-side utility selector for deferred queries. The margin gate operates on frozen query embeddings and a user-specified cost weight…
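The two-stage structure described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the linear gate over the embedding concatenated with the cost weight, the `EdgeModel` fields, and the utility form "predicted accuracy minus cost weight times cost" are hypothetical choices, not the paper's definitions.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class EdgeModel:
    name: str
    predicted_accuracy: float  # hypothetical selector estimate in [0, 1]
    cost: float                # hypothetical normalized latency/energy cost


def route(embedding, gate_weights, cost_weight, threshold, edge_models):
    """Return "device" or the name of the chosen edge model.

    Stage 1 (on device): a margin gate scores the frozen query embedding
    together with the user-specified cost weight. The linear form used here
    is an assumption for illustration.
    Stage 2 (on edge): for deferred queries, pick the edge model with the
    highest assumed utility = predicted_accuracy - cost_weight * cost.
    """
    features = np.append(np.asarray(embedding, dtype=float), cost_weight)
    margin = float(np.dot(gate_weights, features))
    if margin >= threshold:
        return "device"

    best = max(
        edge_models,
        key=lambda m: m.predicted_accuracy - cost_weight * m.cost,
    )
    return best.name
```

In this sketch the gate never needs edge-side information, so the deferral decision can be made before any wireless transmission. The `threshold` argument is where a calibrated risk-control value would plug in.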
