Reward-Based Online LLM Routing via NeuralUCB
Ming-Hua Tsai, Phat Tran

TL;DR
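This paper applies NeuralUCB to cost-aware LLM routing and evaluates it on RouterBench in a simulated online setting. The policy outperforms random and min-cost baselines in utility reward and costs substantially less than the max-quality reference, though action discrimination and exploration remain open challenges.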
Contribution
The paper implements a NeuralUCB-based routing policy for LLMs and evaluates it on RouterBench, where it achieves higher utility reward than random and min-cost baselines and lower inference cost than the max-quality reference.
Findings
NeuralUCB consistently outperforms random and min-cost baselines in utility reward.
Compared with the max-quality reference, the method achieves substantially lower inference cost while maintaining competitive reward.
Remaining challenges include action discrimination and exploration.
Abstract
This study investigates the use of NeuralUCB for cost-aware large language model (LLM) routing. Existing routing approaches can be broadly grouped into supervised routing methods and partial-feedback methods, each with different tradeoffs in efficiency and adaptivity. We implement a NeuralUCB-based routing policy and evaluate it on RouterBench under a simulated online setting. Experimental results show that the proposed method consistently outperforms random and min-cost baselines in utility reward. Compared with the max-quality reference, our method achieves substantially lower inference cost while maintaining competitive reward. These findings suggest that NeuralUCB is a promising approach for cost-aware LLM routing, while also highlighting remaining challenges in action discrimination and exploration.
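The abstract does not spell out the policy, so the following is only a minimal sketch of how a NeuralUCB-style router might look, not the authors' implementation. It assumes a utility reward of the form quality minus a weighted cost, a one-hidden-layer ReLU network over the query features concatenated with a one-hot model id, a diagonal approximation of the NeuralUCB design matrix, and a single gradient step per round instead of retraining on the full history. All names (`utility`, `NeuralUCBRouter`, `cost_weight`) and the synthetic costs and quality signal in the usage loop are hypothetical, and the loop does not use RouterBench data.

```python
import math
import random


def utility(quality, cost, cost_weight):
    """Assumed utility reward: response quality minus weighted inference cost."""
    return quality - cost_weight * cost


class NeuralUCBRouter:
    """Simplified NeuralUCB sketch with a diagonal design-matrix approximation."""

    def __init__(self, n_features, n_models, hidden=8, gamma=1.0, lam=1.0,
                 lr=0.01, seed=0):
        rng = random.Random(seed)
        self.n_models, self.hidden = n_models, hidden
        self.gamma, self.lr = gamma, lr
        d = n_features + n_models  # query features + one-hot model id
        self.W1 = [[rng.gauss(0, math.sqrt(2 / d)) for _ in range(d)]
                   for _ in range(hidden)]
        self.w2 = [rng.gauss(0, math.sqrt(1 / hidden)) for _ in range(hidden)]
        # Diagonal of Z, initialised to lam * I (one entry per parameter).
        self.Z_diag = [lam] * (hidden * d + hidden)

    def _encode(self, x, arm):
        return list(x) + [1.0 if a == arm else 0.0 for a in range(self.n_models)]

    def _forward_grad(self, z):
        """Return the predicted reward and its gradient w.r.t. all parameters."""
        pre = [sum(w * v for w, v in zip(row, z)) for row in self.W1]
        h = [max(p, 0.0) for p in pre]
        out = sum(w * v for w, v in zip(self.w2, h))
        # Gradient w.r.t. W1 (row-major), then w.r.t. w2.
        grad = [self.w2[j] * v if pre[j] > 0 else 0.0
                for j in range(self.hidden) for v in z]
        grad += h
        return out, grad

    def select(self, x):
        """Pick the model with the highest upper confidence bound."""
        best = None
        for arm in range(self.n_models):
            out, g = self._forward_grad(self._encode(x, arm))
            width = sum(gi * gi / zi for gi, zi in zip(g, self.Z_diag)) / self.hidden
            score = out + self.gamma * math.sqrt(width)
            if best is None or score > best[0]:
                best = (score, arm, g)
        return best[1], best[2]

    def update(self, x, arm, grad, reward):
        """Update Z with the chosen arm's gradient, then take one SGD step."""
        self.Z_diag = [zi + gi * gi / self.hidden
                       for zi, gi in zip(self.Z_diag, grad)]
        out, g = self._forward_grad(self._encode(x, arm))
        err = out - reward  # derivative of 0.5 * (out - reward)^2 w.r.t. out
        d = len(self.W1[0])
        for j in range(self.hidden):
            for i in range(d):
                self.W1[j][i] -= self.lr * err * g[j * d + i]
        for j in range(self.hidden):
            self.w2[j] -= self.lr * err * g[self.hidden * d + j]


if __name__ == "__main__":
    rng = random.Random(1)
    costs = [0.1, 1.0, 5.0]  # hypothetical per-model costs, not from the paper
    router = NeuralUCBRouter(n_features=4, n_models=3)
    total = 0.0
    for _ in range(200):
        x = [rng.random() for _ in range(4)]  # stand-in for a query embedding
        arm, g = router.select(x)
        # Synthetic quality signal: pricier models succeed more often.
        quality = 1.0 if rng.random() < 0.4 + 0.2 * arm else 0.0
        r = utility(quality, costs[arm], cost_weight=0.05)
        router.update(x, arm, g, r)  # partial feedback: only the chosen arm
        total += r
    print("mean utility:", total / 200)
```

Only the chosen model's reward is observed each round, which is what makes this a partial-feedback setting. The exploration bonus for a model shrinks as its gradient directions accumulate in `Z_diag`, so the router gradually shifts from exploring to exploiting its reward estimates.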
