VL-RouterBench: A Benchmark for Vision-Language Model Routing
Zhehao Huang, Baijiong Lin, Jingyuan Zhang, Jingying Wang, Yuhang Liu, Ning Lu, Tao Li, Xiaolin Huang

TL;DR
VL-RouterBench is a comprehensive benchmark designed to evaluate vision-language model routing systems, measuring accuracy, cost, and throughput across diverse datasets and models to facilitate progress in multimodal routing research.
Contribution
The paper introduces VL-RouterBench, the first systematic, reproducible benchmark for evaluating vision-language model routing, including a large-scale dataset, evaluation protocol, and open-source tools.
Findings
Significant routability improvements are observed with new routing methods.
Current routers still lag behind the ideal Oracle, indicating room for advancement.
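The benchmark covers 14 datasets across 3 task groups (30,540 samples), 17 models (15 open-source and 2 API), and three evaluation metrics: average accuracy, average cost, and throughput.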
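The gap to the Oracle mentioned above can be illustrated with a minimal sketch. It assumes the Oracle picks, for each sample, the model with the highest quality score and breaks ties by lowest cost; this summary does not spell out the paper's exact Oracle definition, so that rule, the function names, and the toy matrices are all hypothetical.

```python
from statistics import mean


def oracle_choices(quality, cost):
    """Pick one model index per sample.

    quality[i][j] and cost[i][j] are the score and cost of model j on sample i.
    Assumption (not from the paper): best quality wins, ties go to the cheapest model.
    """
    choices = []
    for q_row, c_row in zip(quality, cost):
        best_q = max(q_row)
        candidates = [j for j, q in enumerate(q_row) if q == best_q]
        choices.append(min(candidates, key=lambda j: c_row[j]))
    return choices


def best_single_model_accuracy(quality):
    """Accuracy of the strongest single model when it answers every sample."""
    n_models = len(quality[0])
    return max(mean(row[j] for row in quality) for j in range(n_models))


# Toy data: 3 samples, 2 models (model 0 is cheap, model 1 is expensive).
quality = [[1, 0], [1, 1], [0, 1]]
cost = [[1.0, 4.0], [1.0, 4.0], [1.0, 4.0]]

picks = oracle_choices(quality, cost)  # [0, 0, 1]
oracle_acc = mean(quality[i][j] for i, j in enumerate(picks))
print(picks, oracle_acc, best_single_model_accuracy(quality))
```

In this toy case the Oracle answers every sample correctly while either single model gets two of three, which is the kind of headroom a router is trying to capture.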
Abstract
Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language models (VLMs). We present VL-RouterBench to assess the overall capability of VLM routing systems systematically. The benchmark is grounded in raw inference and scoring logs from VLMs and constructs quality and cost matrices over sample-model pairs. In scale, VL-RouterBench covers 14 datasets across 3 task groups, totaling 30,540 samples, and includes 15 open-source models and 2 API models, yielding 519,180 sample-model pairs and a total input-output token volume of 34,494,977. The evaluation protocol jointly measures average accuracy, average cost, and throughput, and builds a ranking score from the harmonic mean of normalized cost and accuracy to enable comparison across router configurations and cost…
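The scale figures and the ranking score described in the abstract can be sketched in a few lines. The sample-model pair count follows directly from the stated numbers. The ranking score below is only an illustration: the abstract says it is a harmonic mean of normalized cost and accuracy, but the normalization is not given here, so the linear min-max form (cheaper maps closer to 1) and all function and parameter names are assumptions.

```python
from statistics import mean

# Scale check from the abstract: 30,540 samples x (15 open-source + 2 API) models.
N_SAMPLES = 30_540
N_MODELS = 15 + 2
assert N_SAMPLES * N_MODELS == 519_180


def route_metrics(quality, cost, choices):
    """Average accuracy and average cost of a router.

    quality[i][j], cost[i][j]: score and cost of model j on sample i.
    choices[i]: index of the model the router selected for sample i.
    """
    acc = mean(quality[i][j] for i, j in enumerate(choices))
    avg_cost = mean(cost[i][j] for i, j in enumerate(choices))
    return acc, avg_cost


def rank_score(acc, avg_cost, cost_min, cost_max):
    """Harmonic mean of accuracy and normalized cost.

    Assumption: linear min-max normalization, clipped to [0, 1], where the
    cheapest reference cost maps to 1 and the most expensive to 0.
    """
    if cost_max == cost_min:
        norm_cost = 1.0
    else:
        norm_cost = (cost_max - avg_cost) / (cost_max - cost_min)
    norm_cost = min(1.0, max(0.0, norm_cost))
    if acc + norm_cost == 0:
        return 0.0
    return 2 * acc * norm_cost / (acc + norm_cost)


# Toy data: 3 samples, 2 models; the router sends only the last sample to model 1.
quality = [[1, 0], [1, 1], [0, 1]]
cost = [[1.0, 4.0], [1.0, 4.0], [1.0, 4.0]]
acc, avg_cost = route_metrics(quality, cost, choices=[0, 0, 1])
score = rank_score(acc, avg_cost, cost_min=1.0, cost_max=4.0)
print(acc, avg_cost, score)  # accuracy 1, average cost 2.0, score about 0.8
```

A harmonic mean rewards routers that are good on both axes at once: a router with perfect accuracy but the highest possible cost scores 0 under this sketch, as does a free router that is always wrong.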
Taxonomy
Topics: Software-Defined Networks and 5G · Multimodal Machine Learning Applications · Advanced Neural Network Applications
