VL-RouterBench: A Benchmark for Vision-Language Model Routing

Zhehao Huang; Baijiong Lin; Jingyuan Zhang; Jingying Wang; Yuhang Liu; Ning Lu; Tao Li; Xiaolin Huang

arXiv:2512.23562 · cs.LG · March 19, 2026

TL;DR

VL-RouterBench is a comprehensive benchmark designed to evaluate vision-language model routing systems, measuring accuracy, cost, and throughput across diverse datasets and models to facilitate progress in multimodal routing research.

Contribution

The paper introduces VL-RouterBench, the first systematic, reproducible benchmark for evaluating vision-language model routing, including a large-scale dataset, evaluation protocol, and open-source tools.

Findings

01. Significant routability improvements observed with new routing methods.

02. Current routers still lag behind the ideal Oracle, indicating room for advancement.

03. Benchmark covers extensive datasets, models, and evaluation metrics.

Abstract

Multi-model routing has evolved from an engineering technique into essential infrastructure, yet existing work lacks a systematic, reproducible benchmark for evaluating vision-language models (VLMs). We present VL-RouterBench to assess the overall capability of VLM routing systems systematically. The benchmark is grounded in raw inference and scoring logs from VLMs and constructs quality and cost matrices over sample-model pairs. In scale, VL-RouterBench covers 14 datasets across 3 task groups, totaling 30,540 samples, and includes 15 open-source models and 2 API models, yielding 519,180 sample-model pairs and a total input-output token volume of 34,494,977. The evaluation protocol jointly measures average accuracy, average cost, and throughput, and builds a ranking score from the harmonic mean of normalized cost and accuracy to enable comparison across router configurations and cost…
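To make the evaluation idea above concrete, here is a minimal sketch of how a quality matrix and a cost matrix over sample-model pairs could be turned into average accuracy, average cost, and a harmonic-mean ranking score. It is an illustration under stated assumptions, not the benchmark's actual code: the function names, the toy matrices, the oracle tie-breaking rule, and the min-max cost normalization (inverted so that cheaper is better) are all hypothetical, since the text here does not specify how cost is normalized.

```python
# Illustrative sketch only: names, toy data, and the normalization are assumptions.

def evaluate_router(quality, cost, choices):
    """Average accuracy and average cost of a router.

    quality[i][j]: score of model j on sample i (0 or 1 in this toy example)
    cost[i][j]:    cost of running model j on sample i
    choices[i]:    index of the model the router picked for sample i
    """
    n = len(choices)
    acc = sum(quality[i][j] for i, j in enumerate(choices)) / n
    avg_cost = sum(cost[i][j] for i, j in enumerate(choices)) / n
    return acc, avg_cost


def oracle_choices(quality, cost):
    """Assumed oracle: per sample, the cheapest model among the best-scoring ones."""
    picks = []
    for q_row, c_row in zip(quality, cost):
        best = max(q_row)
        candidates = [j for j, q in enumerate(q_row) if q == best]
        picks.append(min(candidates, key=lambda j: c_row[j]))
    return picks


def rank_score(acc, avg_cost, cost_min, cost_max):
    """Harmonic mean of accuracy and normalized cost.

    Assumption: cost is min-max normalized and inverted, so 1.0 means cheapest.
    """
    cheapness = (cost_max - avg_cost) / (cost_max - cost_min)
    if acc + cheapness == 0:
        return 0.0
    return 2 * acc * cheapness / (acc + cheapness)


# Toy matrices: 4 samples x 3 models (model 0 is cheapest, model 2 most expensive).
quality = [[1, 0, 1], [0, 0, 1], [1, 1, 1], [0, 1, 0]]
cost = [[1.0, 2.0, 5.0]] * 4

oracle = oracle_choices(quality, cost)
oracle_acc, oracle_cost = evaluate_router(quality, cost, oracle)
cheap_acc, cheap_cost = evaluate_router(quality, cost, [0, 0, 0, 0])

oracle_score = rank_score(oracle_acc, oracle_cost, cost_min=1.0, cost_max=5.0)
cheap_score = rank_score(cheap_acc, cheap_cost, cost_min=1.0, cost_max=5.0)

# Size check from the abstract: 30,540 samples x (15 + 2) models.
pair_count = 30_540 * (15 + 2)
```

In this toy setting the oracle reaches full accuracy at moderate cost, while always choosing the cheapest model halves accuracy. The harmonic mean penalizes a router that is strong on only one of the two axes, which is presumably why a single combined score is useful for comparing routers. The last line reproduces the 519,180 sample-model pairs stated in the abstract.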

Taxonomy

Topics: Software-Defined Networks and 5G · Multimodal Machine Learning Applications · Advanced Neural Network Applications