MMR-Bench: A Comprehensive Benchmark for Multimodal LLM Routing
Haoxuan Ma, Guannan Lai, Han-Jia Ye

TL;DR
MMR-Bench is a benchmark for evaluating and improving query-level model routing among multimodal large language models (MLLMs), where a router must trade off accuracy against computational cost across diverse tasks.
Contribution
MMR-Bench provides a standardized, modality-aware benchmarking environment for routing among MLLMs, enabling fair comparison and development of cost-effective, accurate model selection policies.
Findings
Incorporating multimodal signals enhances routing quality.
Routing policies can surpass single-model accuracy at lower costs.
Policies trained on limited data generalize well to new datasets.
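To make the second finding concrete, the following is a minimal sketch of why per-query routing can beat the best single model at lower average cost. Everything in it is an assumption for illustration: the two candidate names, their costs, and the per-query correctness values are hypothetical toy data, not results or models from the paper, and the oracle router shown here is an upper bound rather than the routing method the benchmark evaluates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str        # hypothetical model name
    cost: float      # hypothetical cost per query, arbitrary units
    correct: tuple   # 1 if the model answers query i correctly, else 0


# Hypothetical toy data, NOT taken from the paper.
CANDIDATES = [
    Candidate("small", 1.0, (1, 1, 0, 0, 1, 1)),
    Candidate("large", 10.0, (1, 1, 1, 1, 0, 1)),
]


def single_model(candidate):
    """Accuracy and mean per-query cost when one model serves every query."""
    n = len(candidate.correct)
    return sum(candidate.correct) / n, candidate.cost


def oracle_route(candidates):
    """Accuracy and mean cost of an oracle that knows correctness in advance.

    For each query it picks the cheapest model that answers correctly,
    falling back to the cheapest model when none is correct.
    """
    n = len(candidates[0].correct)
    by_cost = sorted(candidates, key=lambda c: c.cost)
    hits, total_cost = 0, 0.0
    for q in range(n):
        chosen = next((c for c in by_cost if c.correct[q]), by_cost[0])
        hits += chosen.correct[q]
        total_cost += chosen.cost
    return hits / n, total_cost / n


if __name__ == "__main__":
    for c in CANDIDATES:
        print(c.name, single_model(c))
    print("oracle", oracle_route(CANDIDATES))
```

On this toy data the oracle reaches accuracy 1.0 at a mean cost of 4.0, while the large model alone reaches 5/6 at a cost of 10.0. A practical router has no access to correctness labels at query time and must predict them from the query itself, so its results fall somewhere between the single-model baselines and this oracle bound.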
Abstract
Multimodal large language models (MLLMs) have advanced rapidly, yet heterogeneity in architecture, alignment strategies, and efficiency means that no single model is uniformly superior across tasks. In practical deployments, workloads span lightweight OCR to complex multimodal reasoning; using one MLLM for all queries either over-provisions compute on easy instances or sacrifices accuracy on hard ones. Query-level model selection (routing) addresses this tension, but extending routing from text-only LLMs to MLLMs is nontrivial due to modality fusion, wide variation in computational cost across models, and the absence of a standardized, budget-aware evaluation. We present MMR-Bench, a unified benchmark that isolates the multimodal routing problem and enables comparison under fixed candidate sets and cost models. MMR-Bench provides (i) a controlled environment with modality-aware inputs…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
