LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

Xueqi Cheng; Yushun Dong

arXiv:2605.11301 · cs.AI · May 13, 2026

TL;DR

LatentRouter is a routing method for multimodal large language models (MLLMs) that predicts how well each candidate model would perform on a given image-question input, so a model can be selected per query before any answer is generated.

Contribution

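The paper formulates MLLM routing as counterfactual multimodal utility prediction, using latent communication between query-derived routing capsules and per-model capability tokens to estimate how each model would perform if selected.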

Findings

01 LatentRouter outperforms fixed-model and baseline routers on benchmark datasets.

02 Gains are most significant on tasks requiring visual, layout-sensitive, or reasoning skills.

03 Latent communication between model states is key to the improved performance.

Abstract

Multimodal large language models (MLLMs) have heterogeneous strengths across OCR, chart understanding, spatial reasoning, visual question answering, cost, and latency. Effective MLLM routing therefore requires more than estimating query difficulty: a router must match the multimodal requirements of the current image-question input with the capabilities of each candidate model. We propose LatentRouter, a router that formulates MLLM routing as counterfactual multimodal utility prediction. Given an image-question query, LatentRouter extracts learned multimodal routing capsules, represents each candidate MLLM with a model capability token, and performs latent communication between these states to estimate how each model would perform if selected. A distributional outcome head predicts model-specific counterfactual quality, while a bounded capsule correction refines close decisions without…
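The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-dimensional vectors, the three quality bins, the hand-set head weights, and the cost penalty are all hypothetical, chosen only to show the flow from routing capsules and capability tokens, through one attention-style communication step, to a distributional quality estimate per model. The bounded capsule correction mentioned in the abstract is omitted.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(token, capsules):
    """One cross-attention step: the model token reads the query capsules."""
    scale = math.sqrt(len(token))
    weights = softmax([dot(token, c) / scale for c in capsules])
    return [sum(w * c[i] for w, c in zip(weights, capsules))
            for i in range(len(token))]

# Hypothetical discretised quality levels for the distributional head.
QUALITY_BINS = [0.0, 0.5, 1.0]

def predict_quality(token, capsules, head):
    """Return expected quality from a distribution over QUALITY_BINS."""
    context = attend(token, capsules)
    state = [t + c for t, c in zip(token, context)]  # residual update
    probs = softmax([dot(row, state) for row in head])
    return sum(p * q for p, q in zip(probs, QUALITY_BINS))

def route(capsules, models, head, cost_weight=0.1):
    """Pick the model with the best predicted quality minus weighted cost."""
    scores = {
        name: predict_quality(spec["token"], capsules, head)
        - cost_weight * spec["cost"]
        for name, spec in models.items()
    }
    return max(scores, key=scores.get), scores

# Assumed toy inputs: capsules for an OCR-heavy query, two candidate models.
capsules = [[1.0, 0.0], [0.8, 0.2]]
models = {
    "ocr_model": {"token": [1.0, 0.0], "cost": 1.0},
    "chart_model": {"token": [0.0, 1.0], "cost": 1.0},
}
# Hand-set (untrained) head: one weight row per quality bin.
head = [[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

best, scores = route(capsules, models, head)
print(best, scores)
```

In this toy setup every score is computed before any candidate model is run, which is the sense in which the prediction is counterfactual. A trained router would learn the capsules, tokens, and head from data rather than use fixed values.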

Code & Models

Repositories

LabRAI/LatentRouter (GitHub)
