Towards Resource-Efficient Multimodal Intelligence: Learned Routing among Specialized Expert Models

Mayank Saini; Arit Kumar Bishwas

arXiv:2511.06441 · cs.CL · November 11, 2025

TL;DR

This paper presents a modular, learned routing framework that efficiently directs queries to specialized models, reducing reliance on costly large models while maintaining high performance across multimodal tasks.

Contribution

Introduces a unified, learned routing system that dynamically allocates queries to specialized models, significantly reducing costs without sacrificing accuracy.

Findings

01. Reduces reliance on costly large models by over 67% while matching or exceeding the performance of always-premium monolithic systems.

02. Demonstrates these results on benchmarks such as MMLU and VQA.

03. Uses a two-stage open-source vision pipeline optimized for efficiency.
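The first finding can be read as a share of premium-model calls avoided. The page does not give the paper's exact definition, so the sketch below assumes one: the always-premium baseline sends every query to the premium model, and the reduction is one minus the fraction of queries the router still sends there. The function name `premium_reliance_reduction`, the label `"premium-llm"`, and the example routing log are hypothetical illustrations, not data from the paper.

```python
from collections import Counter


def premium_reliance_reduction(routes: list[str], premium: str = "premium-llm") -> float:
    """Fraction of premium-model calls avoided versus an always-premium baseline.

    Assumption (not from the paper): the baseline routes every query to the
    premium model, so its premium call count equals len(routes).
    """
    if not routes:
        raise ValueError("need at least one routed query")
    premium_calls = Counter(routes)[premium]
    return 1.0 - premium_calls / len(routes)


if __name__ == "__main__":
    # Hypothetical routing log for 8 queries; only 2 reach the premium model.
    log = ["small-text"] * 4 + ["vision-pipeline"] * 2 + ["premium-llm"] * 2
    print(premium_reliance_reduction(log))  # 0.75
```

Under this assumed definition, "over 67%" would mean fewer than one query in three still reaches the premium model.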

Abstract

As AI moves beyond text, large language models (LLMs) increasingly power vision, audio, and document understanding; however, their high inference costs hinder real-time, scalable deployment. Conversely, smaller open-source models offer cost advantages but struggle with complex or multimodal queries. We introduce a unified, modular framework that intelligently routes each query (textual, multimodal, or complex) to the most fitting expert model, using a learned routing network that balances cost and quality. For vision tasks, we employ a two-stage open-source pipeline optimized for efficiency, reviving efficient classical vision components where they remain state of the art (SOTA) for specific sub-tasks. On benchmarks such as Massive Multitask Language Understanding (MMLU) and Visual Question Answering (VQA), we match or exceed the performance of always-premium LLM (monolithic systems with one model serving all…
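The abstract describes a learned router that balances cost and quality. A minimal sketch of one common cost-aware routing rule is shown below, assuming the router predicts a quality score per expert and picks the expert maximizing predicted quality minus `lam` times its cost. This is not the paper's architecture: the linear-sigmoid quality head, the two-feature query encoding, the expert names, costs, and weights, and the trade-off parameter `lam` are all hypothetical stand-ins chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    cost: float            # hypothetical relative cost per query (premium LLM = 1.0)
    weights: list[float]   # hypothetical learned weights of a linear quality head
    bias: float


def predicted_quality(expert: Expert, features: list[float]) -> float:
    """Sigmoid of a linear score: a stand-in for a learned quality predictor."""
    z = expert.bias + sum(w * x for w, x in zip(expert.weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def route(features: list[float], experts: list[Expert], lam: float = 0.5) -> Expert:
    """Pick the expert with the best predicted quality minus lam * cost."""
    return max(experts, key=lambda e: predicted_quality(e, features) - lam * e.cost)


# Hypothetical query features: [has_image, complexity], both in [0, 1].
EXPERTS = [
    Expert("small-text", cost=0.05, weights=[-4.0, -3.0], bias=2.0),
    Expert("vision-pipeline", cost=0.2, weights=[4.0, -1.0], bias=-1.0),
    Expert("premium-llm", cost=1.0, weights=[1.0, 1.0], bias=2.0),
]

if __name__ == "__main__":
    # Simple text, image question, complex text query.
    for feats in ([0.0, 0.1], [1.0, 0.2], [0.0, 1.0]):
        print(feats, "->", route(feats, EXPERTS).name)
```

With these made-up weights, the simple text query goes to `small-text`, the image query to `vision-pipeline`, and only the complex query to `premium-llm`. Setting `lam=0.0` ignores cost, and the simple text query then also goes to the premium model, which is the behavior of an always-premium system.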

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning