ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models

Xin Tang; Youfang Han; Fangfei Gou; Wei Zhao; Xin Meng; Yang Yu; Jinguo Zhang; Yuanchun Shi; Yuntao Wang; Tengxiang Zhang

arXiv:2510.27256·cs.LG·November 3, 2025

TL;DR

ECVL-ROUTER is a novel scenario-aware routing framework for vision-language models that dynamically assigns queries to large or small models based on user needs, improving efficiency and maintaining high response quality.

Contribution

This paper introduces the first scenario-aware routing strategy for VLMs, including new evaluation metrics and a multimodal dataset for training and validation.

Findings

1. Routes over 80% of queries to small models
2. Achieves less than a 10% drop in problem-solving accuracy
3. Demonstrates improved efficiency in VLM deployment

Abstract

Vision-Language Models (VLMs) excel in diverse multimodal tasks. However, user requirements vary across scenarios, which can be categorized into fast response, high-quality output, and low energy consumption. Relying solely on large models deployed in the cloud for all queries often leads to high latency and energy cost, while small models deployed on edge devices are capable of handling simpler tasks with low latency and energy cost. To fully leverage the strengths of both large and small models, we propose ECVL-ROUTER, the first scenario-aware routing framework for VLMs. Our approach introduces a new routing strategy and evaluation metrics that dynamically select the appropriate model for each query based on user requirements, maximizing overall utility. We also construct a multimodal response-quality dataset tailored for router training and validate the approach through extensive…
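
The routing idea in the abstract (pick, per query, the model that maximizes utility under the user's scenario) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Scenario` names mirror the three requirement categories in the abstract, but the `ModelProfile` fields, the linear `utility` form, the `WEIGHTS` values, and the latency and energy numbers are hypothetical, and the per-model quality scores stand in for whatever a trained router would predict.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping, Sequence


class Scenario(Enum):
    # The three requirement categories named in the abstract.
    FAST_RESPONSE = "fast_response"
    HIGH_QUALITY = "high_quality"
    LOW_ENERGY = "low_energy"


@dataclass(frozen=True)
class ModelProfile:
    name: str
    latency_s: float  # hypothetical average latency per query, in seconds
    energy_j: float   # hypothetical average energy per query, in joules


# Hypothetical (quality, latency, energy) weights per scenario.
WEIGHTS = {
    Scenario.FAST_RESPONSE: (1.0, 0.5, 0.1),
    Scenario.HIGH_QUALITY: (1.0, 0.05, 0.01),
    Scenario.LOW_ENERGY: (1.0, 0.1, 0.5),
}

# Hypothetical profiles for an edge-side small model and a cloud-side large model.
SMALL = ModelProfile("small-edge", latency_s=0.2, energy_j=0.5)
LARGE = ModelProfile("large-cloud", latency_s=1.5, energy_j=3.0)


def utility(quality: float, profile: ModelProfile, scenario: Scenario) -> float:
    """Assumed linear utility: reward predicted quality, penalize latency and energy."""
    w_q, w_l, w_e = WEIGHTS[scenario]
    return w_q * quality - w_l * profile.latency_s - w_e * profile.energy_j


def route(
    predicted_quality: Mapping[str, float],
    profiles: Sequence[ModelProfile],
    scenario: Scenario,
) -> str:
    """Return the name of the model with the highest utility for this query."""
    best = max(profiles, key=lambda p: utility(predicted_quality[p.name], p, scenario))
    return best.name


# Quality scores a router might predict for an easy and a hard query (made up).
easy = {"small-edge": 0.90, "large-cloud": 0.95}
hard = {"small-edge": 0.30, "large-cloud": 0.95}

print(route(easy, [SMALL, LARGE], Scenario.FAST_RESPONSE))
print(route(hard, [SMALL, LARGE], Scenario.HIGH_QUALITY))
```

With these made-up numbers, the easy query stays on the small model under the fast-response scenario, while the hard query goes to the large model only when the scenario weights quality heavily. The paper's actual routing strategy and metrics may differ substantially from this weighted-sum form.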

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques