CARES: Context-Aware Resolution Selector for VLMs

Moshe Kimhi; Nimrod Shabtay; Raja Giryes; Chaim Baskin; Eli Schwartz

arXiv:2510.19496 · cs.CV · March 23, 2026

Open Access · 2 Models · 1 Dataset

TL;DR

CARES is a lightweight preprocessing module that, for each image-query pair, selects the minimal image resolution a large vision-language model needs, reducing compute by up to 80% without sacrificing accuracy across tasks.

Contribution

Introduces CARES, a novel context-aware resolution selector that predicts the minimal sufficient resolution for VLMs, enabling substantial compute savings.

Findings

01 Reduces compute by up to 80% across benchmarks.

02 Maintains task performance with lower-resolution inputs.

03 Works effectively across diverse multimodal tasks and models.

Abstract

Large vision-language models (VLMs) commonly process images at native or high resolution to remain effective across tasks. This inflates the visual token count, often to 97-99% of total tokens, resulting in high compute and latency even when low-resolution images would suffice. We introduce CARES, a Context-Aware Resolution Selector: a lightweight preprocessing module that, given an image-query pair, predicts the minimal sufficient input resolution. CARES uses a compact VLM (350M) to extract features and predict when a target pretrained VLM's response converges to its peak ability to answer correctly. Though trained as a discrete classifier over a set of optional resolutions, CARES interpolates continuous resolutions at inference for fine-grained control. Across five multimodal benchmarks spanning documents and natural images, as well as diverse…
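The abstract does not spell out how a discrete classifier yields a continuous resolution at inference. One plausible reading, shown here only as a sketch, is to take the probability-weighted average of the candidate resolutions. The candidate set `[384, 768, 1024]`, the snapping multiple of 32, and the function names are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate_resolution(logits, resolutions, multiple=32):
    """Map classifier logits over discrete resolutions to one resolution.

    Assumption (not from the paper): the continuous resolution is the
    expectation of the candidate resolutions under the softmax
    distribution, snapped to a multiple the vision encoder can accept.
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, resolutions))
    snapped = int(round(expected / multiple)) * multiple
    # Clamp to the range of resolutions the classifier was trained on.
    return max(min(resolutions), min(max(resolutions), snapped))


# Hypothetical candidate resolutions (longest side, in pixels).
CANDIDATES = [384, 768, 1024]

confident_low = interpolate_resolution([100.0, 0.0, 0.0], CANDIDATES)
uncertain = interpolate_resolution([0.0, 0.0, 0.0], CANDIDATES)
```

With this scheme a confident prediction collapses to one of the trained resolutions, while an uncertain one lands between them, which is one way to obtain the fine-grained control the abstract mentions.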

Code & Models

Models

Datasets

Kimhi/hardness_data_mix
dataset · 30 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning