VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question Answering

Rahul Atul Bhope; K. R. Jayaram; Vinod Muthusamy; Ritesh Kumar; Vatche Isahagian; Nalini Venkatasubramanian

arXiv:2602.03007·cs.CV·February 4, 2026

Open Access

TL;DR

VOILA is an adaptive framework that selects visual input fidelity for multimodal question answering. By predicting the utility of each fidelity level before retrieval, it cuts cost by 50-60% while maintaining 90-95% accuracy.

Contribution

The paper proposes a value-of-information-based method for dynamic fidelity selection in VQA that improves cost-efficiency across five datasets and six vision-language models.

Findings

01. Achieves 50-60% cost reduction while maintaining 90-95% accuracy.
02. Effective across diverse datasets and vision-language models.
03. Pre-retrieval fidelity selection is crucial for resource-efficient multimodal inference.

Abstract

Despite significant costs from retrieving and processing high-fidelity visual inputs, most multimodal vision-language systems operate at fixed fidelity levels. We introduce VOILA, a framework for Value-Of-Information-driven adaptive fidelity selection in Visual Question Answering (VQA) that optimizes what information to retrieve before model execution. Given a query, VOILA uses a two-stage pipeline: a gradient-boosted regressor estimates correctness likelihood at each fidelity from question features alone, then an isotonic calibrator refines these probabilities for reliable decision-making. The system selects the minimum-cost fidelity maximizing expected utility given predicted accuracy and retrieval costs. We evaluate VOILA across three deployment scenarios using five datasets (VQA-v2, GQA, TextVQA, LoCoMo, FloodNet) and six Vision-Language Models (VLMs) with 7B-235B parameters. VOILA…
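The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that calibrated correctness probabilities are already available for each fidelity (the gradient-boosted regressor and isotonic calibrator are not reproduced here) and that expected utility takes the simple form `reward * p_correct - cost`. That utility form, the `Fidelity` class, the `select_fidelity` function, and all fidelity names and numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fidelity:
    """One candidate input fidelity (all values here are illustrative)."""
    name: str
    cost: float       # assumed retrieval + processing cost, in utility units
    p_correct: float  # assumed calibrated probability of a correct answer


def expected_utility(option: Fidelity, reward: float = 1.0) -> float:
    # Assumed utility form: payoff for a correct answer minus the cost paid.
    return reward * option.p_correct - option.cost


def select_fidelity(options, reward: float = 1.0) -> Fidelity:
    """Pick the option with the highest expected utility.

    Ties in utility are broken in favor of the cheaper option, which
    mirrors the "minimum-cost fidelity maximizing expected utility" rule.
    """
    if not options:
        raise ValueError("need at least one fidelity option")
    return max(options, key=lambda o: (expected_utility(o, reward), -o.cost))


# Hypothetical per-question predictions for three fidelity levels.
options = [
    Fidelity("caption", cost=0.05, p_correct=0.60),
    Fidelity("low_res", cost=0.20, p_correct=0.80),
    Fidelity("full_res", cost=0.50, p_correct=0.90),
]
choice = select_fidelity(options)
print(choice.name)  # low_res

# Two options with equal utility (0.25 each): the cheaper one is chosen.
tied = [
    Fidelity("expensive", cost=0.50, p_correct=0.75),
    Fidelity("cheap", cost=0.25, p_correct=0.50),
]
tie_choice = select_fidelity(tied)
```

In this toy example the mid-level fidelity wins because the extra accuracy of the full-resolution input does not cover its extra cost. Raising `reward` shifts the choice toward higher fidelities, which is one plausible way to express how much a correct answer is worth in a given deployment.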

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications