Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models
Jeongwoo Lee, Baek Duhyeong, Eungyeol Han, Soyeon Shin, Gukin Han, Seungduk Kim, Jaehyun Jeon, Taewoo Jeong

TL;DR
This paper introduces a new hospitality-specific VQA dataset and framework to evaluate how well vision-language models provide decision-relevant information for hotel and facility images, revealing current limitations and the need for domain-specific fine-tuning.
Contribution
The work presents a formal informativeness framework, a hospitality-focused VQA dataset, and an analysis of VLMs' decision-oriented capabilities in the hospitality domain.
Findings
VLMs lack intrinsic decision-awareness in hospitality VQA.
Key visual signals are underutilized in current models.
Domain-specific fine-tuning improves informativeness reasoning.
Abstract
Recent advances in Vision-Language Models (VLMs) have demonstrated impressive multimodal understanding in general domains. However, their applicability to decision-oriented domains such as hospitality remains largely unexplored. In this work, we investigate how well VLMs can perform visual question answering (VQA) about hotel and facility images that are central to consumer decision-making. While many existing VQA benchmarks focus on factual correctness, they rarely capture what information users actually find useful. To address this, we first introduce Informativeness as a formal framework to quantify how much hospitality-relevant information an image-question pair provides. Guided by this framework, we construct a new hospitality-specific VQA dataset that covers various facility types, where questions are specifically designed to reflect key user information needs. Using this…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
