Hospitality-VQA: Decision-Oriented Informativeness Evaluation for Vision-Language Models

Jeongwoo Lee; Baek Duhyeong; Eungyeol Han; Soyeon Shin; Gukin han; Seungduk Kim; Jaehyun Jeon; Taewoo Jeong

arXiv:2603.07868 · cs.AI · March 10, 2026

TL;DR

This paper introduces a new hospitality-specific VQA dataset and framework to evaluate how well vision-language models provide decision-relevant information for hotel and facility images, revealing current limitations and the need for domain-specific fine-tuning.

Contribution

The work presents a formal informativeness framework, a hospitality-focused VQA dataset, and an analysis of VLMs' decision-oriented capabilities in the hospitality domain.

Findings

01. VLMs lack intrinsic decision-awareness in hospitality VQA.
02. Key visual signals are underutilized in current models.
03. Domain-specific fine-tuning improves informativeness reasoning.

Abstract

Recent advances in Vision-Language Models (VLMs) have demonstrated impressive multimodal understanding in general domains. However, their applicability to decision-oriented domains such as hospitality remains largely unexplored. In this work, we investigate how well VLMs can perform visual question answering (VQA) about hotel and facility images that are central to consumer decision-making. While many existing VQA benchmarks focus on factual correctness, they rarely capture what information users actually find useful. To address this, we first introduce Informativeness as a formal framework to quantify how much hospitality-relevant information an image-question pair provides. Guided by this framework, we construct a new hospitality-specific VQA dataset that covers various facility types, where questions are specifically designed to reflect key user information needs. Using this…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques