Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration
ChaeHun Park, Yujin Baek, Jaeseok Kim, Yu-Jung Heo, Du-Seong Chang, Jaegul Choo

TL;DR
This paper introduces K-Viscuit, a culturally inclusive benchmark for vision-language models focusing on Korean culture, created through a semi-automated human-AI collaboration process to evaluate and improve model understanding of cultural nuances.
Contribution
We propose a semi-automated framework for building cultural VLM benchmarks and demonstrate its effectiveness with the K-Viscuit dataset, highlighting gaps in current models' cultural understanding.
Findings
Open-source models underperform proprietary models on Korean cultural questions.
The framework effectively reduces manual effort in dataset creation.
External knowledge augmentation improves VLM performance.
Abstract
To create culturally inclusive vision-language models (VLMs), developing a benchmark that tests their ability to address culturally relevant questions is essential. Existing approaches typically rely on human annotators, which makes the process labor-intensive and places a cognitive burden on annotators who must generate diverse questions. To address this, we propose a semi-automated framework for constructing cultural VLM benchmarks, specifically targeting multiple-choice QA. The framework relies on human-VLM collaboration: VLMs generate questions from guidelines, a small set of annotated examples, and relevant knowledge, and native speakers then verify the results. We demonstrate the effectiveness of this framework through the creation of K-Viscuit, a dataset focused on Korean culture. Our experiments on this dataset reveal that open-source models lag behind proprietary ones…
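The generate-then-verify loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the `MCQuestion` schema, the prompt layout, and the helper names (`build_generation_prompt`, `keep_verified`, `accuracy`) are all assumptions made for this example. The VLM call itself is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MCQuestion:
    """One multiple-choice item (hypothetical schema, not taken from the paper)."""
    image_id: str
    question: str
    options: list[str]
    answer_index: int
    verified: bool = False  # set to True once a native speaker approves the item


def build_generation_prompt(guidelines: str, examples: list[MCQuestion], knowledge: str) -> str:
    """Assemble the text prompt that would accompany an image sent to a VLM."""
    shots = "\n\n".join(
        f"Q: {ex.question}\n"
        f"Options: {' | '.join(ex.options)}\n"
        f"Answer: {ex.options[ex.answer_index]}"
        for ex in examples
    )
    return (
        f"Guidelines:\n{guidelines}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Relevant knowledge:\n{knowledge}\n\n"
        "Write one new multiple-choice question about the image."
    )


def keep_verified(candidates: list[MCQuestion]) -> list[MCQuestion]:
    """Human verification step: keep only items a reviewer has approved."""
    return [q for q in candidates if q.verified]


def accuracy(questions: list[MCQuestion], predictions: list[int]) -> float:
    """Fraction of questions where the predicted option index matches the answer."""
    if not questions:
        return 0.0
    correct = sum(1 for q, p in zip(questions, predictions) if p == q.answer_index)
    return correct / len(questions)
```

In this sketch the human reviewers act as a filter on VLM-generated candidates rather than as authors, which is where the reduction in manual effort would come from.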
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Optical Wireless Communication Technologies · Color Science and Applications
Methods: Sparse Evolutionary Training
