Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration

ChaeHun Park; Yujin Baek; Jaeseok Kim; Yu-Jung Heo; Du-Seong Chang; Jaegul Choo

arXiv:2406.16469 · cs.CL · June 2, 2025

Open Access · 1 dataset

TL;DR

This paper introduces K-Viscuit, a culturally inclusive benchmark for vision-language models focusing on Korean culture, created through a semi-automated human-AI collaboration process to evaluate and improve model understanding of cultural nuances.

Contribution

We propose a semi-automated framework for building cultural VLM benchmarks and demonstrate its effectiveness with the K-Viscuit dataset, highlighting gaps in current models' cultural understanding.

Findings

01. Open-source models underperform proprietary models on Korean cultural questions.

02. The framework effectively reduces manual effort in dataset creation.

03. External knowledge augmentation improves VLM performance.
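
The findings above compare models by how often they pick the correct option, with and without external knowledge in the prompt. A minimal sketch of that kind of scoring follows. It is not the paper's evaluation code: the `MCQuestion` fields, the prompt wording, and the `predict` callable are all hypothetical, and the image input that a real VLM would receive is left out to keep the example self-contained.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class MCQuestion:
    """One multiple-choice item (field names are illustrative assumptions)."""
    question: str
    options: Sequence[str]
    answer_index: int
    knowledge: Optional[str] = None  # optional external knowledge snippet


def build_prompt(item: MCQuestion, use_knowledge: bool = False) -> str:
    """Format the item as text, optionally prefixed with external knowledge."""
    lines = []
    if use_knowledge and item.knowledge:
        lines.append(f"Background: {item.knowledge}")
    lines.append(f"Question: {item.question}")
    for letter, option in zip(ascii_uppercase, item.options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def accuracy(
    items: Sequence[MCQuestion],
    predict: Callable[[str], str],
    use_knowledge: bool = False,
) -> float:
    """Fraction of items where the reply starts with the correct option letter."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        reply = predict(build_prompt(item, use_knowledge)).strip().upper()
        if reply[:1] == ascii_uppercase[item.answer_index]:
            correct += 1
    return correct / len(items)
```

Calling `accuracy(items, predict)` and `accuracy(items, predict, use_knowledge=True)` with the same `predict` function gives the two numbers needed to see whether knowledge augmentation helps a given model.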

Abstract

To create culturally inclusive vision-language models (VLMs), it is essential to develop a benchmark that tests their ability to answer culturally relevant questions. Existing approaches typically rely on human annotators, which makes the process labor-intensive and places a cognitive burden on annotators who must generate diverse questions. To address this, we propose a semi-automated framework for constructing cultural VLM benchmarks, specifically targeting multiple-choice QA. The framework is built on human-VLM collaboration: VLMs generate questions from guidelines, a small set of annotated examples, and relevant knowledge, and native speakers then verify them. We demonstrate the effectiveness of this framework through the creation of K-Viscuit, a dataset focused on Korean culture. Our experiments on this dataset reveal that open-source models lag behind proprietary ones…
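
The two stages described in the abstract (VLM generation from guidelines, annotated examples, and knowledge, followed by native-speaker verification) can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' pipeline: the `Candidate` type, the prompt layout, the automatic well-formedness check, and the `human_approves` callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    """A generated multiple-choice question awaiting verification (hypothetical)."""
    question: str
    options: List[str]
    answer_index: int


def build_generation_prompt(
    guidelines: str, examples: Iterable[str], knowledge: str
) -> str:
    """Assemble the three inputs named in the abstract into one text prompt."""
    parts = ["Guidelines:", guidelines, "", "Annotated examples:"]
    parts.extend(f"- {example}" for example in examples)
    parts.extend([
        "",
        "Relevant knowledge:",
        knowledge,
        "",
        "Write one new multiple-choice question about the image.",
    ])
    return "\n".join(parts)


def is_well_formed(candidate: Candidate) -> bool:
    """Cheap automatic check applied before human review (an assumption here)."""
    return (
        bool(candidate.question.strip())
        and len(candidate.options) >= 2
        and len(set(candidate.options)) == len(candidate.options)
        and 0 <= candidate.answer_index < len(candidate.options)
    )


def verify(
    candidates: Iterable[Candidate],
    human_approves: Callable[[Candidate], bool],
) -> List[Candidate]:
    """Keep candidates that pass the automatic check and the human reviewer."""
    return [c for c in candidates if is_well_formed(c) and human_approves(c)]
```

Running the automatic check before the human callback means reviewers only see candidates that are at least structurally valid, which is one plausible way such a framework could reduce manual effort.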

Code & Models

Datasets

ddehun/k-viscuit · dataset · 718 downloads

Taxonomy

Topics: 3D Surveying and Cultural Heritage · Optical Wireless Communication Technologies · Color Science and Applications

Methods: Sparse Evolutionary Training