Semantic Similarity is a Spurious Measure of Comic Understanding: Lessons Learned from Hallucinations in a Benchmarking Experiment

Christopher Driggers-Ellis; Nachiketh Tibrewal; Rohit Bogulla; Harsh Khanna; Sangpil Youm; Christan Grant; Bonnie Dorr

arXiv:2603.01950·cs.LG·March 3, 2026

Open Access

TL;DR

This paper evaluates vision-language models for comic understanding, revealing that semantic similarity metrics are unreliable due to hallucinations, and highlights the need for improved data and hallucination mitigation.

Contribution

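The paper introduces a preliminary benchmark of VLM performance on comic interpretation, with a focus on hallucinations. It organizes the observed hallucinations into generalized object-hallucination taxonomies and offers guidance for future research in this domain.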

Findings

01

Hallucinations significantly affect model performance.

02

Semantic similarity is a spurious measure for comic understanding.

03

The paper offers guidelines for future research on hallucination mitigation and improved data curation.
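
The second finding can be made concrete with a toy example. The sketch below is not the authors' metric or data: it assumes a simple bag-of-words cosine similarity as a stand-in for whatever semantic similarity measure the benchmark uses, and the three sentences are invented for illustration. It shows how a description that hallucinates a key object can score higher against a reference than a faithful paraphrase does, simply because it shares more surface wording.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors.

    Assumption: this stands in for an embedding-based similarity score;
    it is a deliberately simple proxy, not the paper's metric.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Counter returns 0 for missing words, so only shared words contribute.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentences, invented for this sketch (not from the benchmark).
reference = "a boy hands a red umbrella to a girl in the rain"
# Wrong object ("sword"), but nearly identical wording to the reference.
hallucinated = "a boy hands a red sword to a girl in the rain"
# Correct content, but phrased differently from the reference.
faithful = "a child gives a girl a red umbrella while it rains"

score_hallucinated = cosine_similarity(reference, hallucinated)
score_faithful = cosine_similarity(reference, faithful)

print(f"hallucinated description: {score_hallucinated:.3f}")
print(f"faithful paraphrase:      {score_faithful:.3f}")
```

Under these assumptions the hallucinated description outscores the faithful paraphrase, which is the kind of mismatch that makes a similarity score a questionable proxy for understanding. An embedding-based metric would behave differently in detail, so this should be read as an illustration of the failure mode rather than a reproduction of the paper's result.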

Abstract

A system that enables blind or visually impaired users to access comics/manga would introduce a new medium of storytelling to this community. However, no such system currently exists. Generative vision-language models (VLMs) have shown promise in describing images and understanding comics, but most research on comic understanding is limited to panel-level analysis. To fully support blind and visually impaired users, greater attention must be paid to page-level understanding and interpretation. In this work, we present a preliminary benchmark of VLM performance on comic interpretation tasks. We identify and categorize hallucinations that emerge during this process, organizing them into generalized object-hallucination taxonomies. We conclude with guidance on future research, emphasizing hallucination mitigation and improved data curation for comic interpretation.

Taxonomy

Topics: Comics and Graphic Narratives · Multimodal Machine Learning Applications · Artificial Intelligence in Games