Shape vs. Context: Examining Human-AI Gaps in Ambiguous Japanese Character Recognition

Daichi Haraguchi

arXiv:2602.23746 · cs.HC · March 2, 2026

TL;DR

This study compares how humans and Vision-Language Models (VLMs) resolve ambiguous Japanese characters. Their decision boundaries differ when only shape is available, and word-level context improves alignment in some conditions. The results offer a starting point for benchmarking human-AI decision alignment.

Contribution

The paper introduces a method for comparing human and VLM decision boundaries using Japanese character shapes interpolated with a $\beta$-VAE, and examines how word-level context affects their alignment.

Findings

01

Humans and VLMs differ in shape-only recognition boundaries.

02

Context can improve VLM-human alignment in some cases.

03

Behavioral differences highlight the need for better alignment benchmarks.

Abstract

High text recognition performance does not guarantee that Vision-Language Models (VLMs) share human-like decision patterns when resolving ambiguity. We investigate this behavioral gap by directly comparing humans and VLMs using continuously interpolated Japanese character shapes generated via a $\beta$-VAE. We estimate decision boundaries in a single-character recognition task (shape-only) and evaluate whether VLM responses align with human judgments under shape in context (i.e., embedding an ambiguous character near the human decision boundary in word-level context). We find that human and VLM decision boundaries differ in the shape-only task, and that shape in context can improve human alignment in some conditions. These results highlight qualitative behavioral differences, offering foundational insights toward human-VLM alignment benchmarking.
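As a rough illustration of what "estimating a decision boundary" over interpolated shapes can mean, the sketch below locates the interpolation ratio at which the share of "character B" responses crosses 50%, using simple linear interpolation between sampled ratios. This is not the paper's procedure: the function name `boundary_crossing`, the 50% threshold, and all response proportions are hypothetical and chosen only to make the example concrete.

```python
def boundary_crossing(alphas, p_b, threshold=0.5):
    """Return the interpolation ratio where P(answer == B) crosses `threshold`.

    alphas: increasing interpolation ratios between character A (0) and B (1).
    p_b:    share of "B" responses observed at each ratio (hypothetical data).
    Returns None if the responses never cross the threshold.
    """
    points = list(zip(alphas, p_b))
    for (a0, p0), (a1, p1) in zip(points, points[1:]):
        if p0 == threshold:
            return a0
        # A sign change means the threshold lies strictly between the two samples.
        if (p0 - threshold) * (p1 - threshold) < 0:
            return a0 + (threshold - p0) * (a1 - a0) / (p1 - p0)
    if p_b and p_b[-1] == threshold:
        return alphas[-1]
    return None


# Hypothetical response proportions, for illustration only.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
human_p_b = [0.0, 0.1, 0.4, 0.9, 1.0]
model_p_b = [0.0, 0.0, 0.1, 0.3, 1.0]

human_boundary = boundary_crossing(alphas, human_p_b)
model_boundary = boundary_crossing(alphas, model_p_b)
boundary_gap = model_boundary - human_boundary

print(f"human boundary: {human_boundary:.3f}")
print(f"model boundary: {model_boundary:.3f}")
print(f"gap (model - human): {boundary_gap:.3f}")
```

A positive gap in this toy setup would mean the model needs a shape closer to character B than humans do before switching its answer. A fitted psychometric curve (for example, a logistic function) would be a more robust estimator than linear interpolation when responses are noisy.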

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Text Readability and Simplification