Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images

Elisei Rykov; Kseniia Petrushina; Kseniia Titova; Anton Razzhigaev; Alexander Panchenko; Vasily Konovalov

arXiv:2505.07704 · cs.CV · May 13, 2025

TL;DR

This paper presents Through the Looking Glass (TLG), a method that uses large vision-language models (LVLMs) and a Transformer-based encoder to evaluate the common sense consistency of images, achieving state-of-the-art results on the WHOOPS! and WEIRD datasets.

Contribution

Introduction of TLG, a new approach that combines LVLMs with a Transformer-based encoder to assess image common sense consistency while fine-tuning only a compact component.

Findings

01

State-of-the-art performance on the WHOOPS! and WEIRD datasets

02

Effective extraction of atomic facts from images

03

Compact fine-tuning component enhances evaluation accuracy

Abstract

Measuring how real images look is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We then fine-tune a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while fine-tuning only a compact component.
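
The abstract describes the final stage only as an attention-pooling classifier over encoded atomic facts. The following is a minimal sketch of generic attention pooling, assuming each atomic fact has already been turned into a fixed-size vector by some Transformer encoder (not shown) and that a high output means "violates common sense". The class name AttentionPoolingClassifier, the single query vector and the logistic output are hypothetical illustrative choices, not the authors' architecture; in practice the parameters would be learned by fine-tuning rather than set by hand.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class AttentionPoolingClassifier:
    """Hypothetical attention-pooling head over per-fact embeddings.

    query, weight and bias stand in for parameters that fine-tuning
    would learn; in this sketch they are supplied by hand.
    """

    def __init__(self, query: Vector, weight: Vector, bias: float) -> None:
        self.query = list(query)
        self.weight = list(weight)
        self.bias = bias

    def pool(self, facts: Sequence[Vector]) -> Tuple[List[float], List[float]]:
        """Weight each fact embedding by softmax(query . fact) and sum them."""
        if not facts:
            raise ValueError("need at least one encoded fact")
        alphas = softmax([dot(self.query, f) for f in facts])
        dim = len(facts[0])
        pooled = [sum(a * f[i] for a, f in zip(alphas, facts)) for i in range(dim)]
        return pooled, alphas

    def predict_proba(self, facts: Sequence[Vector]) -> float:
        """Probability (by this sketch's convention) that the image is 'weird'."""
        pooled, _ = self.pool(facts)
        logit = dot(self.weight, pooled) + self.bias
        return 1.0 / (1.0 + math.exp(-logit))


# Toy usage: two atomic facts encoded as 2-d vectors (made-up numbers).
clf = AttentionPoolingClassifier(query=[0.0, 0.0], weight=[2.0, -2.0], bias=0.0)
facts = [[1.0, 0.0], [0.0, 1.0]]
pooled, alphas = clf.pool(facts)
print(alphas)                      # [0.5, 0.5]
print(clf.predict_proba(facts))    # 0.5
```

Because the softmax weights sum to one, the pooled vector is a convex combination of the fact embeddings, so a single fact that the query scores highly (for instance, one describing an implausible object in a scene) can dominate the final decision.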

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis