Verifying Cross-modal Entity Consistency in News using Vision-language Models

Sahar Tahmasebi; David Ernst; Eric Müller-Budack; Ralph Ewerth

arXiv:2501.11403 · cs.CL · February 3, 2025

TL;DR

This paper introduces LVLM4CEC, a framework using large vision-language models to verify entity consistency across images and text in news, helping to detect disinformation by assessing persons, locations, and events.

Contribution

It proposes novel prompting strategies for LVLMs in news entity verification and extends datasets with manual ground-truth data for this task.
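
As a rough illustration of what prompting an LVLM for entity verification could look like, the sketch below builds a yes/no question for one entity and parses the model's free-text reply. It is not taken from the paper or its repository: the prompt wording, the `EntityQuery` class, and the `build_prompt` and `parse_answer` helpers are all hypothetical, and the call to the model itself is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Entity types named in the paper summary; everything else here is assumed.
ENTITY_TYPES = ("person", "location", "event")


@dataclass(frozen=True)
class EntityQuery:
    """Hypothetical container for one entity mentioned in the news text."""
    entity_type: str
    name: str


def build_prompt(query: EntityQuery, with_evidence: bool = False) -> str:
    """Build a yes/no verification question (illustrative wording only)."""
    if query.entity_type not in ENTITY_TYPES:
        raise ValueError(f"unsupported entity type: {query.entity_type}")
    if with_evidence:
        # Assumed setup: a reference image of the entity is passed first,
        # followed by the news image to be checked.
        return (
            f'The first image is a reference image of the {query.entity_type} '
            f'"{query.name}". Is the same {query.entity_type} shown in the '
            "second image? Answer with yes or no."
        )
    return (
        f'Is the {query.entity_type} "{query.name}" shown in the image? '
        "Answer with yes or no."
    )


def parse_answer(response: str) -> Optional[bool]:
    """Map a free-text reply to True/False, or None if it is not yes/no."""
    match = re.match(r"\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


if __name__ == "__main__":
    query = EntityQuery("person", "Jane Doe")
    print(build_prompt(query))
    print(build_prompt(query, with_evidence=True))
    print(parse_answer("Yes, the person matches."))
```

Returning `None` for replies that do not start with yes or no keeps ambiguous model output separate from a genuine negative decision.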

Findings

1. Improved accuracy in verifying persons and events with evidence images.

2. Outperforms baseline methods in location and event verification.

3. Demonstrates potential of LVLMs for automating cross-modal entity verification.

Abstract

The web has become a crucial source of information, but it is also used to spread disinformation, often conveyed through multiple modalities like images and text. The identification of inconsistent cross-modal information, in particular entities such as persons, locations, and events, is critical to detect disinformation. Previous works either identify out-of-context disinformation by assessing the consistency of images with the whole document, neglecting relations of individual entities, or focus on generic entities that are not relevant to news. So far, only a few approaches have addressed the task of validating entity consistency between images and text in news. However, the potential of large vision-language models (LVLMs) has not been explored yet. In this paper, we propose an LVLM-based framework for verifying Cross-modal Entity Consistency (LVLM4CEC), to assess whether persons,…

Code & Models

Repositories

tibhannover/lvlm4cec (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling

Methods: Focus