COVE: COntext and VEracity prediction for out-of-context images
Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych

TL;DR
COVE first predicts the true context of an out-of-context image, then uses that context to assess the veracity of the caption. It outperforms the best veracity prediction models on real-world data.
Contribution
It introduces a two-step approach that predicts the image's context first and the caption's veracity second. The approach is competitive with the best veracity prediction models on synthetic data and outperforms them on real-world data.
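The sequential idea can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SequentialPipeline`, the toy predictors, and the context field names (`location`, `date`) are all hypothetical stand-ins chosen only to show how the output of a context step can feed a veracity step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical container: these keys are illustrative, not the paper's context items.
Context = Dict[str, str]


@dataclass
class SequentialPipeline:
    """Runs context prediction first, then veracity prediction on its output."""
    predict_context: Callable[[str], Context]          # step 1: image -> context
    predict_veracity: Callable[[str, Context], bool]   # step 2: (caption, context) -> verdict

    def run(self, image_id: str, caption: str) -> Tuple[Context, bool]:
        context = self.predict_context(image_id)
        verdict = self.predict_veracity(caption, context)
        # The context is returned too, so it can be reused for other captions.
        return context, verdict


def toy_context(image_id: str) -> Context:
    # Assumption: a lookup table stands in for a real context predictor.
    lookup = {"img-1": {"location": "Paris", "date": "2015"}}
    return lookup.get(image_id, {})


def toy_veracity(caption: str, context: Context) -> bool:
    # Assumption: a caption counts as accurate only if every predicted
    # context value appears in it. A real model would be far less literal.
    if not context:
        return False
    return all(value.lower() in caption.lower() for value in context.values())


pipeline = SequentialPipeline(toy_context, toy_veracity)
context, verdict = pipeline.run("img-1", "Flooded street in Paris, 2015")
print(context, verdict)
```

Because the context is computed once and returned, the same prediction can be checked against a second caption without rerunning the first step.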
Findings
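COVE beats the state-of-the-art context prediction model on all context items, often by more than five percentage points.
It is competitive with the best veracity prediction models on synthetic data and outperforms them on real-world data.
A human study shows that the predicted context is a reusable and interpretable artifact for verifying captions.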
Abstract
Images taken out of their context are the most prevalent form of multimodal misinformation. Debunking them requires (1) providing the true context of the image and (2) checking the veracity of the image's caption. However, existing automated fact-checking methods fail to tackle both objectives explicitly. In this work, we introduce COVE, a new method that predicts first the true COntext of the image and then uses it to predict the VEracity of the caption. COVE beats the SOTA context prediction model on all context items, often by more than five percentage points. It is competitive with the best veracity prediction models on synthetic data and outperforms them on real-world data, showing that it is beneficial to combine the two tasks sequentially. Finally, we conduct a human study that reveals that the predicted context is a reusable and interpretable artifact to verify new…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image and Video Quality Assessment · Generative Adversarial Networks and Image Synthesis
Methods: Tanh Activation · Sigmoid Activation · Bidirectional LSTM · Softmax · Long Short-Term Memory · Sequence to Sequence · GloVe Embeddings · Location-based Attention · Contextual Word Vectors
