Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization
Huanhuan Ma, Jinghao Zhang, Qiang Liu, Shu Wu, Liang Wang

TL;DR
This paper introduces LOGRAN, a logic regularization method for detecting out-of-context misinformation in images and captions, providing interpretable phrase-level explanations and demonstrating competitive results on the NewsCLIPpings dataset.
Contribution
The paper presents a novel logic regularization approach that decomposes out-of-context detection at the phrase level, making its predictions more interpretable and easier to explain.
Findings
Competitive performance on NewsCLIPpings dataset
Faithful phrase-level out-of-context predictions
Enhanced interpretability with logical explanations
Abstract
The rapid spread of information through mobile devices and media has led to the widespread circulation of false or deceptive news, causing significant concerns in society. Among different types of misinformation, image repurposing, also known as out-of-context misinformation, remains highly prevalent and effective. However, current approaches for detecting out-of-context misinformation often lack interpretability and offer limited explanations. In this study, we propose a logic regularization approach for out-of-context detection called LOGRAN (LOGic Regularization for out-of-context ANalysis). The primary objective of LOGRAN is to decompose the out-of-context detection at the phrase level. By employing latent variables for phrase-level predictions, the final prediction of the image-caption pair can be aggregated using logical rules. The latent variables also provide an explanation for how the…
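As an illustration of how phrase-level predictions might be aggregated with a logical rule, here is a minimal sketch. It is not taken from the paper: the rule ("a pair is out-of-context if at least one phrase is out-of-context"), the product t-norm relaxation of that OR, and the names aggregate_soft_or and flagged_phrases are all assumptions made for this example, and the phrase probabilities are invented inputs rather than model outputs.

```python
from math import prod


def aggregate_soft_or(phrase_probs):
    """Soft logical OR over phrase-level out-of-context probabilities.

    Assumed rule (not confirmed by the source): the image-caption pair is
    out-of-context if any phrase is. Under the product t-norm this relaxes to
    1 - prod(1 - p_i), which stays differentiable in each p_i.
    """
    if not phrase_probs:
        return 0.0  # no phrases, so no evidence of a mismatch
    return 1.0 - prod(1.0 - p for p in phrase_probs)


def flagged_phrases(phrases, phrase_probs, threshold=0.5):
    """Return the phrases whose probability meets the (hypothetical) threshold.

    These serve as the phrase-level explanation for the pair-level decision.
    """
    return [ph for ph, p in zip(phrases, phrase_probs) if p >= threshold]


if __name__ == "__main__":
    # Hypothetical caption phrases and probabilities, for illustration only.
    phrases = ["the mayor", "speaks at a rally", "in Paris"]
    probs = [0.1, 0.2, 0.9]
    print("pair-level score:", aggregate_soft_or(probs))
    print("explanation:", flagged_phrases(phrases, probs))
```

A soft relaxation is used here rather than a hard maximum so that every phrase contributes to the pair-level score; other t-norms would be equally plausible choices under the same assumed rule.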
