On Explaining Visual Captioning with Hybrid Markov Logic Networks
Monika Shah, Somdeb Sarkhel, Deepak Venugopal

TL;DR
This paper introduces a novel, interpretable explanation framework for image captioning models using Hybrid Markov Logic Networks, which identify influential training examples to clarify how captions are generated.
Contribution
It develops a new hybrid logic-based explanation method that combines symbolic rules with real-valued functions to interpret deep neural captioning models.
Findings
The framework effectively identifies training examples influencing caption generation.
Experiments demonstrate the interpretability of explanations for multiple captioning models.
The approach enables comparison of models based on explainability.
Abstract
Deep Neural Networks (DNNs) have made tremendous progress in multimodal tasks such as image captioning. However, explaining how these models integrate visual information, language information and knowledge representation to generate meaningful captions remains a challenging problem. Standard performance metrics typically compare generated captions with human-written ones, which may not give a user deep insight into this integration. In this work, we develop a novel, easily interpretable explanation framework based on Hybrid Markov Logic Networks (HMLNs), a language that can combine symbolic rules with real-valued functions, in which we hypothesize how relevant examples from the training data could have influenced the generation of the observed caption. To do this, we learn an HMLN distribution over the training instances and infer the…
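To make "symbolic rules combined with real-valued functions" concrete, here is a minimal toy sketch of how a hybrid log-linear model might score a set of ground formulas. It is an illustration only, not the paper's model: the function name hybrid_log_score, the tuple layout, and all weights and values are hypothetical. It assumes the general hybrid Markov logic form, in which the unnormalized log-probability is a weighted sum of feature values.

```python
def hybrid_log_score(groundings):
    """Return the unnormalized log-score of a world.

    Each grounding is (weight, truth, value):
      weight - weight of the formula (hypothetical, would normally be learned)
      truth  - whether the symbolic part of the formula holds
      value  - real-valued term attached to the formula
               (1.0 for a purely symbolic rule)
    """
    return sum(w * (1.0 if truth else 0.0) * value
               for w, truth, value in groundings)


# Hypothetical groundings, for illustration only.
groundings = [
    (1.5, True, 1.0),   # purely symbolic rule that is satisfied
    (2.0, True, 0.8),   # hybrid rule: holds, scaled by a similarity of 0.8
    (2.0, False, 0.9),  # symbolic part fails, so it contributes nothing
]

score = hybrid_log_score(groundings)
print(score)
```

In this sketch a satisfied symbolic rule contributes its full weight, while a hybrid rule contributes its weight scaled by a continuous quantity such as an image-text similarity. Normalizing exp(score) over all possible worlds would give a probability; that step is omitted here.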
