A Survey on Interpretable Cross-modal Reasoning
Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu

TL;DR
This survey reviews the current state of interpretable cross-modal reasoning, highlighting methods, datasets, challenges, and future directions to enhance transparency and understanding in AI systems across different modalities.
Contribution
The survey provides a three-level taxonomy of interpretable cross-modal reasoning (I-CMR) methods, reviews existing CMR datasets annotated with explanations, and discusses open challenges and future research directions.
Findings
Three-level taxonomy of I-CMR methods
Compilation of datasets with explanation annotations
Identification of key challenges and future directions
Abstract
In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics. As the deployment of AI systems becomes more ubiquitous, the demand for transparency and comprehensibility in these systems' decision-making processes has intensified. This survey delves into the realm of interpretable cross-modal reasoning (I-CMR), where the objective is not only to achieve high predictive performance but also to provide human-understandable explanations for the results. This survey presents a comprehensive overview of the typical methods with a three-level taxonomy for I-CMR. Furthermore, this survey reviews the existing CMR datasets with annotations for explanations. Finally, this survey summarizes the challenges for I-CMR and discusses…
Taxonomy
Topics: Multimodal Machine Learning Applications · Text and Document Classification Technologies · Image Retrieval and Classification Techniques
