FastRM: An efficient and automatic explainability framework for multimodal generative models
Gabriela Ben-Melech Stan, Estelle Aflalo, Man Luo, Shachar Rosenman, Tiep Le, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

TL;DR
FastRM is an efficient framework that significantly reduces the computation time and memory needed to generate explainability maps for large vision-language models (LVLMs), making real-time validation of model outputs practical.
Contribution
The paper introduces FastRM, a new method that provides fast, scalable relevancy maps and confidence assessments for LVLMs, improving explainability and reliability in practical applications.
Findings
Achieves a 99.8% reduction in computation time compared to traditional relevancy map generation
Reduces the memory footprint by 44.4% compared to the same baseline
Enables real-time explainability for LVLMs
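To put the two percentages in perspective, a reduction of r percent corresponds to an improvement factor of 100 / (100 - r) relative to the baseline. The short sketch below works through that arithmetic. It assumes both figures are measured against the same traditional relevancy-map baseline, as the abstract states; the function name `reduction_to_factor` is hypothetical and not taken from the paper.

```python
def reduction_to_factor(reduction_percent: float) -> float:
    """Convert a percentage reduction into the equivalent improvement factor.

    Illustrative helper only; it is not part of FastRM.
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 100.0 / (100.0 - reduction_percent)


# Figures reported in the findings above.
time_factor = reduction_to_factor(99.8)    # computation time
memory_factor = reduction_to_factor(44.4)  # memory footprint

print(f"computation time: about {time_factor:.0f}x faster")
print(f"memory footprint: about {memory_factor:.2f}x smaller")
```

Under that reading, the reported numbers amount to roughly a 500-fold speedup and a memory footprint a little over half the baseline.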
Abstract
Large Vision Language Models (LVLMs) have demonstrated remarkable reasoning capabilities over textual and visual inputs. However, these models remain prone to generating misinformation. Identifying and mitigating ungrounded responses is crucial for developing trustworthy AI. Traditional explainability methods, such as gradient-based relevancy maps, offer insight into the decision process of models, but are often computationally expensive and unsuitable for real-time output validation. In this work, we introduce FastRM, an efficient method for predicting explainable Relevancy Maps of LVLMs. Furthermore, FastRM provides both quantitative and qualitative assessment of model confidence. Experimental results demonstrate that FastRM achieves a 99.8% reduction in computation time and a 44.4% reduction in memory footprint compared to traditional relevancy map generation. FastRM allows…
Taxonomy
Topics: Topic Modeling
