Addressing Explainability of Generative AI using SMILE (Statistical Model-agnostic Interpretability with Local Explanations)
Zeinab Dehghani

TL;DR
This paper introduces gSMILE, a novel explainability framework for generative AI models that provides detailed, human-aligned attributions of model outputs, enhancing transparency and trust in high-stakes applications.
Contribution
The paper extends the SMILE interpretability method to generative models, employing perturbation-based techniques and evaluation metrics for systematic, fine-grained explanations.
Findings
gSMILE produces robust, human-aligned attributions
It generalises effectively across multiple generative architectures
It enables systematic assessment of model behaviour under diverse conditions
Abstract
The rapid advancement of generative artificial intelligence has enabled models capable of producing complex textual and visual outputs; however, their decision-making processes remain largely opaque, limiting trust and accountability in high-stakes applications. This thesis introduces gSMILE, a unified framework for the explainability of generative models, extending the Statistical Model-agnostic Interpretability with Local Explanations (SMILE) method to generative settings. gSMILE employs controlled perturbations of textual input, Wasserstein distance metrics, and weighted surrogate modelling to quantify and visualise how specific components of a prompt or instruction influence model outputs. Applied to Large Language Models (LLMs), gSMILE provides fine-grained token-level attribution and generates intuitive heatmaps that highlight influential tokens and reasoning pathways. In…
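The abstract describes an attribution loop: perturb the prompt, use a Wasserstein distance to measure how far the output moves, then fit a weighted surrogate. That loop can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions:

- `toy_generate` and its `TOKEN_EFFECT` table are hypothetical stand-ins for an LLM.
- Each output is treated as a 1-D sample of values.
- Perturbation masks are enumerated exhaustively.
- The exponential proximity kernel on the fraction of removed tokens is a LIME-style choice, not a detail taken from gSMILE.

```python
import itertools

import numpy as np

# HYPOTHETICAL per-token effects for a toy stand-in model; a real LLM replaces this.
TOKEN_EFFECT = {"the": 0.0, "cat": 3.0, "sat": 0.5, "mat": 1.0}


def toy_generate(tokens):
    """Toy 'generative model': maps a prompt to a 1-D sample of output values
    (standing in for, e.g., the coordinates of an output embedding)."""
    shift = sum(TOKEN_EFFECT.get(t, 0.0) for t in tokens)
    return np.linspace(-1.0, 1.0, 16) + shift


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two equal-size empirical samples:
    the mean absolute gap between their sorted values."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    return float(np.mean(np.abs(u - v)))


def explain(tokens, kernel_width=0.75):
    """Return (token, attribution) pairs from a weighted linear surrogate."""
    n = len(tokens)
    reference = toy_generate(tokens)

    # Exhaustive binary masks (1 = token kept); long prompts would need sampling.
    masks = np.array(list(itertools.product([0, 1], repeat=n)))
    removed = 1 - masks

    # Target: how far each perturbed output drifts from the original output.
    y = np.array([
        wasserstein_1d(reference, toy_generate([t for t, keep in zip(tokens, m) if keep]))
        for m in masks
    ])

    # Proximity weights: perturbations that remove fewer tokens count more
    # (a LIME-style exponential kernel; an assumption, not taken from the paper).
    frac_removed = removed.sum(axis=1) / n
    weights = np.exp(-(frac_removed ** 2) / kernel_width ** 2)

    # Weighted least squares: scale each row by sqrt(weight), then solve.
    X = np.column_stack([np.ones(len(masks)), removed])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    # Coefficient i = estimated output drift caused by removing token i.
    return [(t, float(c)) for t, c in zip(tokens, coef[1:])]


for token, score in explain(["the", "cat", "sat", "on", "mat"]):
    print(f"{token:>4} {score:6.3f} " + "#" * round(score * 4))
```

The toy model's drift is exactly additive, so the surrogate recovers the hypothetical `TOKEN_EFFECT` values and `cat` receives the largest attribution. The printed `#` bars are a text-mode stand-in for the heatmaps the abstract mentions. For unequal-size or weighted samples, `scipy.stats.wasserstein_distance` computes the same 1-D distance.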
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
