Studying Limits of Explainability by Integrated Gradients for Gene Expression Models
Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Jean-Michel Arbona, Benjamin Audit, Pierre Borgnat

TL;DR
This paper investigates the limitations of using Integrated Gradients for identifying biomarkers in gene expression models, highlighting challenges in robustness and interpretability through simulations and real data analysis.
Contribution
It introduces a hierarchical simulation model for gene expression data and discusses best practices for evaluating explainability methods in genomics.
Findings
Ranking features by importance alone is insufficient for robust biomarker identification.
Simulated data reveals limitations of Integrated Gradients in capturing true biomarkers.
The paper proposes guidelines for evaluating explanations in genomics applications.
Abstract
Understanding the molecular processes that drive cellular life is a fundamental question in biological research. Ambitious programs have gathered a number of molecular datasets on large populations. To decipher the complex cellular interactions, recent work has turned to supervised machine learning methods. The scientific questions are formulated as classical learning problems on tabular data or on graphs, e.g. phenotype prediction from gene expression data. In these works, the input features on which the individual predictions are predominantly based are often interpreted as indicative of the cause of the phenotype, such as cancer identification. Here, we propose to explore the relevance of the biomarkers identified by Integrated Gradients, an explainability method for feature attribution in machine learning. Through a motivating example on The Cancer Genome Atlas, we show that ranking…
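For readers unfamiliar with the method, the following is a minimal sketch of how Integrated Gradients attributes a prediction to input features, and of the importance ranking that the paper cautions against relying on alone. It is not the authors' code: the toy logistic classifier, the all-zero baseline, the random weights, and every function and variable name are assumptions made for illustration. The attribution for feature i is (x_i - x'_i) times the average gradient of the model output along the straight path from the baseline x' to the input x, approximated here with a midpoint Riemann sum.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def model(x, w, b):
    """Toy logistic classifier F(x) = sigmoid(w . x + b) (assumed for illustration)."""
    return sigmoid(x @ w + b)


def model_grad(points, w, b):
    """Analytic gradient of F with respect to the input, for a batch of points."""
    p = model(points, w, b)                # shape: (n_points,)
    return (p * (1.0 - p))[:, None] * w    # shape: (n_points, n_features)


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # points on the straight path
    avg_grad = model_grad(path, w, b).mean(axis=0)       # average gradient along the path
    return (x - baseline) * avg_grad


# Hypothetical "expression profile" with 5 features and random model weights.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
b = 0.1
x = rng.normal(size=5)
baseline = np.zeros_like(x)  # assumed reference input

attributions = integrated_gradients(x, baseline, w, b)

# Completeness: attributions should sum to F(x) - F(baseline), up to
# the numerical error of the Riemann approximation.
gap = attributions.sum() - (model(x, w, b) - model(baseline, w, b))

# Rank features by absolute attribution, most important first.
ranking = np.argsort(-np.abs(attributions))
print("attributions:", attributions)
print("completeness gap:", gap)
print("ranking:", ranking)
```

The completeness check is a useful sanity test of the approximation, but it says nothing about whether the top-ranked features are true biomarkers. Both the attributions and the ranking depend on the chosen baseline and on the trained model.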
Taxonomy
Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Explainable Artificial Intelligence (XAI)
