Semantically Rich Local Dataset Generation for Explainable AI in Genomics
Pedro Barbosa, Rosina Savisaar, Alcides Fonseca

TL;DR
This paper introduces a genetic programming method to generate semantically rich local datasets for explainable AI in genomics, improving interpretability of deep models by creating diverse, biologically relevant sequence perturbations.
Contribution
The authors develop a domain-guided genetic programming approach to generate local datasets that maintain syntactic similarity while introducing semantic variability, enhancing model interpretability in genomics.
Findings
Quickly achieves diversity in the generated datasets
Outperforms random baseline in exploring sequence space
Scales effectively to larger genomic sequences
Abstract
Black-box deep learning models trained on genomic sequences excel at predicting the outcomes of different gene regulatory mechanisms. Therefore, interpreting these models may provide novel insights into the underlying biology, supporting downstream biomedical applications. Because these models are so complex, interpretable surrogate models can only be built for local explanations (e.g., of a single instance). However, accomplishing this requires generating a dataset in the neighborhood of the input, which must maintain syntactic similarity to the original data while introducing semantic variability in the model's predictions. This task is challenging due to the complex sequence-to-function relationship of DNA. We propose using Genetic Programming to generate datasets by evolving perturbations in sequences that contribute to their semantic diversity. Our custom, domain-guided individual…
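The local dataset idea described in the abstract can be illustrated with a small sketch. This is not the authors' method: it is a simplified evolutionary loop that uses only random point substitutions, rather than genetic programming with domain-guided perturbations. The predictor `toy_model` (GC fraction) is a hypothetical stand-in for a black-box model, and the names `generate_local_dataset`, `max_edits`, and `n_bins` are assumptions made for illustration. Syntactic similarity is approximated by a cap on Hamming distance to the input, and semantic variability by keeping one sequence per prediction bin.

```python
import random

ALPHABET = "ACGT"

def toy_model(seq):
    """Hypothetical stand-in for a black-box predictor: GC fraction in [0, 1]."""
    return sum(base in "GC" for base in seq) / len(seq)

def mutate(seq, rng):
    """Apply one random point substitution (assumed perturbation operator)."""
    pos = rng.randrange(len(seq))
    new_base = rng.choice([b for b in ALPHABET if b != seq[pos]])
    return seq[:pos] + new_base + seq[pos + 1:]

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def bin_of(pred, n_bins):
    """Map a prediction in [0, 1] to one of n_bins equal-width bins."""
    return min(int(pred * n_bins), n_bins - 1)

def generate_local_dataset(original, model, n_bins=10, max_edits=8,
                           generations=200, pop_size=20, seed=0):
    """Evolve perturbed sequences, keeping the first one found per prediction bin."""
    rng = random.Random(seed)
    # The archive maps a prediction bin to one sequence that lands in it.
    archive = {bin_of(model(original), n_bins): original}
    for _ in range(generations):
        parents = list(archive.values())
        for _ in range(pop_size):
            child = mutate(rng.choice(parents), rng)
            if hamming(child, original) > max_edits:
                continue  # too far from the input: breaks syntactic similarity
            # Store the child only if its prediction bin is still empty.
            archive.setdefault(bin_of(model(child), n_bins), child)
    return archive

dataset = generate_local_dataset("ACGT" * 5, toy_model)
for bin_index, seq in sorted(dataset.items()):
    print(bin_index, seq, round(toy_model(seq), 2))
```

Selecting parents from the archive, rather than always mutating the original sequence, lets the search accumulate edits toward prediction bins that a single substitution cannot reach. A random baseline would instead sample each perturbed sequence independently from the input.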
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
