# D-LIM: A neural network for interpretable gene–gene interactions

**Authors:** Shuhui Wang, Alexandre Allauzen, Philippe Nghe, Vaitea Opuu

PMC · DOI: 10.1371/journal.pcbi.1014107 · PLOS Computational Biology · 2026-03-23

## TL;DR

D-LIM is a neural network that interprets gene interactions by linking mutations to fitness outcomes in a biologically meaningful way.

## Contribution

D-LIM introduces a neural network with interpretable parameters that model gene-specific phenotypes and their interactions to predict fitness.

## Key findings

- D-LIM achieves state-of-the-art predictive accuracy on mutation–fitness data from metabolic pathways and yeast adaptation.
- The model reveals whether epistatic interactions can be captured by low-dimensional continuous models.
- D-LIM estimates mutational effects on effective phenotypes, enabling extrapolation beyond training data.

## Abstract

Recent advances in gene editing can produce large genotype–fitness maps for targeted genes, yet predicting the effects of mutations between genes remains challenging. Indeed, biochemical models require knowledge of underlying parameters and interactions, whereas machine learning methods typically lack interpretability, as they do not link model parameters to biological quantities. We introduce D-LIM, a neural network that infers low-dimensional fitness landscapes directly from mutation–fitness data. The distinctive feature of D-LIM is that it assumes genes act through independent gene-specific molecular phenotypes whose nonlinear interactions determine fitness. When this assumption holds, the model yields accurate predictions and interpretable effective phenotypes. Conversely, failure reveals that a low-dimensional model is insufficient. Applied to deep mutational scanning of metabolic pathways, protein–protein interactions, and yeast environmental adaptation, D-LIM achieves state-of-the-art predictive accuracy. The inferred phenotype–fitness landscapes reveal whether epistatic interactions can be captured by a low-dimensional continuous model and identify potential trade-offs. Moreover, D-LIM estimates mutational effects on the effective phenotypes, enabling weak extrapolation beyond the training domain. D-LIM demonstrates how simple structure constraints in a neural network can help inference and hypothesis generation in biology.
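The structural assumption described above (each gene acts through its own phenotype, and a nonlinear function of those phenotypes gives fitness) can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the class name `DLIMSketch`, the methods `latent` and `predict`, the choice of one scalar phenotype per variant, the hidden width, and the `tanh` activation are all assumptions made for illustration, and training is omitted.

```python
import numpy as np


class DLIMSketch:
    """Illustrative only: per-gene latent phenotypes combined by a small MLP.

    Hypothetical names and sizes; not the published D-LIM code.
    """

    def __init__(self, n_variants_per_gene, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: one learnable scalar phenotype per variant of each gene.
        self.phenotypes = [rng.normal(size=n) for n in n_variants_per_gene]
        n_genes = len(n_variants_per_gene)
        # Small two-layer network mapping phenotypes to a scalar fitness.
        self.W1 = rng.normal(scale=0.5, size=(n_genes, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(scale=0.5, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def latent(self, genotypes):
        """Map variant indices (n_samples, n_genes) to phenotypes of the same shape.

        Column g is looked up only in gene g's table, so each phenotype
        depends on the variant of its own gene and nothing else.
        """
        genotypes = np.asarray(genotypes)
        columns = [
            self.phenotypes[g][genotypes[:, g]]
            for g in range(genotypes.shape[1])
        ]
        return np.stack(columns, axis=1)

    def predict(self, genotypes):
        """Combine the per-gene phenotypes nonlinearly into predicted fitness."""
        z = self.latent(genotypes)
        h = np.tanh(z @ self.W1 + self.b1)
        return (h @ self.W2 + self.b2).ravel()


# Usage: two genes with 5 and 7 variants; each row is one double mutant.
model = DLIMSketch([5, 7])
genotypes = np.array([[0, 0], [1, 3], [4, 6]])
fitness = model.predict(genotypes)
```

Because every gene is forced through its own low-dimensional bottleneck, a poor fit after training would suggest that the independence or low-dimensionality assumption does not hold for the data, which is the diagnostic use the abstract describes.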

Understanding how organisms respond to genetic variation is essential for elucidating evolutionary principles. Advances in high-throughput sequencing now allow fitness measurements across thousands of genetic variants at once. These massive datasets are used to build models that explain mutational effects on fitness and predict outcomes of novel variants. A central goal of modeling is to extract biochemical insights and generate new hypotheses about the genotype-to-fitness relationship. However, modeling genotype-to-fitness relationships remains challenging due to the nonlinear, high-dimensional, and context-dependent nature of genetic effects on fitness. Ideally, one would use detailed biological knowledge to formulate hypotheses, but such knowledge is often unavailable. In such cases, machine learning models may predict outcomes, but their opacity limits hypothesis generation. Here, we propose a neural network-based model that bridges this gap, based on a specific hypothesis: genes contribute independently to phenotypes, which are then combined through a function that determines fitness. Unlike conventional models, fitting data under this architecture directly evaluates the hypothesis. Moreover, the constrained architecture of the model yields interpretable phenotype predictions, enabling insights into genetic trade-offs and the global shape of the genotype-to-fitness map. This opens the possibility of uncovering modularity, redundancy, or epistasis patterns that shape fitness landscapes.
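The hypothesis stated above can be written compactly. The notation here is an assumption introduced for illustration and does not come from the paper: $m_g$ is the variant of gene $g$, $\varphi_g$ maps it to that gene's effective phenotype, and $f$ is the learned phenotype-to-fitness function. The second line is the standard pairwise epistasis measure relative to a wild-type reference $\mathrm{wt}$, shown for two genes.

```latex
F(m_1, \dots, m_G) = f\bigl(\varphi_1(m_1), \dots, \varphi_G(m_G)\bigr)

\varepsilon(a, b) = F(a, b) - F(a, \mathrm{wt}) - F(\mathrm{wt}, b) + F(\mathrm{wt}, \mathrm{wt})
```

Under this form, any nonzero $\varepsilon$ must come from the nonlinearity of $f$: if $f$ were additive in its arguments, the four terms would cancel and $\varepsilon(a, b) = 0$ for every pair.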

## Full-text entities

- **Genes:** TetR [NCBI Gene 7324557], IRA2 (Ras GTPase activating protein IRA2) [NCBI Gene 854073] {aka CCS1, GLC4}, IRA1 (GTPase-activating protein IRA1) [NCBI Gene 852437] {aka GLC1, PPD1}
- **Diseases:** genetic diseases (MESH:D030342), infectious diseases (MESH:D003141), toxicity (MESH:D064420)
- **Chemicals:** NaCl (MESH:D012965), glucose (MESH:D005947), L-Arabinose (MESH:D001089), fluconazole (MESH:D015725), araA (MESH:D014740), salt (MESH:D012492), geldanamycin (MESH:C001277), amino acids (MESH:D000596), KCl (MESH:D011189), AraA (-)
- **Species:** Escherichia coli (E. coli, species) [taxon 562], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13029791/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13029791/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC13029791/full.md

---
Source: https://tomesphere.com/paper/PMC13029791