# Detecting Interspecific Positive Selection Using Convolutional Neural Networks

**Authors:** Charlotte West, Conor R Walker, Shayesteh Arasti, Viacheslav Vasilev, Xingze Xu, Nicola De Maio, Nick Goldman

PMC · DOI: 10.1093/molbev/msaf154 · 2025-06-30

## TL;DR

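This paper trains convolutional neural networks on simulated data to detect positive selection from interspecific codon sequence alignments. Across the specific range of phylogenetic scenarios tested, the models are more accurate than traditional maximum likelihood and Bayesian approaches, and a trained model is faster at test time.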

## Contribution

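Convolutional neural networks are applied to the detection of interspecific positive selection. On simulated data they achieve higher accuracy than traditional statistical methods across a specific range of phylogenetic scenarios and evolutionary modes, and they show some ability to account for alignment errors.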

## Key findings

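- On simulated data, convolutional neural networks detect positive selection more accurately than traditional methods across a specific range of phylogenetic scenarios; the advantage is clearest on noisy data prone to misalignment.
- Once trained, the model is faster at test time, which makes it a scalable alternative for large-scale, multigene analyses.
- Generalizability to unseen evolutionary scenarios is explored, and the authors identify future avenues toward broader utility.
- Saliency maps help show what the model learns and suggest how it could be leveraged for sitewise inference of positive selection.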
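
To make the saliency-map idea concrete, here is a minimal sketch of vanilla gradient saliency reduced to one score per alignment site. It is not the authors' implementation: the logistic model stands in for a trained network, and the names `predict` and `saliency_per_site`, the input values, and the weights are all hypothetical. A real convolutional network would need automatic differentiation to obtain the gradient.

```python
import math


def predict(x, weights, bias):
    """Toy stand-in for a trained classifier: logistic regression over all inputs."""
    z = bias + sum(
        w * v
        for w_row, x_row in zip(weights, x)
        for w, v in zip(w_row, x_row)
    )
    return 1.0 / (1.0 + math.exp(-z))


def saliency_per_site(x, weights, bias):
    """Vanilla gradient saliency: |dp/dx|, reduced to one score per site.

    For a logistic model the gradient is p * (1 - p) * w, so it can be
    written in closed form instead of using autodiff.
    """
    p = predict(x, weights, bias)
    scale = p * (1.0 - p)
    # Take the largest absolute gradient over the channels of each site.
    return [max(abs(scale * w) for w in w_row) for w_row in weights]


# Hypothetical input: 4 alignment sites, 3 feature channels per site.
x = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
# Hypothetical weights; site 2 is deliberately the most influential.
weights = [
    [0.1, -0.1, 0.0],
    [0.2, 0.1, -0.1],
    [0.0, 0.3, 2.0],
    [-0.2, 0.0, 0.1],
]
bias = -1.0

scores = saliency_per_site(x, weights, bias)
top_site = max(range(len(scores)), key=scores.__getitem__)
print(top_site)  # index of the site with the highest saliency
```

Sites with high scores are the ones whose input most strongly moves the predicted probability, which is why such maps are a natural starting point for sitewise interpretation.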

## Abstract

Traditional statistical methods using maximum likelihood and Bayesian inference can detect positive selection from an interspecific phylogeny and a codon sequence alignment based on model assumptions, but they are prone to false positives due to alignment errors and can lack power. These problems are particularly pronounced when faced with high levels of indels and divergence. To address these issues, we trained and tested convolutional neural network models on simulated data and achieved higher accuracy in detecting selection across a specific range of phylogenetic scenarios and evolutionary modes. This advantage is particularly evident when performing inference on noisy data prone to misalignments. Our method shows some ability to account for these errors, where most statistical frameworks fail to do so in a tractable manner. We explore the generalizability of our convolutional neural network models to unseen evolutionary scenarios and identify future avenues to achieve broader utility. Once trained, our convolutional neural network model is faster at test time, making it a scalable alternative to traditional statistical methods for large-scale, multigene analyses. In addition to binary classification (inference of the presence or absence of positive selection during the evolution of the sequences), we use saliency maps to understand what the model learns and observe how this could be leveraged for sitewise inference of positive selection.
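
A convolutional network needs the codon alignment as a numeric array. The sketch below shows one plausible encoding, a one-hot vector per codon with an extra channel for gaps and ambiguous codons. The abstract does not state how the authors encode their input, so the 65-channel layout and the names `encode_alignment` and `toy_alignment` are assumptions made for illustration.

```python
from itertools import product

# All 64 codons in a fixed order; channel 64 is reserved for gaps
# and any codon containing non-ACGT characters (an assumed convention).
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}
GAP_CHANNEL = len(CODONS)


def encode_alignment(sequences):
    """One-hot encode aligned nucleotide sequences codon by codon.

    Returns a nested list of shape [n_taxa][n_codon_sites][65].
    """
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")

    encoded = []
    for seq in sequences:
        row = []
        for start in range(0, length, 3):
            codon = seq[start:start + 3].upper()
            channel = CODON_INDEX.get(codon, GAP_CHANNEL)
            one_hot = [0.0] * (GAP_CHANNEL + 1)
            one_hot[channel] = 1.0
            row.append(one_hot)
        encoded.append(row)
    return encoded


# Hypothetical toy alignment: 3 taxa, 4 codon sites, one gapped codon.
toy_alignment = [
    "ATGGCC---TAA",
    "ATGGCTAAATAA",
    "ATGGCCAAGTAA",
]
tensor = encode_alignment(toy_alignment)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 4 65
```

The resulting taxa-by-sites-by-channels array is the kind of image-like input a convolutional classifier can take for the binary decision of selection present versus absent.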

## Full-text entities

- **Diseases:** cancer (MESH:D009369)
- **Chemicals:** Amino acid (MESH:D000596), Nucleotide (MESH:D009711)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12287699/full.md

---
Source: https://tomesphere.com/paper/PMC12287699