# Improving Signal-to-Noise Ratio of Drug Fragment Screening with Variational Autoencoder

**Authors:** Phyllis Zhang, Minhuan Li, Daniel Keedy, Tamar Skaist Mehlman, Doeke Hekstra

PMC · DOI: 10.1063/4.0001064 · 2025-10-27

## TL;DR

This paper introduces VALDO, a machine learning method that improves the detection of drug fragments binding to proteins by reducing noise in crystallographic data.

## Contribution

VALDO uses a variational autoencoder to raise the signal-to-noise ratio of crystallographic fragment screening. In the authors' benchmarks it outperforms PanDDA and Cluster4x.

## Key findings

- VALDO effectively filters out crystal heterogeneity to reveal meaningful ligand binding signals.
- VALDO outperforms PanDDA and Cluster4x at detecting bound drug fragments and estimating their pose.
- The method reconstructs an apo state, enabling clearer difference maps for ligand identification.

## Abstract

In the quest for new drug candidates, a pivotal phase involves identifying compounds that selectively and robustly bind to their targets to modulate activity for therapeutic effects. This modulation can manifest as inhibition, activation, or allosteric regulation, among others. A core challenge in drug discovery is detecting ligands with high binding affinity to target proteins. Techniques range from high-throughput screening and computational simulations to advanced machine learning models.

Fragment-based drug discovery (FBDD), particularly using X-ray crystallography beamlines, has become a prominent method for finding initial leads for small-molecule modulators. This involves soaking potential ligands into protein crystals, then analyzing the X-ray diffraction data to detect bound fragments. Although technological advances have increased throughput, crystal-to-crystal variation unrelated to ligand binding limits the sensitivity of the analysis.

We present VAE-Assisted Ligand Discovery (VALDO), a novel approach to distinguish meaningful conformational changes from unrelated crystal heterogeneity. VALDO employs a variational autoencoder (VAE) to encode crystallographic reflections into a low-dimensional space, filtering out noise and reconstructing an apo state. This facilitates the creation of difference maps crucial for identifying ligand binding. Comparative benchmarks against methods like PanDDA and Cluster4x show VALDO's superior ability to detect and estimate the pose of bound drug fragments.
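The encode, reconstruct, and subtract idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the `tanh` activations, the synthetic "amplitudes", and all function names (`init_linear`, `vae_forward`, `negative_elbo`) are assumptions chosen for illustration, and the weights are random because the training loop is omitted. It only shows the general shape of a VAE forward pass, the standard loss (reconstruction error plus KL divergence to a unit Gaussian prior), and how a residual between observed and reconstructed amplitudes would be formed.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Return (weights, bias) for a dense layer with small random weights."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def vae_forward(x, params, rng):
    """Encode x to a Gaussian latent, sample from it, and decode a reconstruction."""
    (w_h, b_h), (w_mu, b_mu), (w_lv, b_lv), (w_d, b_d), (w_o, b_o) = params
    h = np.tanh(x @ w_h + b_h)                # encoder hidden layer
    mu = h @ w_mu + b_mu                      # latent mean
    logvar = h @ w_lv + b_lv                  # latent log-variance
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    h_dec = np.tanh(z @ w_d + b_d)            # decoder hidden layer
    recon = h_dec @ w_o + b_o                 # reconstructed amplitudes
    return recon, mu, logvar

def negative_elbo(x, recon, mu, logvar):
    """Squared-error reconstruction term plus KL(q(z|x) || N(0, I)), batch-averaged."""
    recon_err = 0.5 * np.sum((x - recon) ** 2, axis=1)
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return np.mean(recon_err + kl)

# Hypothetical sizes: 8 datasets (crystals), 50 reflections each, 3 latent dimensions.
n_datasets, n_reflections, n_hidden, n_latent = 8, 50, 16, 3

# Synthetic stand-in for standardized structure-factor amplitudes (not real data).
amplitudes = rng.normal(size=(n_datasets, n_reflections))

params = [
    init_linear(n_reflections, n_hidden),  # encoder hidden
    init_linear(n_hidden, n_latent),       # latent mean head
    init_linear(n_hidden, n_latent),       # latent log-variance head
    init_linear(n_latent, n_hidden),       # decoder hidden
    init_linear(n_hidden, n_reflections),  # decoder output
]

recon, mu, logvar = vae_forward(amplitudes, params, rng)
loss = negative_elbo(amplitudes, recon, mu, logvar)

# Residual between each observed dataset and its low-dimensional reconstruction.
diff_amplitudes = amplitudes - recon
```

With a trained model, the reconstruction would play the role of the apo-like state and `diff_amplitudes` would carry whatever the low-dimensional latent space cannot explain. Turning such residuals into a real-space difference map additionally requires phases and a Fourier synthesis, which this sketch does not cover.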

---
Source: https://tomesphere.com/paper/PMC12585566