# MolQuery: Prediction of Lipid Synthesizability Using Active Learning

**Authors:** Jonathan Broadbent, Jiří Vymětal, Saeed Moayedpour, Michael Bailey, Saleh Riahi, Akshay Balsubramani, Peter Mikochik, Luc Even, Naresh Gunaganti, Ramesh Dasari, Saswata Karmakar, Hongfeng Deng, Vikram Agarwal, Ziv Bar-Joseph, Sven Jager

PMC · DOI: 10.1021/acsomega.5c09931 · ACS Omega · 2026-02-11

## TL;DR

MolQuery is a new tool that uses active learning to accurately predict whether lipid molecules used in mRNA delivery can be synthesized.

## Contribution

MolQuery introduces an active learning-based pipeline to improve lipid synthesizability prediction for mRNA delivery systems.

## Key findings

- MolQuery improves synthesizability predictions using active learning with small datasets.
- The tool provides highly accurate predictions for lipid molecules in LNP systems.
- It enables efficient filtering of synthetic lipid datasets for practical molecular design.

## Abstract

In the field of molecular design, Generative Artificial Intelligence (GenAI) has the potential to create extensive synthetic data sets encompassing a wide range of chemical properties. However, the practical application of these data sets is often constrained by the synthesizability of the molecules within. To address this, it is essential to develop a robust platform for assessing synthesizability, which is crucial for constructing effective GenAI-based models for molecular systems. Here, we introduce MolQuery, a comprehensive pipeline that integrates active learning (AL) to improve the accuracy of chemical synthesizability predictions for lipid molecules designed for mRNA delivery via lipid nanoparticles (LNPs). By leveraging AL, MolQuery efficiently trains machine learning models using small data sets, which greatly improves upon current solutions for this task. Our results demonstrate that MolQuery produces highly accurate predictions of lipid synthesizability, making it a valuable tool for filtering synthetic LNP data sets.
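
The abstract does not spell out how the active learning loop is organized, so the following is only a minimal sketch of one common variant, pool-based uncertainty sampling, and not MolQuery's actual implementation. Everything in it is an assumption made for illustration: the two-dimensional toy feature vectors stand in for real molecular representations, `toy_oracle` stands in for an expert or experimental synthesizability label, and the small logistic regression stands in for whatever model the pipeline trains.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # Probability of the positive class ("synthesizable"); bias is the last weight.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logistic(X, y, epochs=200, lr=0.5):
    # Plain batch gradient descent on the logistic loss.
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def active_learning(pool, oracle, n_seed=4, n_rounds=10, batch=2, seed=0):
    # Hypothetical loop: start from a small random labeled set, then
    # repeatedly query labels for the most uncertain pool items.
    rng = random.Random(seed)
    unlabeled = list(range(len(pool)))
    rng.shuffle(unlabeled)
    labeled = [unlabeled.pop() for _ in range(n_seed)]
    labels = {i: oracle(pool[i]) for i in labeled}
    w = fit_logistic([pool[i] for i in labeled], [labels[i] for i in labeled])
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # Uncertainty sampling: predicted probability closest to 0.5 first.
        unlabeled.sort(key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))
        queried, unlabeled = unlabeled[:batch], unlabeled[batch:]
        for i in queried:
            labels[i] = oracle(pool[i])
        labeled.extend(queried)
        w = fit_logistic([pool[i] for i in labeled], [labels[i] for i in labeled])
    return w, labeled


# Toy usage: synthetic 2-D "descriptors" and a made-up labeling rule.
data_rng = random.Random(1)
pool = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
toy_oracle = lambda x: 1 if x[0] + x[1] > 0 else 0  # assumption, not real chemistry

w, labeled = active_learning(pool, toy_oracle)
accuracy = sum(
    (predict_proba(w, x) > 0.5) == bool(toy_oracle(x)) for x in pool
) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} items, pool accuracy {accuracy:.2f}")
```

The point of the sketch is the budget: only 24 of the 200 pool items are ever labeled, and each round spends its labels where the current model is least certain. A filtering step of the kind the abstract mentions could then keep only candidates whose predicted probability exceeds a chosen threshold.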

## Full-text entities

- **Genes:** LNPK (lunapark, ER junction formation factor) [NCBI Gene 80856] {aka KIAA1715, LNP, LNP1, NEDEHCC, Ul, ulnaless}, EPO (erythropoietin) [NCBI Gene 2056] {aka DBAL, ECYT5, EP, MVCD2}
- **Diseases:** AL (MESH:D007859), inflammatory (MESH:D007249), ML (MESH:C537366), LNPs (MESH:D011017)
- **Chemicals:** cholesterol (MESH:D002784), 1,2-Distearoyl-sn-glycero-3-phosphocholine (MESH:C010942), PEG (MESH:D011092), Lipid (MESH:D008055), ChemBERTa-77M-MTR (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12946983/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12946983/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/PMC12946983/full.md

---
Source: https://tomesphere.com/paper/PMC12946983