LipidBERT: A Lipid Language Model Pre-trained on METiS de novo Lipid Library
Tianhao Yu, Cai Yao, Zhuorui Sun, Feng Shi, Lin Zhang, Kangjie Lyu, Xuan Bai, Andong Liu, Xicheng Zhang, Jiali Zou, Wenshou Wang, Chris Lai, and Kai Wang

TL;DR
LipidBERT is a language model for lipids, pre-trained on a large virtual lipid database. It improves lipid property prediction and supports screening for lipid nanoparticles (LNPs) by combining dry-lab and wet-lab data.
Contribution
This work introduces LipidBERT, the first language model trained on virtual lipids. Its dual-language operation, pre-training on ionizable lipid structures and fine-tuning on LNP data, strengthens downstream lipid analysis and screening.
Findings
LipidBERT achieves state-of-the-art LNP property prediction.
LipidBERT effectively utilizes virtual lipid data for downstream tasks.
Dual-language LipidBERT enables versatile lipid screening.
Abstract
In this study, we generate and maintain a database of 10 million virtual lipids through METiS's in-house de novo lipid generation algorithms and lipid virtual screening techniques. These virtual lipids serve as a corpus for pre-training, lipid representation learning, and downstream task knowledge transfer, culminating in state-of-the-art LNP property prediction performance. We propose LipidBERT, a BERT-like model pre-trained with the Masked Language Model (MLM) and various secondary tasks. Additionally, we compare the performance of embeddings generated by LipidBERT and PhatGPT, our GPT-like lipid generation model, on downstream tasks. The proposed bilingual LipidBERT model operates in two languages: the language of ionizable lipid pre-training, using in-house dry-lab lipid structures, and the language of LNP fine-tuning, utilizing in-house LNP wet-lab data. This dual capability…
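The Masked Language Model (MLM) objective named in the abstract can be illustrated with a minimal sketch. It assumes lipids are written as SMILES strings and uses the standard BERT corruption rule (select about 15% of tokens; of those, 80% become a mask token, 10% a random token, 10% stay unchanged). The summary above does not state LipidBERT's tokenizer, masking rate, or secondary tasks, so the regex tokenizer, the `mask_tokens` helper, and the example molecule are all hypothetical.

```python
import random
import re

# Assumption: a simple regex tokenizer for SMILES strings. The tokenizer
# actually used by LipidBERT is not described in the summary above.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#()+\-/\\@.]")
MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Apply BERT-style corruption to a token list.

    Returns (inputs, labels). labels[i] holds the original token at each
    selected position and None elsewhere, so a training loss would be
    computed only on the selected positions.
    """
    if rng is None:
        rng = random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                inputs.append(tok)                # 10%: keep the original token
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Hypothetical example molecule: a small tertiary-amine ester, not a METiS lipid.
example = "CCN(CC)CCCC(=O)OCC"
tokens = tokenize(example)
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

In a real pipeline the tokens would be mapped to integer ids and the model would be trained to recover `labels` from `inputs`; this sketch covers only the corruption step.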
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research
Methods: Lib
