CREMP: Conformer-rotamer ensembles of macrocyclic peptides for machine learning
Colin A. Grambow, Hayley Weir, Christian N. Cunningham, Tommaso Biancalani, Kangway V. Chuang

TL;DR
CREMP is a dataset of macrocyclic peptide structures and energies, designed to support the development of machine learning models that predict conformations and aid therapeutic peptide design.
Contribution
This work introduces CREMP, a large-scale, high-quality structural and energetic dataset for macrocyclic peptides, enabling accelerated machine learning-based modeling.
Findings
Contains 36,198 macrocyclic peptides with structural ensembles.
Includes nearly 31.3 million conformations with energy annotations.
Couples conformational data with permeability information for experimental relevance.
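As a rough sense of scale, the two headline counts above imply an average ensemble size per peptide. The sketch below only divides the quoted figures; because the conformer count is given as "nearly 31.3 million", the result is an approximate upper estimate, and the variable names are illustrative rather than taken from the dataset.

```python
# Figures quoted in the findings above.
n_peptides = 36_198
n_conformers = 31.3e6  # "nearly 31.3 million": treated here as an approximate upper figure

# Mean ensemble size implied by the two counts (not a value reported by the source).
avg_conformers_per_peptide = n_conformers / n_peptides
print(f"about {avg_conformers_per_peptide:.0f} conformers per peptide on average")
```

This works out to several hundred conformers per peptide on average; the actual distribution of ensemble sizes is not stated here and may be far from uniform.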
Abstract
Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle…
Taxonomy
Topics: Chemical Synthesis and Analysis · Computational Drug Discovery Methods · Protein Structure and Dynamics
