SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning
Jonathan E. King, David Ryan Koes

TL;DR
SidechainNet is an all-atom protein structure dataset that extends ProteinNet with sidechain information, intended to support machine learning models for protein structure prediction.
Contribution
It introduces SidechainNet, a new dataset that adds sidechain angles and atomic coordinates to ProteinNet, so that deep learning approaches to protein structure prediction can model sidechains alongside the backbone.
Findings
Includes angle and atomic coordinate data for all heavy atoms.
Organized for easy integration with machine learning models.
Provides software tools for data manipulation and training.
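Because each structure is described both by angles and by atomic coordinates, the two representations are linked by standard geometry. As a minimal sketch, assuming NumPy and a hypothetical helper named `dihedral` (not part of the SidechainNet package), a torsion angle such as a backbone or sidechain dihedral can be recovered from four consecutive atom coordinates:

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) defined by four 3-D points.

    Hypothetical helper for illustration; not the SidechainNet API.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1          # first bond, pointing back toward p0
    b1 = p2 - p1          # central bond (rotation axis)
    b2 = p3 - p2          # last bond
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # Signed angle between the projections; arctan2 keeps the sign.
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


# Usage: a trans (anti) arrangement gives a torsion of about +/-180 degrees.
print(np.degrees(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))))
```

The `arctan2` form returns a signed angle in (-π, π] and avoids the precision loss of taking `arccos` of a dot product near ±1.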
Abstract
Despite recent advancements in deep learning methods for protein structure prediction and representation, little focus has been directed at the simultaneous inclusion and prediction of protein backbone and sidechain structure information. We present SidechainNet, a new dataset that directly extends the ProteinNet dataset. SidechainNet includes angle and atomic coordinate information capable of describing all heavy atoms of each protein structure. In this paper, we provide background information on the availability of protein structure data and the significance of ProteinNet. Thereafter, we argue for the potentially beneficial inclusion of sidechain information through SidechainNet, describe the process by which we organize SidechainNet, and provide a software package (https://github.com/jonathanking/sidechainnet) for data manipulation and training with machine learning models.
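In the other direction, angle data can be turned back into coordinates one atom at a time. The sketch below uses the Natural Extension Reference Frame (NeRF) construction, a common technique for this step; it assumes NumPy, uses a hypothetical function name `place_atom`, and is not claimed to be how the SidechainNet software does it. Each new atom is placed from the three atoms before it plus a bond length, a bond angle, and a torsion angle:

```python
import numpy as np


def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """Place atom d from reference atoms a, b, c (NeRF construction).

    bond_length: distance |c - d|
    bond_angle:  angle b-c-d, in radians
    torsion:     dihedral a-b-c-d, in radians
    Hypothetical helper for illustration; not the SidechainNet API.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    # Unit vector along the b -> c bond.
    bc = c - b
    bc = bc / np.linalg.norm(bc)
    # Unit normal to the plane containing a, b, c.
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)
    # Local orthonormal frame with columns bc, n x bc, n.
    frame = np.column_stack([bc, np.cross(n, bc), n])
    # Position of d expressed in the local frame.
    d_local = bond_length * np.array([
        -np.cos(bond_angle),
        np.sin(bond_angle) * np.cos(torsion),
        np.sin(bond_angle) * np.sin(torsion),
    ])
    return c + frame @ d_local


# Usage: extend a three-atom fragment by one atom (illustrative values only).
a, b, c = np.array([1.0, 0.0, 0.0]), np.zeros(3), np.array([0.0, 0.0, 1.0])
d = place_atom(a, b, c, bond_length=1.5, bond_angle=np.radians(110), torsion=np.radians(60))
print(np.linalg.norm(d - c))  # should be close to 1.5
```

Repeating this call along a chain, with each new atom becoming the last of the next three reference atoms, builds a full structure from its internal coordinates.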
Taxonomy
Topics: Protein Structure and Dynamics · Computational Drug Discovery Methods · Machine Learning in Materials Science
