Mapping and Classifying Molecules from a High-Throughput Structural Database
Sandip De, Felix Musil, Teresa Ingram, Carsten Baldauf, Michele Ceriotti

TL;DR
This paper demonstrates how machine learning can be used to analyze large structural databases of molecules, revealing structure-property relations, identifying outliers, and understanding the effects of perturbations.
Contribution
It applies a recently developed metric between structures as the basis for both clustering and dimensionality reduction of high-throughput molecular data sets.
Findings
Machine learning helps reveal structure-property relations.
Outliers and inconsistencies can be identified effectively.
Perturbations' effects on conformer stability are rationalized.
Abstract
High-throughput computational materials design promises to greatly accelerate the process of discovering new materials and compounds, and of optimizing their properties. The large databases of structures and properties that result from computational searches, as well as the agglomeration of data of heterogeneous provenance, lead to considerable challenges when it comes to navigating the database, representing its structure at a glance, understanding structure-property relations, eliminating duplicates, and identifying inconsistencies. Here we present a case study, based on a data set of conformers of amino acids and dipeptides, of how machine-learning techniques can help address these issues. We will exploit a recently developed strategy to define a metric between structures, and use it as the basis of both clustering and dimensionality reduction techniques, showing how these can help…
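To make the idea of using a metric between structures as the basis for clustering more concrete, here is a minimal sketch. It is not the authors' method: the plain Euclidean distance stands in for the structural metric used in the paper, the descriptor vectors are made-up toy data, and the function names and cutoff values are hypothetical. It groups near-duplicate structures by single-linkage clustering on the distance matrix and flags as outliers the structures whose nearest neighbor is far away.

```python
import math
from itertools import combinations


def distance(a, b):
    """Stand-in metric: Euclidean distance between two descriptor vectors.
    (Assumption: the paper uses a dedicated structural metric instead.)"""
    return math.dist(a, b)


def distance_matrix(items):
    """Symmetric matrix of pairwise distances."""
    n = len(items)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = distance(items[i], items[j])
    return d


def single_linkage_clusters(d, cutoff):
    """Label items so that any two closer than `cutoff` share a cluster
    (connected components of the thresholded distance graph)."""
    n = len(d)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if d[i][j] < cutoff:
            parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    # Number clusters in order of first appearance.
    labels = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [labels[root] for root in roots]


def nearest_neighbor_distances(d):
    """Distance from each item to its closest other item."""
    return [min(row[j] for j in range(len(row)) if j != i)
            for i, row in enumerate(d)]


# Toy data: hypothetical two-component descriptors, one per conformer.
conformers = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.02, 0.97), (3.0, -2.0)]

d = distance_matrix(conformers)
labels = single_linkage_clusters(d, cutoff=0.2)   # hypothetical cutoff
nn = nearest_neighbor_distances(d)
outliers = [i for i, x in enumerate(nn) if x > 1.0]  # hypothetical threshold

print(labels)
print(outliers)
```

In this toy set the first two and the next two conformers fall into shared clusters, which is the kind of grouping one could use to spot duplicates, while the last conformer sits far from everything else and is flagged. The same distance matrix could also feed a dimensionality reduction step to draw a two-dimensional map of the data set.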
Taxonomy
Topics: Machine Learning in Materials Science · Computational Drug Discovery Methods · History and advancements in chemistry
