An interpretable molecular descriptor for machine learning predictions in atmospheric science
Linus Lind, Hilda Sandström, Patrick Rinke

TL;DR
This paper introduces ATMOMACCS, an interpretable molecular descriptor tailored to atmospheric compounds that improves machine learning predictions of properties such as saturation vapor pressures and glass transition temperatures.
Contribution
The paper presents ATMOMACCS, a molecular descriptor that combines the 166 binary MACCS keys with motifs inspired by the SIMPOL method, improving both interpretability and prediction accuracy for atmospheric molecules.
Findings
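Lower prediction errors for saturation vapor pressures (7-8 % reduction), equilibrium partition coefficients (5 % and 9 % reduction), and glass transition temperatures (22 % reduction).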
Feature analysis reveals key molecular features governing properties.
Demonstrated generalizability across multiple datasets and properties.
Abstract
The study of aerosol formation and chemistry using machine learning is limited by the lack of molecular descriptors suited to atmospheric compounds. Interpretable models are particularly affected because they often rely on dictionary-based descriptors tied to specific molecular substructures, which currently fail to capture the full range of organic atmospheric compounds, including large, highly oxidized molecules common in the atmosphere. We introduce ATMOMACCS, an interpretable descriptor combining the 166 binary keys of the MACCS fingerprint with motifs inspired by the SIMPOL method for estimating saturation vapor pressures. We show that ATMOMACCS-based models improve predictions of saturation vapor pressures (7-8 % error reduction), equilibrium partition coefficients (5 % and 9 % error reduction), glass transition temperatures (22 % error reduction), and enthalpy of vaporization (61…
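As a rough illustration of the descriptor layout described in the abstract, the sketch below concatenates a 166-entry binary MACCS vector with a few motif features. It is not the authors' implementation: the motif names, the use of counts rather than binary flags for motifs, and the function names `combine_descriptor` and `feature_names` are all assumptions made for this example, and the MACCS bits are taken as precomputed input rather than derived from a structure.

```python
from typing import Dict, List, Sequence

N_MACCS_KEYS = 166  # number of binary MACCS keys cited in the abstract

# Hypothetical motif names for illustration only; the actual ATMOMACCS
# motif list is not given in this summary.
EXAMPLE_MOTIFS: List[str] = ["hydroxyl", "carboxylic_acid", "hydroperoxide", "nitrate"]


def combine_descriptor(
    maccs_bits: Sequence[int],
    motif_counts: Dict[str, int],
    motif_names: Sequence[str] = EXAMPLE_MOTIFS,
) -> List[int]:
    """Concatenate binary MACCS keys with per-motif counts into one vector."""
    if len(maccs_bits) != N_MACCS_KEYS:
        raise ValueError(f"expected {N_MACCS_KEYS} MACCS bits, got {len(maccs_bits)}")
    if any(bit not in (0, 1) for bit in maccs_bits):
        raise ValueError("MACCS keys must be binary (0 or 1)")
    # Assumption: motifs are stored as counts; absent motifs default to zero.
    motif_part = [int(motif_counts.get(name, 0)) for name in motif_names]
    return list(maccs_bits) + motif_part


def feature_names(motif_names: Sequence[str] = EXAMPLE_MOTIFS) -> List[str]:
    """Return one readable label per descriptor entry, in vector order."""
    maccs_labels = [f"maccs_{i + 1}" for i in range(N_MACCS_KEYS)]
    return maccs_labels + [f"motif_{name}" for name in motif_names]
```

Because every entry keeps a readable label, feature importances from a model trained on such a vector can be traced back to a specific key or motif, which is the sense in which a dictionary-based descriptor is interpretable.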
Taxonomy
Topics: Machine Learning in Materials Science · Advanced Chemical Physics Studies · Computational Drug Discovery Methods
