tmQMg* Data Set: Excited State Properties of 74k Transition Metal Complexes
Hannes Kneiding, David Balcells

TL;DR
The tmQMg* dataset provides excited state properties for 74,000 transition metal complexes to advance machine learning in photochemistry.
Contribution
The novel contribution is the creation of a large dataset of excited state properties for transition metal complexes with solvatochromic effects.
Findings
Excited state properties for 74k transition metal complexes were computed using TD-DFT.
Solvatochromic effects in gas phase and acetone were quantified in terms of wavelength shifts and intensity changes.
Natural transition orbitals were computed to determine charge transfer character of visible excitations.
Abstract
The application of machine learning approaches to meaningful problems in chemistry and materials science is still challenged by the limited availability of data. In order to close this gap, we report the tmQMg* data set, which provides excited state properties for 74k mononuclear transition metal complexes extracted from the Cambridge Structural Database. All properties were computed at the TD-DFT ωB97xd/def2SVP level of theory. The strongest electron excitations in the ultraviolet, visible, and near-infrared ranges are included, together with the wavelengths and intensities of the first 30 excited states. Further, natural transition orbitals were computed for the strongest excitations in the visible range to determine the nature of the associated charge transfers. By computing the TD-DFT spectra in both gas phase and acetone, we quantified solvatochromic effects, which are also…
| Column name in CSV | Description |
|---|---|
| lambda_i | Wavelengths, λ, in nm; i ∈ [1–30] is a counter |
| f_i | Oscillator strengths, f; i ∈ [1–30] is a counter |
| lambda_max | Wavelengths, λ_max, in nm, having the largest f in the UV, Vis, and nIR regions |
| f_max | Largest oscillator strengths, f_max, in the UV, Vis, and nIR regions |
| sigma | Band broadness, σ, in nm |
| M_contribution | The relative contribution (%) of the metal to the NTO orbitals involved in the strongest Vis excitation |
| L_contribution | The relative contribution (%) of the ligand(s) to the NTO orbitals involved in the strongest Vis excitation |
| transition_nature_vis | The charge transfer nature of the strongest Vis excitation. Categorical over the {ddT, LMCT, MLCT, LLCT} set |
| lambda_delta | Solvatochromic wavelength shift, Δλ, in nm |
| f_delta | Solvatochromic oscillator strength change, Δf |
| vis_to_vis | Vis to Vis shift. Boolean, True or False |
| uv_to_vis | UV to Vis shift. Boolean, True or False |
| nir_to_vis | nIR to Vis shift. Boolean, True or False |
| vis_to_uv | Vis to UV shift. Boolean, True or False |
| vis_to_nir | Vis to nIR shift. Boolean, True or False |
| bathochromic | Δλ > 0 from gas phase to acetone. Boolean, True or False |
| hypsochromic | Δλ < 0 from gas phase to acetone. Boolean, True or False |
| hyperchromic | Δf > 0 from gas phase to acetone. Boolean, True or False |
| hypochromic | Δf < 0 from gas phase to acetone. Boolean, True or False |

| Criteria | bathochromic | hypsochromic |
|---|---|---|
| Δλ > 0 | True | False |
| Δλ < 0 | False | True |
Funding
- HORIZON EUROPE Marie Sklodowska-Curie Actions (funder ID 10.13039/100018694)
- RCN program for the National Centers of Excellence
- RCN FRIPRO program for ground-breaking research
- Norwegian Supercomputing Program
Introduction
Artificial intelligence (AI) is quickly becoming a widespread tool across all scientific disciplines including chemistry and materials science.? By fitting to reference data, discriminative machine learning (ML) models learn to predict properties of molecules and materials based on sparse feature sets. Their computational efficiency makes ML approaches especially attractive for high-throughput studies in molecular and materials design where large numbers of compounds are investigated in order to identify promising candidates for specific applications. ?,? Compared to theoretical reference methods, such as density functional theory (DFT), a speed-up of up to 5 orders of magnitude can be achieved at no significant loss of accuracy. ?,? Furthermore, generative AI approaches can be used in an inverse design framework where models are trained to generate new compounds that are similar to the reference data. ?,? Both approaches have received a lot of attention particularly from the chemistry and materials science communities, promoting their rapid advancement. ?,?
Despite the empirically proven potential of AI in accelerating the materials discovery pipeline, its application to interesting, real-world problems is still challenged by the limited availability of appropriate and sufficiently large data sets.? Due to the prohibitively high cost associated with experimental data sets, most large data sets are based on computations oftentimes obtained with quantum chemistry methods.? While still significant, the reduced cost of these approaches, and their easier design and scale-up, allowed for the creation of large data sets that are fit for the application of ML methods. However, most of the existing computational data sets are limited to either organic chemistry (for example: QM ?−? ? or GDB ?,? data set series) or materials science (for example: Materials Project,? Open Quantum Material Database?), for which extensive work has been conducted before. The more recent OMol25 data set integrated multiple molecular sets of various nature.?
Lately, there has been a surge of interest in metal-containing systems such as transition metal complexes (TMCs) and metal–organic frameworks (MOFs) for diverse applications in catalysis and renewable energies. This is reflected in the creation of a significant number of data sets for both TMCs (tmQM series ?−? ? ) and MOFs (CoRE data set ?−? ? ). In particular, for TMCs, the available data sets have supported the development of ML models for the prediction of their electronic properties ?,?,?−? ? ? as well as their (un)conditional generation. ?,?,? However, since most data sets only contain ground state properties, the development of models for photochemistry and, in general, applications to nonadiabatic processes involving excited states, ?−? ? is significantly impeded. Data sets for excited state properties including absorption spectra and charge transfer behavior are rare, though available for small molecules.? Nonetheless, such properties are of particular interest for TMCs due to their potential use as photocatalysts,? molecular sensitizers in solar cells,? optical switches in optoelectronic devices,? or photosensitizers in cancer phototherapy.?
In this work, we report the tmQMg* data set, providing electron excitation data for 74k mononuclear TMCs. The data was computed in the Saga supercomputer of the Norwegian high-performance computing cluster at the TD-DFT(ωB97xd/def2SVP) level of theory ?−? ? using Gaussian 16, Revision C.01.? tmQMg* belongs to the tmQM data set series (Figure), ?−? ? and it is thus based on a large set of TMCs that are experimentally known and reported in the Cambridge Structural Database? (CSD, 2024 release), often used for data-driven approaches. ?,? With all 30 transition metals from the 3d, 4d, and 5d series present, and an underlying pool of more than 35k unique ligands, tmQMg* reflects the vast diversity of the TMC chemical space known to date. The tmQMg* data set will enable the training of both discriminative and generative AI models with respect to electron excitations in TMCs and their charge transfer nature, light absorption intensity and broadness over the whole UV–vis-nIR range, and solvatochromic effects. These models will accelerate the discovery of photoactive TMCs. tmQMg* is openly available under the CC BY 4.0 License.
Figure 1. tmQMg* in the context of our founding work on the tmQM data set series. Further extensions have been reported by other research groups.
The tmQMg* Data Set
The tmQMg* data set was derived from single point TD-DFT(ωB97xd/def2SVP) calculations ?−? ? on the 74,555 TMCs of the parent tmQMg graph data set, in which geometries were already optimized at the DFT(PBE/def2SVP) level.? This level of theory was chosen due to the relevance of including long-range corrections in TD-DFT calculations. The Tamm-Dancoff approximation was not used. 99.6% of these calculations converged successfully, yielding the electron excitation data of the 74,281 TMCs included in tmQMg*. Natural transition orbitals (NTOs) were also computed at the same level of theory to determine the charge transfer nature of the transitions. Further, solvatochromic effects were quantified by computing all TD-DFT data in two sets: one in gas phase and one in acetone, using the implicit SMD solvation model.?
In the frame of the transition metal quantum mechanics series, which includes the tmQM,? tmQMg,? and tmQMg-L (ref 22) data sets (Figure 1), there is plenty of additional data available for each TMC in tmQMg*: quantum properties at different xTB and DFT levels of theory (energies, geometries, HOMO–LUMO gaps, polarizabilities, atomic charges, and dipole moments, among others),? quantum-informed molecular graphs for deep learning models (directed and undirected),? and a substantial portion of the ligands forming this TMC space,? including the ligand charges and metal-coordination anchors needed to design, generate, and explore massive combinatorial spaces. The TMC SMILES based on the recent work of Jensen and co-workers? are also available. In particular, we used the Hückel approach of this method, which provides the graph representation of the TMC needed to formulate its SMILES string. All TMCs were extracted from the 2024 release of the CSD, constraining the charge to the {−1, 0, +1}e set.? This constraint was originally implemented to avoid issues in the quantum chemistry calculations caused by lower and higher charges. All TMCs in tmQMg* have an even number of electrons, and all TD-DFT calculations on them were carried out in the closed-shell singlet state.
Chemical Distributions
Figures 2 and 3 show the chemical composition of the tmQMg* data set in terms of element distributions. The 3d, 4d, and 5d transition metals are represented to a similar extent and, for all three series, the late metals are clearly more abundant than the early ones (Figure 2). TMCs with metal centers from groups 8, 9, and 10 are the most abundant, likely due to their popularity in impactful applications like catalysis. In this regard, and despite the computational nature of the tmQMg* data set, its CSD origin reflects, and is thus biased by, the interest of the research community in TMC chemistry over a period of 60 years (1965–2024). This can introduce significant biases if the tmQMg* data set is used to train ML models for the prediction of properties related to photochemistry. Transferable models across the transition metal series will likely require data augmentation strategies for the early transition metals, where only group 4 appears to be well represented and balanced over the three series.
Figure 2. Distribution of the transition metal elements in the tmQMg* data set.
Figure 3. Distribution of the nonmetal elements in the tmQMg* data set. The insets show the abundance of C and H relative to the heteroelements, and the TMC charge distribution.
In line with the prevalent organic nature of the ligands, C and H are the most abundant elements in the data set, followed by N and O, which are common metal anchors, and the less abundant P and Cl associated with phosphine complexes and metal-chloride salts (Figure 3). Regarding charges, the distribution is strongly dominated by neutral TMCs. Charges above +1 or below −1 were excluded by the filters used to extract the TMCs from the CSD into the parent tmQM? and tmQMg? data sets. To some extent, this limits the value of tmQMg* in photoredox chemistry involving highly charged species, suggesting an interesting (as well as challenging and expensive) direction for data augmentation.
Electron Excitations
For each TMC, the TD-DFT calculations yielded the first 30 electron excitations, for which both the wavelength (λ) and the oscillator strength (f) were extracted and included in the data set. Table 1 provides a systematic list of all TD-DFT properties in tmQMg*, including the column labels used in the tabular CSV file in the data repository.
Table 1: TD-DFT(ωB97xd/def2SVP) Properties in the tmQMg* Data Set
Based on the value of λ, the ultraviolet (UV), visible (Vis), and near-infrared (nIR) regions of the spectrum were defined as follows:
- UV: λ < 350 nm
- Vis: 350 nm ≤ λ ≤ 825 nm
- nIR: λ > 825 nm
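The region limits above translate directly into a small classifier. The following is a minimal sketch; the function name `spectral_region` is hypothetical and not part of the data set.

```python
def spectral_region(wavelength_nm: float) -> str:
    """Return 'UV', 'Vis', or 'nIR' for a wavelength given in nm."""
    if wavelength_nm < 350.0:
        return "UV"
    if wavelength_nm <= 825.0:
        return "Vis"
    return "nIR"


# Both limits (350 and 825 nm) belong to the Vis region.
boundary_regions = [spectral_region(w) for w in (349.9, 350.0, 825.0, 825.1)]
print(boundary_regions)  # ['UV', 'Vis', 'Vis', 'nIR']
```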
Most electron excitations lie in the UV region of the spectrum, followed by the Vis and nIR regions. The gas phase/acetone percentages are 90/92% (UV), 9/8% (Vis), and <1/1% (nIR). The prevalence of UV excitations, both in number and intensity, can be associated with the π → π* transitions, which, in a TMC chemical space, can originate not only from the isolated ligands, but also from diverse metal–ligand and ligand–ligand moieties.
Figure shows the averaged spectra over the entire tmQMg* data set. In the gas phase, and considering all electron excitations, the spectrum has a prominent maximum in the UV region, at λ = 185 nm. Moving toward longer wavelengths, light absorption decays and, after forming a shoulder over the UV|Vis limit, it yields two lower peaks at λ = 625 nm, within the Vis, and at λ = 836 nm, within the nIR. When averaged after applying the f > 0.01 threshold, the spectrum becomes more balanced, with these three peaks appearing at similar λ values (185, 645, and 835 nm), plus an additional one in the Vis region (375 nm) that can be related to the aforementioned shoulder. The difference between the two spectra reflects the underlying structure of the data. For example, compared to the UV range, the electron excitations in the Vis are less numerous and fewer of them have high intensity. In both spectra there is a small but significant bathochromic shift, as reflected by the relocation of the absorption peaks to longer wavelengths. For example, in the averaged f > 0.01 spectra, the four peaks, ordered from UV to nIR, undergo Δλ shifts of 4.7, 10.3, 37.2, and 16.6 nm, when, respectively, moving from gas phase reference to acetone. This effect can be related to the larger polarization of the excited states often caused by electron excitations.
Figure 4. Averaged spectra over the whole data set for both the gas and acetone phases, considering either all excitations (top) or only those with f > 0.01 (bottom). The vertical dashed lines mark the limits between the UV, Vis, and nIR regions of the spectrum. The two Vis peaks are distinguished with the labels 1 and 2. Curves were smoothed with a Lorentzian kernel.
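As an illustration of Lorentzian smoothing, the sketch below broadens the stick spectrum of a single complex by placing one Lorentzian on each excitation. The half width (20 nm), the wavelength grid, and the example excitations are assumptions made for this example; the kernel width used for the averaged spectra is not stated in the text.

```python
def lorentzian_spectrum(wavelengths_nm, strengths, grid_nm, hwhm_nm=20.0):
    """Sum one Lorentzian per excitation, evaluated on grid_nm.

    hwhm_nm is the half width at half maximum (assumed value, in nm).
    """
    spectrum = []
    for x in grid_nm:
        total = 0.0
        for lam, f in zip(wavelengths_nm, strengths):
            # Lorentzian of height f centered at lam.
            total += f * hwhm_nm**2 / ((x - lam) ** 2 + hwhm_nm**2)
        spectrum.append(total)
    return spectrum


grid = [150.0 + 5.0 * i for i in range(171)]  # 150 to 1000 nm in 5 nm steps
sticks_nm = [185.0, 375.0, 645.0]             # hypothetical excitations
sticks_f = [0.40, 0.05, 0.02]                 # hypothetical oscillator strengths
curve = lorentzian_spectrum(sticks_nm, sticks_f, grid)
peak_nm = grid[curve.index(max(curve))]
print(peak_nm)  # 185.0
```

Averaging such curves over all complexes, separately for the gas phase and acetone data, would give data set level spectra of the kind described in the caption.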
Figure 5 illustrates the chemical diversity within the tmQMg* data set by showing ten examples of TMCs for which light absorption is maximal within the wavelength ranges defined by the four broad peaks of the gas phase f > 0.01 averaged spectrum (Figure 4). Both early and late transition metals from the 3d, 4d, and 5d series are present, in oxidation states within the {0, II, III, IV} set. There is also a total of 12 diverse ligands, which, in several cases, include different π-conjugated systems promoting absorption in the UV and Vis regions of the spectrum.
Figure 5. Examples of TMCs showing maximal light absorption in the ∼185 (UV), 375 (Vis 1), 645 (Vis 2), and 835 (nIR) nm peaks of the averaged f > 0.01 gas phase spectrum shown in Figure 4. Each TMC is also labeled with its 6-character CSD code and metal oxidation state. Element color code: orange metal, pink P, gray C, white H, blue N, red O, green Cl, and yellow S.
The electron excitations with the strongest intensity (f_max) in the UV, Vis, and nIR ranges were also determined and included in the data set together with the corresponding wavelength (λ_max; Table 1). For the TMCs that have no excitations, or no excitations with f > 0.01, in a given range, no maximum absorption values are reported. Additionally, the band broadness, σ, was calculated for each of the three spectral regions from the absolute difference between the smallest and largest wavelengths within the region and the number of excitations within it.
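The selection of the strongest excitation per region can be sketched as follows, assuming the region limits given earlier (UV below 350 nm, Vis from 350 to 825 nm, nIR above 825 nm) and the f > 0.01 threshold. The function name and the example excitations are hypothetical.

```python
def strongest_per_region(wavelengths_nm, strengths, f_min=0.01):
    """Return {region: (lambda_max, f_max)} using only excitations with f > f_min.

    Regions without any qualifying excitation are absent from the result.
    """
    def region(w):
        if w < 350.0:
            return "UV"
        return "Vis" if w <= 825.0 else "nIR"

    best = {}
    for lam, f in zip(wavelengths_nm, strengths):
        if f <= f_min:
            continue  # too weak to be considered
        r = region(lam)
        if r not in best or f > best[r][1]:
            best[r] = (lam, f)
    return best


lams = [210.0, 298.0, 410.0, 530.0, 900.0]  # hypothetical wavelengths, nm
fs = [0.30, 0.12, 0.004, 0.08, 0.002]       # hypothetical oscillator strengths
maxima = strongest_per_region(lams, fs)
print(maxima)  # {'UV': (210.0, 0.3), 'Vis': (530.0, 0.08)}
```

In this example the only nIR excitation is below the threshold, so no nIR maximum is returned.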
All 17 properties in Table 1 are related to the diverse electronic structure space formed by the 74k TMCs in tmQMg* and, therefore, they appear correlated in different ways and to different extents. Figure 6 shows two examples of such correlations for the gas phase data. The plot of f_max versus HOMO–LUMO gap shows that the density of highly intense electron excitations decreases with the width of the gap, a trend that can be related to an increasing number of symmetry-forbidden d → d, d → π*, and π → d transitions. Further, the data appears stratified, with the nIR range shifted toward the smaller gaps, and the UV toward the larger gaps, with the Vis in the middle, in line with the excitation energy limits implicitly set by the HOMO–LUMO gap. In contrast, the plot of the natural metal charge against λ_max does not reflect any clear trend, in line with the lack of a strong correlation between these two properties, since both the metal and the ligand can have a strong influence on λ_max.
Figure 6. Scatter plots of the HOMO–LUMO gap (top) and natural metal charge (bottom) against the oscillator strength and wavelength, respectively, of the strongest excitations in the UV, Vis, and nIR regions of the spectrum, in gas phase. Metal charges, in e units, were taken from tmQMg, at the PBE0/def2TZVP level of theory.
Charge Transfer
For all TMCs in tmQMg* absorbing light in the visible range, NTOs were computed for the most intense electron excitation at λ = λ_max ∈ Vis, with f = f_max > 0.01. The data defining the NTOs was used to classify these excitations as either a d-d orbital transition (ddT) at the metal center or one of three charge transfer (CT) events: ligand-to-ligand (LLCT), metal-to-ligand (MLCT), and ligand-to-metal (LMCT). More precisely, we selected the pair of virtual (V) and occupied (O) NTOs having the largest eigenvalues (electron occupations) and, for both of them, we computed the ligand density, ρ(L), resulting from summing the squared coefficients (c²) of the natural atomic orbitals (in total, N_Orb) over all atoms from all ligands in the TMC (in total, N_L)

$$\rho(L) = \sum_{i=1}^{N_L} \sum_{j=1}^{N_{Orb}} c_{ij}^{2}$$

The same density was also computed for the metal alone, ρ(M)

$$\rho(M) = \sum_{j=1}^{N_{Orb}} c_{Mj}^{2}$$

Further, these densities were computed for both the V and O NTOs, overall yielding this set of values

$$\{\rho(M)_O,\ \rho(L)_O,\ \rho(M)_V,\ \rho(L)_V\}$$

for each TMC. We finally computed the corresponding set of relative contributions

$$\{\rho(M)_O^{rel},\ \rho(L)_O^{rel},\ \rho(M)_V^{rel},\ \rho(L)_V^{rel}\}$$

using this expression

$$\rho(\tau)_\gamma^{rel} = \frac{\rho(\tau)_\gamma}{\rho(M)_\gamma + \rho(L)_\gamma}$$

where τ is either M or L, and γ is either V or O.
In order to define the DACT labels of the charge transfer classes, where the donor (D) and acceptor (A) are either the metal or the ligand, the following relationships were used

$$D = \begin{cases} M & \text{if } \rho(M)_O^{rel} > 0.5 \\ L & \text{if } \rho(L)_O^{rel} > 0.5 \end{cases} \qquad A = \begin{cases} M & \text{if } \rho(M)_V^{rel} > 0.5 \\ L & \text{if } \rho(L)_V^{rel} > 0.5 \end{cases}$$
For D = A = M, the resulting MMCT label was replaced by the more suitable ddT. It should be noted that the 0.5 threshold is used in a context in which the NTOs are rarely fully or strongly localized over the metal or the ligands alone but are rather delocalized, to a variable extent, over these two structural moieties. For this reason, the tmQMg* data set also provides the NTO data of the electron excitations, allowing users to modify the definition of the classes {ddT, LLCT, MLCT, LMCT} by simply adjusting the ρ(M)_O^rel and ρ(M)_V^rel thresholds.
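The classification with an adjustable threshold can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the relative contribution is taken as the metal (or ligand) density divided by the sum of both, the donor is read from the occupied NTO and the acceptor from the virtual NTO, and a value exactly at the threshold is assigned to the ligand. All function names are hypothetical.

```python
def relative_contributions(rho_metal, rho_ligand):
    """Return the (metal, ligand) fractions of the total NTO density."""
    total = rho_metal + rho_ligand
    return rho_metal / total, rho_ligand / total


def transition_class(rho_m_occ_rel, rho_m_virt_rel, threshold=0.5):
    """Label an excitation as ddT, MLCT, LMCT, or LLCT.

    rho_m_occ_rel and rho_m_virt_rel are the relative metal contributions
    to the occupied and virtual NTOs, both within [0, 1].
    """
    donor = "M" if rho_m_occ_rel > threshold else "L"
    acceptor = "M" if rho_m_virt_rel > threshold else "L"
    label = donor + acceptor + "CT"
    return "ddT" if label == "MMCT" else label


# Hypothetical densities: metal-centered occupied NTO, ligand-centered virtual NTO.
occ_m, _ = relative_contributions(0.72, 0.28)
virt_m, _ = relative_contributions(0.15, 0.85)
print(transition_class(occ_m, virt_m))  # MLCT
```

Raising `threshold` above 0.5 makes the metal-centered assignments stricter, which moves borderline excitations toward the LLCT class.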
Figure 7 shows the distribution of the different transition classes, all referring to the most intense transitions in the Vis range, for both the gas phase and acetone TD-DFT data. The small portion of ddT transitions can be related to their symmetry-forbidden nature, which prevents them from having strong intensities. LMCT and MLCT excitations are much more abundant and, together, they constitute nearly half of the excitations, in a proportion similar to that of the LLCT. The prevalence of the MLCT excitations over the LMCT may originate from the higher abundance of the late transition metals relative to the early (Figure 2). The increase in the number of both these excitations from the gas to the acetone phase can be related to the polarized nature of the metal–ligand bonds.
Figure 7. Distribution of the transition classes for the most intense electron excitations in the Vis range of the spectrum.
Figure shows the average dipole moments and natural metal charges for each transition class in gas phase and acetone. The average dipole moments are systematically higher in acetone, as expected from the polarizing effects of the solvent. In both series, the dipole moment is similar regardless of the transition class, with only the two classes involving the metal, LMCT and MLCT, being slightly below and above the average, respectively, which could also be due to the polarized nature of the metal–ligand bonds. The average natural metal charges are quite similar in gas phase and acetone, with minor variations for the transitions involving the metal center. The transitions originating from the metal, that is, ddT and MLCT, are consistently associated with the smaller charges and thus the most electron-rich metal centers.
Figure 8. Average dipole moment (left) and natural metal charge (right) for the transition classes of the most intense electron excitations in the Vis range of the spectrum. The dipole moment was computed at the same TD-DFT level of theory, whereas the metal charges, in e units, at the PBE0/def2TZVP level, were taken from the tmQMg data set.
Solvatochromic Effects
The solvatochromism introduced by acetone relative to the gas phase, and observed in the average tmQMg* spectra shown in Figure 4, was quantified and reported in the data set after considering these four cases:
Case 1. Shift within the Visible
If a TMC absorbed light in the Vis range with f > 0.01 in both the gas and acetone phases, the solvatochromic effect was defined relative to the most intense electron excitations at λ = λ_max,Vis, calculating both the wavelength shift

$$\Delta\lambda = \lambda_{max,Vis}^{ac} - \lambda_{max,Vis}^{gp}$$

and the change in the associated oscillator strength, f_λmax,Vis

$$\Delta f = f_{\lambda_{max,Vis}}^{ac} - f_{\lambda_{max,Vis}}^{gp}$$

where the ‘gp’ and ‘ac’ superscripts denote the gas and acetone phases, respectively.
In this case, the vis_to_vis property of Table 1 was set to True, whereas the uv_to_vis, nir_to_vis, vis_to_uv, and vis_to_nir properties were all set to False.
Case 2. Shift to the Visible
If a TMC absorbed light in the Vis range with f > 0.01 but only in the acetone phase, the most intense electron excitation defined by λ_max,Vis in acetone was selected and used to calculate the solvatochromic effects relative to the most intense electron excitation in the gas phase in either the UV or nIR regions, selecting the one with the closest wavelength if both existed; that is

$$\Delta\lambda = \lambda_{max,Vis}^{ac} - \lambda_{max,UV/nIR}^{gp}$$

and

$$\Delta f = f_{\lambda_{max,Vis}}^{ac} - f_{\lambda_{max,UV/nIR}}^{gp}$$

In Table 1, either the uv_to_vis or the nir_to_vis property was set to True, respectively, while setting all others to False.
Case 3. Shift Away from the Visible
If a TMC absorbed light in the Vis range with f > 0.01 but only in the gas phase, the most intense electron excitation defined by λ_max,Vis in the gas phase was selected and used to calculate the solvatochromic effects relative to the most intense electron excitation in the acetone phase in either the UV or nIR regions, selecting the one with the closest wavelength if both existed; that is

$$\Delta\lambda = \lambda_{max,UV/nIR}^{ac} - \lambda_{max,Vis}^{gp}$$

and

$$\Delta f = f_{\lambda_{max,UV/nIR}}^{ac} - f_{\lambda_{max,Vis}}^{gp}$$

In Table 1, either the vis_to_uv or the vis_to_nir property was set to True, respectively, while setting all others to False.
Case 4. No Shift
If a TMC did not exhibit any electron excitation in the Vis range with f > 0.01 in either the gas or the acetone phase, both the wavelength shift and the intensity change were set to zero.
In this case, in Table 1, vis_to_vis = uv_to_vis = nir_to_vis = vis_to_nir = vis_to_uv = False. This also includes a few instances derived from Cases 2 and 3, in which Δλ and Δf could not be defined because no transitions with f > 0.01 existed in either the UV or the nIR range.
With these definitions of Δλ and Δf, the bathochromic, hypsochromic, hyperchromic, and hypochromic properties of Table 1 were set to either True or False based on the relationships defined in Table 2. Figure 9 shows the proportional distributions of these properties in tmQMg*, also providing this information for Cases 1–4. Whereas approximately half of the TMCs in the data set have no activity in the Vis range, and thus no solvatochromism, the other half mostly undergo bathochromic and hyperchromic effects.
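Cases 1–4 and the resulting Boolean properties can be sketched in a few lines. The sketch assumes that Δλ and Δf are taken as acetone minus gas phase, that hyperchromic and hypochromic mean Δf > 0 and Δf < 0, and that each phase is summarized as a dictionary mapping a region name to the (λ_max, f_max) of its strongest excitation with f > 0.01. The function names and input layout are hypothetical.

```python
FLAGS = ("vis_to_vis", "uv_to_vis", "nir_to_vis", "vis_to_uv", "vis_to_nir")


def closest(candidates, target_nm):
    """Return the (region, (wavelength, f)) item nearest in wavelength to target_nm."""
    return min(candidates, key=lambda item: abs(item[1][0] - target_nm))


def solvatochromism(gas, acetone):
    """Apply Cases 1-4 to two {region: (lambda_max, f_max)} dictionaries."""
    result = {"lambda_delta": 0.0, "f_delta": 0.0}
    result.update({name: False for name in FLAGS})
    gp_vis, ac_vis = gas.get("Vis"), acetone.get("Vis")
    start = end = flag = None
    if gp_vis is not None and ac_vis is not None:      # Case 1
        start, end, flag = gp_vis, ac_vis, "vis_to_vis"
    elif ac_vis is not None:                           # Case 2
        others = [(r, gas[r]) for r in ("UV", "nIR") if r in gas]
        if others:
            region, start = closest(others, ac_vis[0])
            end, flag = ac_vis, region.lower() + "_to_vis"
    elif gp_vis is not None:                           # Case 3
        others = [(r, acetone[r]) for r in ("UV", "nIR") if r in acetone]
        if others:
            region, end = closest(others, gp_vis[0])
            start, flag = gp_vis, "vis_to_" + region.lower()
    if flag is not None:                               # otherwise Case 4: zeros
        result["lambda_delta"] = end[0] - start[0]
        result["f_delta"] = end[1] - start[1]
        result[flag] = True
    result["bathochromic"] = result["lambda_delta"] > 0
    result["hypsochromic"] = result["lambda_delta"] < 0
    result["hyperchromic"] = result["f_delta"] > 0
    result["hypochromic"] = result["f_delta"] < 0
    return result


# Hypothetical complex that absorbs in the Vis only in acetone (Case 2).
example = solvatochromism({"UV": (320.0, 0.20)},
                          {"UV": (318.0, 0.15), "Vis": (362.0, 0.25)})
print(example["lambda_delta"], example["uv_to_vis"], example["bathochromic"])
```

For this example the shift is 42.0 nm toward longer wavelengths, so `uv_to_vis` and `bathochromic` are both True.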
Table 2: Criteria Defining the Solvatochromism Properties in tmQMg*
Figure 9. Pie charts showing the relative % amounts of TMCs in Cases 1–4, and exhibiting batho- and hypsochromic effects, as well as hyper- and hypochromic effects.
Figure 10 shows the scatter of Δf versus Δλ for all TMCs in tmQMg* subject to the solvatochromic effects defined by Cases 1–3. The data point density shows that many TMCs exhibit solvatochromic effects with Δf and Δλ values clustered within the approximate ranges [−0.25, +0.50] and [−100, +100] nm, respectively. There is thus a prevalence of the hyperchromic effect over the hypochromic, whereas the bathochromic and hypsochromic shifts are more balanced, in line with the pie charts shown in Figure 9. Δf and Δλ appear weakly correlated. Near the origin and within the region having the highest data point density, either the intensity becomes larger with Δλ ≈ 0 or, orthogonally, the wavelength becomes shorter or longer with Δf ≈ 0. There are also two significantly dense regions in the opposing (Δλ > 0, Δf < 0) and (Δλ < 0, Δf > 0) quadrants, suggesting a weak correlation in which Δλ increases while Δf decreases.?
Figure 10. Solvatochromic effects in tmQMg* in (Δλ, Δf) coordinates. The density values were determined with a Gaussian kernel, using the min–max normalization over the [0, 1] range.
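A point density of this kind can be approximated with a plain Gaussian kernel sum followed by min–max scaling. The sketch below is only illustrative: the kernel widths (10 nm for Δλ and 0.05 for Δf), the function name, and the example points are assumptions, since the kernel settings are not given in the text.

```python
import math


def normalized_density(points, bandwidth=(10.0, 0.05)):
    """Gaussian-kernel density at each (d_lambda, d_f) point, scaled to [0, 1].

    bandwidth holds the assumed kernel widths along d_lambda (nm) and d_f.
    """
    hx, hy = bandwidth
    raw = []
    for x0, y0 in points:
        total = 0.0
        for x, y in points:
            total += math.exp(-0.5 * (((x - x0) / hx) ** 2 + ((y - y0) / hy) ** 2))
        raw.append(total)
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [0.0 for _ in raw]  # all points equally dense
    return [(v - lo) / (hi - lo) for v in raw]


# Hypothetical shifts: four points near the origin and one isolated outlier.
shifts = [(0.0, 0.0), (2.0, 0.01), (-3.0, 0.02), (1.0, -0.01), (80.0, -0.2)]
density = normalized_density(shifts)
print([round(d, 2) for d in density])
```

The isolated point receives the lowest density (0 after scaling), while the densest point of the cluster is scaled to 1.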
Conclusion
The present work introduced the tmQMg* data set, which contains excited state properties computed at the TD-DFT(ωB97xd/def2SVP) level for 74,281 TMCs, all known in the CSD database and reflecting the wide diversity of the metal–organic space known to date from crystallography. The data set provides information regarding the maximum absorption wavelength and band broadness within the UV, Vis, and nIR ranges of the spectrum. Further, the nature of the charge transfer in the most intense electron excitation in the Vis range is included, giving also the metal and ligand contributions to the associated NTO orbitals. All data was computed for both the gas and acetone phases, yielding the solvatochromic effects on the wavelengths and oscillator strengths, which are also provided. The tmQMg* data set will enable further data-driven studies on the photochemistry of the transition metals, including the development of discriminative ML models for the prediction of absorption spectra and generative ML models for the de novo design of novel TMC chromophores. The level of theory used in this work, TD-DFT(ωB97xd/def2SVP), was chosen aiming at a reasonable balance between cost and accuracy, given the large size of the data, 74k. Other functionals and basis sets, for example, ωB97M-V? and def2TZVP,? should, in principle, provide higher accuracy but also at a much higher cost. From this perspective, the present work could motivate the development of Δ-ML models for correcting TMC excited state properties toward higher levels of theory. Conversely, the tmQMg* data set can be used to benchmark lower levels of theory like, for example, semiempirical methods using the Tamm-Dancoff approach.
Supplementary Material
References
- 1. Butler, K.; Davies, D. W.; Cartwright, H.; Isayev, O.; Walsh, A. Machine learning for molecular and materials science. Nature 2018, 559, 547–555. DOI: 10.1038/s41586-018-0337-2. PMID: 30046072.
- 2. Pollice, R.; dos Passos Gomes, G.; Aldeghi, M.; Hickman, R. J.; Krenn, M.; Lavigne, C.; Lindner-D’Addario, M.; Nigam, A.; Ser, C. T.; Yao, Z. P. Data-Driven Strategies for Accelerated Materials Design. Acc. Chem. Res. 2021, 54, 849–860. DOI: 10.1021/acs.accounts.0c00785. PMID: 33528245. PMCID: PMC7893702.
- 3. Janet, J. P.; Ramesh, S.; Duan, C.; Kulik, H. J. Accurate Multiobjective Design in a Space of Millions of Transition Metal Complexes with Neural-Network-Driven Efficient Global Optimization. ACS Cent. Sci. 2020, 6, 513–524. DOI: 10.1021/acscentsci.0c00026. PMID: 32342001. PMCID: PMC7181321.
- 4. Wei, J.; Chu, X.; Sun, X.-Y.; Xu, K.; Deng, H.-X.; Chen, J.; Wei, Z.; Lei, M. Machine learning in materials science. InfoMat 2019, 1, 338–358. DOI: 10.1002/inf2.12028.
- 5. Westermayr, J.; Gastegger, M.; Schütt, K. T.; Maurer, R. J. Perspective on integrating machine learning into computational chemistry and materials science. J. Chem. Phys. 2021, 154, 230903. DOI: 10.1063/5.0047760. PMID: 34241249.
- 6. Sanchez-Lengeling, B.; Aspuru-Guzik, A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 2018, 361, 360–365. DOI: 10.1126/science.aat2663. PMID: 30049875.
- 7. Anstine, D. M.; Isayev, O. Generative models as an emerging paradigm in the chemical sciences. J. Am. Chem. Soc. 2023, 145, 8736–8750. DOI: 10.1021/jacs.2c13467. PMID: 37052978. PMCID: PMC10141264.
- 8. Deringer, V. L.; Bartok, A. P.; Bernstein, N.; Wilkins, D. M.; Ceriotti, M.; Csanyi, G. Gaussian Process Regression for Materials and Molecules. Chem. Rev. 2021, 121, 10073–10141. DOI: 10.1021/acs.chemrev.1c00022. PMID: 34398616. PMCID: PMC8391963.
