Predictive Quantum Vibrational Spectra through Active Learning 4G-NNPs
Md Omar Faruque, Dil K. Limbu, Nathan London, Mohammad R. Momeni

TL;DR
This paper introduces a new method using advanced neural networks to accurately predict vibrational spectra of complex systems like water and its interface with air.
Contribution
The novel contribution is the development of 4G-HDCNNPs integrated with quantum effects for accurate infrared spectral simulations without explicit dipole training.
Findings
The framework accurately simulates infrared spectra for bulk water and air–water interfaces.
Nonlocal charge transfer effects and nuclear quantum effects are seamlessly integrated without empirical parameters.
The methodology is general and practical for predictive spectral simulations of complex systems.
Abstract
Predictive simulation of vibrational spectra of complex condensed-phase and interface systems with thousands of degrees of freedom has long been a challenging task of modern condensed matter theory. In this work, fourth-generation high-dimensional committee neural network potentials (4G-HDCNNPs) are developed using active learning and query-by-committee, and introduced to the essential nuclear quantum effects (NQEs) as well as conformational entropy and anharmonicities from path integral (PI) molecular dynamics simulations. Using representative bulk water and air–water interface test cases, we demonstrate the accuracy of the developed framework in infrared spectral simulations. Specifically, by seamlessly integrating nonlocal charge transfer effects from 4G-HDCNNPs with the NQEs from PI methods, our introduced methodology yields accurate infrared spectra using predicted charges from the…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9
Figure 10
Figure 11
Figure 12
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Figure 19
Figure 20
Figure 21- —Office of Advanced Cyberinfrastructure10.13039/100000105
- —Office of Advanced Cyberinfrastructure10.13039/100000105
- —Office of Advanced Cyberinfrastructure10.13039/100000105
- —Office of Advanced Cyberinfrastructure10.13039/100000105
- —Office of Advanced Cyberinfrastructure10.13039/100000105
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMachine Learning in Materials Science · Spectroscopy and Quantum Chemical Studies · Advanced Chemical Physics Studies
Vibrational spectroscopy is an excellent tool for probing complex processes across different condensed-phase and interface environments. ?−? ? ? To obtain accurate predictive vibrational spectra, free from ad hoc fittings or empirical parametrizations, it has been well-established that robust dynamical methodologies that incorporate anharmonicities and conformational entropies, as well as the essential nuclear quantum effects (NQEs) are needed to be developed and integrated with accurate potential energy surfaces (PESs) that include nonlocal charge transfer interactions. ?,? Path integral molecular dynamics (PIMD) offers an accurate and efficient route to include NQEs by representing a quantum particle as a series of identical classical particles, referred to as beads connected via harmonic springs to form a closed loop, referred to as a ring polymer. ?−? ? Accurate equilibrium properties that incorporate NQEs can be calculated using PI Monte Carlo (PIMC) or PI molecular dynamics (PIMD) simulations.? Similarly, approximate real-time PI methods, including thermostated ring polymer MD (TRPMD),? partially adiabatic centroid MD (PA-CMD),? and a myriad of methods branched from them, ?−? ? ? can then be used for the simulation of dynamical properties including vibrational spectra through the calculations of appropriate time correlation functions with NQEs included.
As mentioned above, accurate PESs that incorporate long-range charge transfer interactions in condensed phases and interfaces are crucial for predictive spectral simulations. Traditional empirical force fields are computationally efficient but lack the accuracy and transferability needed to capture complex processes in condensed phases and interfaces. On the other hand, real-time ab initio PI simulations where forces felt by nuclei are calculated on-the-fly using electronic structure methods are prohibitively expensive for obtaining converged real-time dynamical properties, especially at low temperatures, where a large number of beads is required.? Machine learning interatomic potentials (MLIPs) have recently been shown to be able to bridge the gap between the accuracy and the cost of these simulations.? This is due to the ability of MLIPs in leveraging data-driven approaches to approximate PESs with near ab initio accuracy at a fraction of the computational expense.? Although MLIPs have been developed to compute vibrational spectra such as infrared (IR) spectra, even the most recent generalized MLIPs struggle in accurately reproducing high-frequency stretch peak positions when NQEs are neglected.?
Other strategies have also been developed and employed to capture accurate environmental dependence in condensed phases and interfaces using vibrational spectral simulations. For instance, the permutationally invariant polynomial (PIP) approach has proven highly effective in constructing accurate PESs for water clusters and bulk water, enabling accurate quantum-mechanical spectral calculations.? Similarly, the MB-pol many-body potential has been shown to reproduce the IR, Raman, and vibrational sum-frequency generation (vSFG) spectra of water across different phases through an accurate representation of many-body interactions.? More recently, the deep neural network potential DNN@MB-pol, trained on energies and forces generated from MB-pol, has enabled efficient simulations of vSFG spectra of the air-ice interface at a substantially reduced computational cost.? Moreover, explicit incorporation of long-range electrostatics via machine-learned flexible-charge models coupled to efficient shadow MD schemes has enabled stable simulations that account for ion, polarization, and charge-relaxation effects.? Other neural-network models have also been developed to simulate reactive interfacial systems and to predict dipoles and polarizabilities.? Differentiable molecular simulation has also been introduced that can refine MLIPs based on experimental spectra, leading to higher accuracy simulations.?
One of the early established MLIP architectures that uses artificial neural networks (ANNs) is the high-dimensional neural network potentials (HDNNPs), which are widely used to construct PESs for both gas-phase and condensed-phase molecular systems.? Early generations of HDNNPs primarily focus on local atomic environments and are shown to excel in capturing short-range interactions. However, they fail to accurately capture nonlocal interactions, which are crucial for systems exhibiting significant charge transfer or polarization effects.?
To address this limitation, fourth-generation high-dimensional neural network potentials (4G-HDNNPs) have recently been introduced.? 4G-HDNNPs integrate environment-dependent atomic energies with long-range electrostatics from learned atomic charges, enabling a robust description of complex environments.? More specifically, the 4G architecture employs a charge equilibration scheme to determine atomic charges that reflect the global electronic structure. This approach allows for the inclusion of nonlocal phenomena such as long-range charge transfer relevant in redox chemistry. ?,? A key advantage of 4G-HDNNPs is their inherent ability to predict and provide atomic charges, once appropriately trained on ab initio reference charges, a feature that is lacking in previous generations. Therefore, the charges predicted by 4G-HDNNP-enabled dynamics simulations can, in principle, be directly utilized for simulating IR spectra through the calculation of dipole autocorrelation functions, without the explicit training of dipoles. Alternative hybrid methods, such as CombineNet,? also incorporate machine-learned short-range energies with explicit long-range electrostatics. Such methods fundamentally differ from 4G-HDNNP as their short-range component does not include any charge information, whereas 4G-HDNNPs add charge as an input layer for the short-range neural networks as well.?
In this work, 4G-HDNNPs are developed and integrated with different path integral methods for the accurate prediction of the IR spectra of representative bulk water and air–water interface systems. To achieve fast and efficient spectral calculations, an active learning framework using query-by-committee (QbC) has been developed and employed to facilitate training of the 4G-HDNNP models. Our QbC approach enables data-efficient training through an iterative selection of the most distinct and informative configurations. Thus, we refer to our developed potentials 4G-HDCNNPs, where “C” stands for committee. Water’s complex structure arises from its dynamic hydrogen-bond (H-bond) network, with differential intra- versus intermolecular H-bonds, which significantly influence its vibrational spectral properties.? Recent studies have shown that this complex network supports propagating optical phonon-like modes, indicating coherent long-range interactions.? These long-range effects play a crucial role in water’s structure and vibrational dynamics, affecting energy transfer processes and the overall behavior of its H-bond network.
Although attempts have been made to combine HDNNPs with PIMD simulations before,? this work is the first report to directly employ fourth-generation real-time PI simulations for the calculations of IR spectra, bypassing the need for the explicit training of the dipoles. Therefore, unlike other approaches that require training dipole moment surfaces separately, we show in this work that the 4G-HDCNNP framework is able to inherently capture the necessary dipole fluctuations for IR spectral simulations through its physical description of nonlocal charges. The 4G-HDCNNPs reported here have two pivotal advances compared to the second-generation HDNNP (2G-HDNNP) models. First, 4G-HDCNNPs incorporate explicit nonlocal charge-transfer interactions, capturing interactions that 2G models neglect and thereby delivering higher accuracy in energies and forces for environments with significant charge transfer and polarization effects. ?,?,? Second, as the 4G architecture predicts atomic charges, charge-dependent observables such as dipole moment can be evaluated directly, without the approximations that 2G models require. This includes the reliance of 2G models on fixed atomic charges and neglecting long-range charge transfer and polarization effects.? For comparison, we have also simulated IR spectra using 2G models with fixed point charges from two commonly used force fields. Below, we first provide a brief theoretical background for 4G-HDCNNPs, along with details of model training and IR spectral simulations, followed by our simulated IR spectra for the considered bulk water and air–water interface systems.
4G-HDNNPs seamlessly combine atom-centered descriptors with environment-dependent charges obtained from a global charge equilibration, taking into account both local short-range interactions and nonlocal long-range electrostatics. This is achieved by employing one set of neural networks to model local, short-range contributions and a second set to predict environment-dependent electronegativities for long-range electrostatics. This enables the potential to respond to nonlocal charge transfer and polarization without sacrificing accuracy in capturing local interactions. The schematic of the networks and their interrelation is depicted in Figure. The total energy in 4G-HDNNPs is expressed as?
where R and Q denote atomic positions and total charge, respectively. The E elec term represents long-range electrostatic interactions based on atomic charges, while E short denotes short-range interactions. The electrostatic component E elec is computed by combining pairwise screened Coulomb interactions and self-terms?
where r _ ij _ is the distance between atoms i and j, and with σ_ i _ and σ_ j _ representing Gaussian widths. The error function, , smoothens the interaction between two atomic charges at a distance of r _ ij _.
Earlier third-generation HDNNPs predict charges directly from local descriptors, limiting their ability to equilibrate charges over extended networks.? Therefore, to include the long-range charge transfer effects, a separate charge equilibration is performed in 4G-HDNNPs. In this scheme, the atomic charges are optimized by minimizing E _ Q _ ^eq^ ?
where χ_ i _ are the learned atomic electronegativities and J _ i _ are the element-specific hardness. The optimization is performed by solving the following equation:
which yields the set of linear equations
The A _ ij _ matrix elements are defined for periodic systems using Ewald summation as?
where k is the reciprocal lattice vector, V is the volume of the periodic simulation cell, and η is a hyperparameter that defines the width of the Gaussian charges in the real-space cutoff within the Ewald summation. Summation over k is performed excluding the zero vector. The first term, or the reciprocal-space sum, represents long-range Coulomb interactions computed in the reciprocal space, and the second, or real-space interaction term, is accounted for by the complementary error function terms (erfc) to ensure proper convergence and efficiency. The real-space contribution is evaluated only for pairs with r _ ij _ < r cut ^real^, where r cut ^real^ is the real-space cutoff.
Similar to the second and third-generation HDNNPs, the short-range energy is formulated as the sum of atomic energies
with E _ i _ representing the energy contribution of atom i predicted by the second neural network that takes the local symmetry functions and the atomic charge as input. To avoid double-counting, the electrostatic energy is subtracted from the reference-calculated total energy before training the second neural network.
In this work, using 4G predicted charges directly and without explicit training of dipoles, IR spectra are calculated from the Fourier transform of the total cell dipole-derivative autocorrelation function, following our previous works, ?,?
where Ṁ is the time derivative of the total dipole moment, and f(t) is a window function that dampens the tail of the autocorrelation function and eliminates the so-called ringing artifacts in the calculated spectra.? Here we use the Hann window?
where τ is a cutoff time chosen as appropriate.
The bulk water simulation box contains 64 water molecules in a cubic box of size 12.42 Å, corresponding to a density of 0.997 g/cm^3^ at 298 K. For the interfacial simulations, 96 water molecules were placed in a 14.21 Å × 14.21 Å × 39.21 Å rectangular prism with a 12.5 Å vacuum layer on each side of the slab along the z-direction. Periodic boundary conditions were used in all three directions. The reference ab initio MD (AIMD) trajectories were generated employing the revPBE0-D3 dispersion corrected hybrid functional ?−? ? ? in CP2K? using the Quickstep module.? Reference atomic charges were calculated using the DDEC6 charge partitions.? Previous studies have shown that NQE-included hybrid revPBE0-D3 simulations produce accurate structures and vibrational spectra for bulk water. ?,?,? The initial configuration for bulk water was generated using Packmol? and equilibrated for 10 ps using classical revPBE-D3 AIMD simulations in the NVT ensemble with a CSVR thermostat and a 1 fs time step. This was followed by 50 ps revPBE0-D3 AIMD simulations, with the snapshots from the final 40 ps used for training.
All 4G-HDCNNP model trainings were performed using n2p2 (see Figure).? The radial and angular symmetry functions for O and H atoms were taken from ref ? and are provided in the Supporting Information Tables S1 and S2. Two hidden layers, each with 15 nodes, were used for both electrostatics and short-range neural networks. The active learning with committee neural network architecture, as implemented in an in-house modified version of AML,? was employed for efficient sampling and accurate model generations. All active learning parameters were extensively benchmarked. It was found that a committee of four models (n = 4) is sufficient for all charge, energy, and force trainings (see the Supporting Information Figure S1). Also, 15 epochs were found to be adequate for both the electrostatics and short-range neural networks. All 4 committee models use the same set of symmetry functions to represent the local environments, but differ in their initial random seed. These variations ensure a sufficiently diverse committee of NNPs. In our active learning workflow, we first selected 20 maximally distant configurations in the first iteration. For all subsequent iterations, 20 configurations with the highest force disagreement among the committee members were selected according to eq,? where n is the committee index and α is the atom index,
Here, ΔE _ i _ = E _ i _ – E̅ is the deviation of the predicted energy of model i from the ensemble mean. The term ∇α ΔE _ i _ corresponds to the difference between the force on atom α predicted by model i and the ensemble-averaged force, and σ_ F α _ represents the force disagreement across the committee. Once the total number of configurations is selected with active learning for a given generation, the final model for that generation is trained with 50 epochs for the electrostatic neural network and 100 epochs for the short-range.
The Generation 1 models were trained on a set of bulk classical configurations with n = 4 committee models in the QbC cycle. For this purpose, a candidate pool of 4000 distant and random structures was harvested from the revPBE0-D3 AIMD simulation. The selection loop ensured that no configuration was sampled more than once, and evaluated the entire candidate pool at each round. The subsequent generations were trained with both bulk and air–water interface configurations. Once Generation 1 models were trained, MD and PIMD simulations were performed using an in-house modified version of our DL_POLY Quantum code. ?,? DL_POLY Quantum is a highly modular, sustainable, and scalable MD simulation platform for long-time, large-scale classical and PI simulations of condensed-phase and interfacial systems, with essential NQEs included. Generation 2 models were trained with the configuration pool generated from DL_POLY Quantum simulations for classical and partially quantum nuclei. At this stage, NQEs were gradually introduced with configurations from partially converged PIMD simulations with 8 beads. Both bulk and interfacial configurations were added to the Generation 2 pool. Similarly, Generation 3 was trained with configurations selected from fully converged PIMD simulations with 32 beads generated from both bulk and air–water interface simulations in DL_POLY Quantum. Similar to ref ?, a biasing potential was applied at every stage during the 4G-HDCNNP MD and PIMD simulations to stabilize the simulations and prevent them from sampling regions of configuration space with large committee disagreements. Energy, force, and atomic charges for the selected configurations from Generations 2 and 3 were calculated using the same revPBE0-D3 hybrid dispersion-corrected functional and DDEC6 charge partition. All results reported here are obtained using the final Generation 3 models, unless stated otherwise.
Using the trained final Generation 3 models, an initial 50 ps canonical (NVT) sampling was performed for bulk water using the PI Langevin equation (PILE) thermostat? with timesteps of 1 and 0.5 fs for classical and PIMD simulations, respectively. After discarding equilibration configurations, 10 independent snapshots were extracted for microcanonical (NVE) simulations of 10 ps each with a time step of 1, 0.5, and 0.25 fs for classical, TRPMD, and PA-CMD simulations, respectively. For each of the NVE trajectories, the initial 2 ps were discarded as equilibration, and the remaining 8 ps trajectory was used for spectral analysis. All dipole moments were calculated using the standard expression for a system of classical point charges: μ = ∑_ i _ Q _ i _ R _ i , where Q _ i _ is the atomic charge associated with atom i and R _ i _ is its position. Similar to ref ?, every water molecule was first reconstructed with minimum-image connectivity. The center of mass for each water molecule was then shifted to the origin, and the x, y, and z components of μ were evaluated. Using these dipoles directly, IR spectra were calculated from the Fourier transform of the total cell dipole-derivative autocorrelation functions with a Hann window cutoff of τ_0 = 0.3 ps. The line shape sensitivity to different time constants is demonstrated for TRPMD simulations in the Supporting Information Figure S8.
To benchmark and evaluate our active learning workflow in Generation 1, a separate validation set consisting of 500 distant AIMD-generated bulk configurations was used. For n = 2, 4, and 8 models, the charge and force errors on that validation set are illustrated in the Supporting Information Figure S1. The MAE values converge quickly with increasing the number of committees and plateau within 300 configurations. As such, Generation 1 was trained with 300 QbC selected configurations. Such choices and error levels are consistent with prior reports on water using active learning and 2G-HDNNP committee models, where a few hundred reference configurations were found to be sufficient for achieving high accuracy.? The charge RMSE for this generation was calculated to be 5.97 me and 5.91 me for the training and test sets, respectively. The energy RMSE is found to be 0.45 meV/atom and 0.41 meV/atom, and the force RMSE is calculated to be 0.05 eV/Å for both the training and test sets. These values are consistent with those reported previously (the comparisons are provided in the Supporting Information Table S3). ?,? Calculated correlations between the reference and 4G-HDCNNP predicted charges, energies, and forces for the training and test sets are given in the Supporting Information Figure S2.
Figure illustrates the calculated MAEs across all three generations. This evaluation was performed on another separate validation set of 2000 configurations from four different trajectories simulated with the final Generation 3 models. The first set of configurations included 500 distinct classical bulk simulation snapshots. The following 500 configurations were generated from a converged 32-bead PIMD simulation of bulk water. The third and fourth sets of 500 configurations each originated from similar classical and PIMD simulations of interfacial water. All reference energy, force, and DDEC6 charges were calculated at the revPBE0-D3 level. As can be seen from Figure, the classical force MAEs are calculated to be 49 meV/Å, 44 meV/Å, and 44 meV/Å for Generation 1–3 models, respectively. For configurations with quantum nuclei, the MAEs are 86 meV/Å, 55 meV/Å, and 53 meV/Å for Generation 1–3 models, respectively. Expectedly, the first-generation models trained with only classical configurations perform poorly in describing both bulk and interfacial configurations with quantum nuclei, which improves with subsequent generations as NQEs are introduced during training. All MD and PIMD production simulations using the final trained 4G-HDCNNPs were performed with n = 4 committee models for Generations 1 and 2. To reduce the computational cost of our simulations without compromising accuracy, for Generation 3, we reduced the number of models to n = 2, as it was found to yield similar results while being comparatively faster, especially for path integral simulations involving air–water interface systems. This enabled us to perform longer trajectories with no loss of accuracy. The comparison among n = 1, 2, 3, and 4 models, validated using bulk water radial distribution functions (RDFs) calculated with Generation 3 models, is presented in the Supporting Information Figure S3.
To assess the accuracy of our vibrational spectral simulations, we also evaluate electronic observables, notably dipole-moment fluctuations, alongside energies and forces. We have therefore adopted dipole moments as part of our evaluation criteria. Figure illustrates dipole moment correlations for the same classical bulk subset of the 2000 configuration validation set described above. Dipole moment validations for interface and PIMD sets are reported in the Supporting Information Figures S4–S6. We obtained dipole moment MAEs of 0.03 D for the classical configurations and 0.04 D for the PIMD configurations across both bulk and interfacial validation sets. These validations demonstrate that the 4G-HDCNNP predicted charges can recover the dipole moment to within a few hundredths of a Debye, proving the accuracy of our employed approach in calculating dipoles. Using 4G-HDCNNPs, therefore, provides a robust and practical paradigm for predictive spectral simulations, eliminating the need for explicit training of dipoles, ad-hoc fittings, empirical parametrizations, or charge rescaling to match reference spectra.
Figure compares the calculated RDFs for bulk water using the final Generation 3 models compared to reference revPBE0-D3. 4G-HDCNNP-generated RDFs from classical MD simulations are in excellent agreement with the reference classical AIMD calculated RDFs for O _ W _–O _ W _, O _ W _–H _ W _, and H _ W _–H _ W _ pairs. This demonstrates that our trained 4G-HDCNNPs can accurately capture the dynamic H-bond network of water at finite temperatures. To evaluate finite-size effects, classical MD simulations were also performed for a larger box of 17.53 Å containing 180 water molecules, corresponding to the same density of 0.997 g/cm^3^ at room temperature, where negligible differences were found (see the Supporting Information Figure S7).
To highlight the importance of nonlocal electrostatics, another set of 2G-HDCNNP models was trained using the same data set. The calculated RDFs for 2G-HDCNNPs are also shown in Figure. Compared to 4G-HDCNNPs and the reference, the 2G-HDCNNP calculated O _ W _–O _ W _ peak for the first solvation shell is slightly shifted to a longer distance, and the second peak corresponding to the second solvation shell is also flattened. This illustrates an underestimation of the H-bond strength from 2G models, attributable to the neglect of crucial long-range charge-transfer effects. This comparison, therefore, underscores the necessity of explicitly including these nonlocal effects in accurate simulations of the structural and dynamical properties of condensed-phase and interfacial systems.
The simulated IR spectra of liquid bulk water at 298 K obtained from classical, PA-CMD, and TRPMD simulations using the final trained 4G-HDCNNPs are presented in Figure. The experimental IR peak for the O–H stretch region appears at 3420 cm^–1^. In classical simulations, the O–H stretch exhibits a pronounced blue shift of 274 cm^–1^ relative to the experiment due to the neglect of NQEs. The blue shift for the classical spectra also appears in the bending region, although to a much lesser extent. The inclusion of NQEs through real-time path integral methods, TRPMD, and PA-CMD, largely corrects this deviation. TRPMD generates a stretch peak with a modest blue shift of 24 cm^–1^ and small broadening effects. The PA-CMD produces a stretch maximum with a slight red shift of 26 cm^–1^. The artificial broadening and red shift of the O–H stretch peak of bulk water at room temperature, as obtained from TRPMD and PA-CMD, respectively, are in agreement with the known problems associated with these methods, as summarized in our previous works. ?,?,? All three methods more or less agree on the positions and relative intensities of the peaks for the librational mode, indicating that the 4G models accurately capture the dipole derivatives responsible for low-frequency absorption and can reliably predict underlying vibrational physics.
Comparison between the classical IR spectra generated from the 4G-HDCNNP and 2G-HDCNNP models is shown in Supporting Information Figure S9. The approximate IR spectra obtained from the 2G-HDCNNP models were calculated using the same procedure, except that fixed charges from the SPC? and TIP4P/2005? force fields were employed. The intensity of the O–H stretch peak is found to be lower than that of the bend mode for both 2G calculated spectra, compared to 4G and the experiment. This reduced O–H stretch intensity is consistent with what is commonly observed from fixed point charge force fields such as SPC and TIP4P/2005. In comparison, 4G-HDCNNPs incorporate long-range electrostatics and provide environment-dependent charges, which enhance the accuracy of dipole fluctuations that dictate IR intensities. Both 2G-HDCNNP calculated O–H stretch peaks exhibit a blue shift of approximately 70 cm^–1^ relative to the 4G-HDCNNP calculated peak. While this shift might not be significant for the considered neutral bulk water, in redox systems involving ions with significant charge transfer interactions, the absence of nonlocal electrostatic effects in 2G models is expected to lead to much more substantial uncertainties and deviations in peak positions and lineshapes compared to the corresponding 4G spectra and the reference.
For the IR spectra of the air–water interface, using the water density profile along the z-direction, the simulation box was partitioned into two layers, namely Layers 1 and 2 (Figure). Layer 1 represents the bulk-like water, which sits close to the center of the box (z = 0). The thickness of the bulk-like water layer was kept as ≈8.00 Å. Layer 2 on each side represents the air–water interface with a thickness of ≈4.00 Å each. The water density is highest and bulk-like near the center and gradually decreases toward the vapor phase. Our Layer-resolved IR spectra calculated using classical, TRPMD, and PA-CMD methods for the air–water interface at 298 K are presented in Figure (the deconvoluted spectra are provided in the Supporting Information Figure S10). For each method, the total spectra are accompanied by the spectra from Layers 1 and 2 corresponding to bulk-like and interface-like water molecules, respectively. In the classical simulations, the total spectra exhibit two well-separated peaks. The bulk-like band centered around 3700 cm^–1^ is dominated by the Layer-1 water molecules, which reside at the center of the box (Figure). The high frequency peak around 3944 cm^–1^ arises from Layer-2, which is mostly comprised of interfacial water molecules with abundant single H-bond donors and dangling free OH bonds. Similar to the bulk spectra, all classical peaks are blue-shifted due to the neglect of the NQEs. The O–H stretch peaks are red-shifted once the NQEs are added through TRPMD and PA-CMD. The TRPMD peak is artificially broadened, hiding the distinct doublet feature characteristic of interfacial systems. On the other hand, the PA-CMD spectrum is expectedly red-shifted and slightly broadened, though to a lesser extent than in TRPMD due to its well-known curvature problem. However, the total PA-CMD peak still exhibits a doublet feature, though less pronounced than the classical spectrum.
Direct comparison with experimental linear IR spectra for the air–water interface is challenging due to the difficulty of isolating the interfacial signal in linear absorption measurements. Experimental probing of interfacial water typically relies on surface-specific nonlinear techniques such as vSFG, which for the air–water interface reveals a free O–H bond at ≈3700 cm^–1^, distinct from the hydrogen-bonded bulk-like peak appearing around 3600 cm^–1^.? The high-frequency peak observed in our Layer 2 spectra corresponds to these free dangling O–H bonds at the interface. While linear IR and vSFG selection rules differ, the presence of this distinct high-frequency mode in our simulated spectra confirms that the 4G-HDCNNP model correctly captures the characteristic structural heterogeneity of the interfaces.
In summary, this work demonstrates that 4G-HDCNNPs efficiently trained using active learning and query-by-committee can be utilized for predictive quantum vibrational spectral simulations without the need for explicit training of dipole moments, ad hoc fitting, or parametrizations. This feat is achieved by combining nonlocal effects from 4G-HDCNNPs with nuclear quantum effects from real-time path integral simulations. Analyses of IR spectra for bulk and air–water interface test cases demonstrate that the 4G-HDCNNP predicted charges are reliable for dipole and IR spectral calculations. The adapted methodology in this work is general and applicable to other complex systems and environments with significant long-range charge transfer effects. Future works will focus on (i) calculating spectra through combining 4G-HDCNNPs with recently introduced curvature-free CMD-derived real-time methods, including Te-PIGS,? f-QCMD, ?,? and our very own h-CMD? methods, (ii) applying 4G-HDCNNPs to complex reactive electrochemical environments at different condensed phases and interfaces dominated by nonlocal charge transfer interactions, and (iii) including constant potentials for the accurate simulations of different dynamical and spectral properties in the presence of external electric fields. We hope that these developments will facilitate the accurate calculation of quantum dynamical properties and vibrational spectra, addressing important and long-standing questions at the heart of energy conversion and storage processes, among others.
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Perakis F.De Marco L.Shalit A.Tang F.Kann Z. R.Kühne T. D.Torre R.Bonn M.Nagata Y.Vibrational Spectroscopy and Dynamics of Water Chem. Rev.20161167590760710.1021/acs.chemrev.5b 0064027096701 · doi ↗ · pubmed ↗
- 2Sharma M.Donadio D.Schwegler E.Galli G.Probing Properties of Water under Confinement: Infrared Spectra Nano Lett.200882959296210.1021/nl 801864318680386 · doi ↗ · pubmed ↗
- 3Le Caër S.Pin S.Esnouf S.Raffy Q.Renault J. P.Brubach J.-B.Creff G.Roy P.A trapped water network in nanoporous material: the role of interfaces Phys. Chem. Chem. Phys.201113176581766610.1039/c 1cp 21980 d 21909507 · doi ↗ · pubmed ↗
- 4Catafesta J.Alabarse F.Levelut C.Isambert A.Hébert P.Kohara S.Maurin D.Bantignies J.-L.Cambon O.Creff G.Roy P.Brubach J.-B.Hammouda T.Andrault D.Haines J.Confined H 2O molecules as local probes of pressure-induced amorphisation in faujasite Phys. Chem. Chem. Phys.201416122021220810.1039/c 4cp 00186 a 24816994 · doi ↗ · pubmed ↗
- 5Ceriotti M.Fang W.Kusalik P. G.Mc Kenzie R. H.Michaelides A.Morales M. A.Markland T. E.Nuclear Quantum Effects in Water and Aqueous Systems: Experiment, Theory, and Current Challenges Chem. Rev.20161167529755010.1021/acs.chemrev.5b 0067427049513 · doi ↗ · pubmed ↗
- 6Markland T. E.Ceriotti M.Nuclear Quantum Effects Enter the Mainstream Nat. Rev. Chem.20182010910.1038/s 41570-017-0109 · doi ↗
- 7Feynman, R. P. ; Hibbs, A. R. ; Styer, D. F. Quantum Mechanics and Path Integrals; Courier Corporation, 2010.
- 8Parrinello M.Rahman A.Study of an F Center in Molten K Cl J. Chem. Phys.19848086086710.1063/1.446740 · doi ↗
