Vibrational Investigation of Nucleobases by Means of Divide and Conquer Semiclassical Dynamics
Fabio Gabas, Giovanni Di Liberto, and Michele Ceotto

TL;DR
This study uses a divide-and-conquer semiclassical method to accurately analyze vibrational features of nucleobases, achieving results within 20 wavenumbers of experimental data, even for complex conformers.
Contribution
It introduces a computational approach that effectively models vibrational spectra of nucleobases with high accuracy, applicable to complex molecular conformations.
Findings
Accuracy within 20 wavenumbers of experimental data
Effective for complex conformers like cytosine
Promising for future studies on nucleotides and base pairs
Abstract
In this work we report a computational study of the vibrational features of four different nucleobases employing the divide-and-conquer semiclassical initial value representation molecular dynamics method. Calculations are performed on uracil, cytosine, thymine, and adenine. Results show that the overall accuracy with respect to experiments is within 20 wavenumbers, regardless of the dimensionality of the nucleobase. Vibrational estimates are accurate even in the complex case of cytosine, where two relevant conformers are taken into account. These results are promising in the perspective of future studies on more complex systems, such as nucleotides or nucleobase pairs.
| Label | Exp | Harmonic | Harmonic/Anharmonic | VPT2 | Semiclassical | Label | Exp | Harmonic | Harmonic/Anharmonic | VPT2 | Semiclassical | |||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B3LYP/aug-cc-pvdz | CCSD(T)/B3LYP30 | GVPT2 | CVPT2 | MC SCIVR | DC SCIVR | B3LYP/aug-cc-pvdz | CCSD(T)/B3LYP30 | GVPT2 | CVPT2 | MC SCIVR | DC SCIVR | |||||
| 1 | 168 | 140 | 132 | 139.7 | 156 | 160 | 17 | 1075 | 1083 | 1084 | 1061 | 1060.8 | 1052 | 1060 | ||
| 2 | 185 | 181 | 159 | 155 | 157.9 | 164 | 170 | 18 | 1185 | 1194 | 1205 | 1176 | 1179.9 | 1104 | 1200 | |
| 3 | 391 | 395 | 387 | 386 | 385.6 | 380 | 370 | 19 | 1217 | 1222 | 1248 | 1221 | 1214.4 | 1208 | 1200 | |
| 4 | 411 | 405 | 388 | 384 | 384.4 | 396 | 390 | 20 | 1359 | 1379 | 1394 | 1384 | 1382.0 | 1336 | 1340 | |
| 5 | 516 | 522 | 517 | 510 | 510.8 | 524 | 510 | 21 | 1389 | 1402 | 1414 | 1355 | 1351.0 | 1340 | 1340 | |
| 6 | 537 | 543 | 541 | 530 | 531.1 | 520 | 535 | 22 | 1400 | 1412 | 1427 | 1388 | 1394.1 | 1376 | 1370 | |
| 7 | 562 | 560 | 545 | 549 | 535.3 | 552 | 550 | 23 | 1472 | 1497 | 1505 | 1466 | 1462.7 | 1464 | 1460 | |
| 8 | 551 | 582 | 559 | 555 | 549.4 | 544 | 560 | 24 | 1643 | 1673 | 1678 | 1643 | 1642.8 | 1644 | 1630 | |
| 9 | 662 | 698 | 670 | 654 | 651.4 | 676 | 680 | 25 | 1706 | 1757 | 1762 | 1733 | 1729.5 | 1720 | 1720 | |
| 10 | 718 | 729 | 728 | 711 | 715.8 | 708 | 710 | 26 | 1764 | 1788 | 1790 | 1761 | 1761.2 | 1764 | 1760 | |
| 11 | 757 | 764 | 765 | 746 | 756.1 | 680 | 750 | 27 | 3210 | 3218 | 3072 | 3081.0 | 2980 | 3070 | ||
| 12 | 759 | 771 | 773 | 751 | 751.8 | 748 | 750 | 28 | 3250 | 3253 | 3117 | 3133.6 | 3144 | 3160 | ||
| 13 | 804 | 821 | 814 | 793 | 803.2 | 788 | 820 | 29 | 3435 | 3585 | 3602 | 3436 | 3435.2 | 3480 | 3510 | |
| 14 | 987 | 963 | 973 | 954 | 955.9 | 940 | 940 | 30 | 3485 | 3631 | 3653 | 3485 | 3486.3 | 3536 | 3550 | |
| 15 | 958 | 966 | 968 | 940 | 947.5 | 952 | 950 | MAE | 26 | 30 | 10 | 9 | 22 | 20 | ||
| 16 | 980 | 988 | 995 | 978 | 979.9 | 968 | 970 | |||||||||
| Label | Exp/Ne | Exp/Ar | VPT2 | Harmonic | DC SCIVR | Label | Exp/Ne | Exp/Ar | VPT2 | Harmonic | DC SCIVR | |
| 8 | 546 | 520 | 21 | 1237 | 1244 | 1227 | 1258 | 1220 | ||||
| 9 | 571 | 575 | 578 | 560 | 22 | 1340 | 1337 | 1335 | 1351 | 1320 | ||
| 10 | 614 | 614 | 643 | 630 | 23 | 1423 | 1423 | 1408 | 1441 | 1420 | ||
| 11 | 717 | 716 | 726 | 710 | 24 | 1475 | 1475 | 1472 | 1497 | 1480 | ||
| 12 | 749 | 747 | 768 | 750 | 25 | 1540 | 1539 | 1521 | 1566 | 1550 | ||
| 13 | 767 | 747 | 771 | 760 | 26 | 1602 | 1598 | 1588 | 1629 | 1590 | ||
| 14 | 784 | 818 | 786 | 780 | 27 | 1659 | 1656 | 1647 | 1685 | 1660 | ||
| 15 | 921 | 890 | 28 | 1730 | 1733 | 1736 | 1755 | 1750 | ||||
| 16 | 964 | 940 | 29 | 3037 | 3202 | 3140 | ||||||
| 17 | 984 | 980 | 30 | 3092 | 3227 | 3120 | ||||||
| 18 | 1088 | 1082 | 1060 | 31 | 3457 | 3441 | 3460 | 3596 | 3510 | |||
| 19 | 1103 | 1090 | 1106 | 1120 | 1110 | 32 | 3474 | 3472 | 3477 | 3611 | 3510 | |
| 20 | 1198 | 1196 | 1193 | 1208 | 1190 | 33 | 3575 | 3564 | 3610 | 3741 | 3580 | |
| MAE | 10(13) | 38(41) | 13(18) |
| Label | Exp/Ne | Exp/Ar | Harmonic | DC SCIVR | Label | Exp/Ne | Exp/Ar | Harmonic | DC SCIVR | |
|---|---|---|---|---|---|---|---|---|---|---|
| 8 | 553 | 557 | 560 | 540 | 21 | 1258 | 1257 | 1298 | 1270 | |
| 9 | 525 | 520 | 563 | 540 | 22 | 1338 | 1333 | 1342 | 1330 | |
| 10 | 600 | 601 | 602 | 570 | 23 | 1382 | 1379 | 1399 | 1380 | |
| 11 | 711 | 710 | 721 | 730 | 24 | 1441 | 1439 | 1459 | 1440 | |
| 12 | 784 | 751 | 790 | 790 | 25 | 1495 | 1496 | 1517 | 1490 | |
| 13 | 796 | 781 | 794 | 770 | 26 | 1576 | 1575 | 1606 | 1570 | |
| 14 | 809 | 807 | 818 | 830 | 27 | 1592 | 1589 | 1630 | 1610 | |
| 15 | 948 | 955 | 986 | 970 | 28 | 1625 | 1622 | 1656 | 1640 | |
| 16 | 948 | 955 | 993 | 970 | 29 | 3165 | 3080 | |||
| 17 | 989 | 980 | 997 | 980 | 30 | 3207 | 3090 | |||
| 18 | 1085 | 1083 | 1092 | 1090 | 31 | 3461 | 3446 | 3596 | 3480 | |
| 19 | 1113 | 1110 | 1120 | 1100 | 32 | 3575 | 3564 | 3736 | 3600 | |
| 20 | 1198 | 1196 | 1241 | 1220 | 33 | 3618 | 3592 | 3770 | 3670 | |
| MAE | 41(36) | 16(18) |
| mode | Gas | Gas/Ar | Harmonic | DC SCIVR | mode | Gas | Gas/Ar | Harmonic | DC SCIVR | |
| 1 | 130 | 110 | 21 | 1078 | 1087 | 1153 | 1120 | |||
| 2 | 162 | 150 | 22 | 1178 | 1184 | 1196 | 1170 | |||
| 3 | 168 | 150 | 23 | 1221 | 1232 | 1190 | ||||
| 4 | 280 | 281 | 290 | 24 | 1357 | 1367 | 1380 | |||
| 5 | 302 | 310 | 25 | 1395 | 1397 | 1360 | ||||
| 6 | 391 | 394 | 380 | 26 | 1393 | 1389 | 1408 | 1390 | ||
| 7 | 394 | 407 | 390 | 27 | 1409 | 1405 | 1419 | 1380 | ||
| 8 | 462 | 455 | 461 | 460 | 28 | 1437 | 1448 | 1430 | ||
| 9 | 540 | 546 | 530 | 29 | 1451 | 1473 | 1460 | |||
| 10 | 541 | 545 | 583 | 570 | 30 | 1463 | 1472 | 1497 | 1460 | |
| 11 | 603 | 600 | 31 | 1668 | 1684 | 1700 | 1690 | |||
| 12 | 689 | 662 | 700 | 670 | 32 | 1725 | 1712 | 1741 | 1750 | |
| 13 | 727 | 735 | 710 | 33 | 1772 | 1768 | 1784 | 1760 | ||
| 14 | 755 | 754 | 757 | 750 | 34 | 2855 | 3038 | 2955 | ||
| 15 | 767 | 764 | 779 | 770 | 35 | 2941 | 2940 | 3097 | 2970 | |
| 16 | 804 | 800 | 802 | 780 | 36 | 2984 | 2971 | 3121 | 2990 | |
| 17 | 885 | 890 | 911 | 900 | 37 | 3076 | 2992 | 3202 | 3080 | |
| 18 | 963 | 959 | 962 | 950 | 38 | 3437 | 3434 | 3588 | 3470 | |
| 19 | 1005 | 1018 | 1000 | 39 | 3484 | 3485 | 3632 | 3530 | ||
| 20 | 1031 | 1046 | 1060 | 1040 | MAE | 48(38) | 17 (21) |
| mode | Gas | Gas/Ar | PT2 | Harmonic | DC SCIVR | mode | Gas | Gas/Ar | PT2 | Harmonic | DC SCIVR | |
| 1 | 162 | 139 | 168 | 160 | 21 | 1065 | 1061 | 1054 | 1076 | 1050 | ||
| 2 | 214 | 181 | 210 | 210 | 22 | 1126 | 1127 | 1125 | 1141 | 1120 | ||
| 3 | 244 | 242 | 209 | 267 | 220 | 23 | 1234 | 1229 | 1230 | 1242 | 1220 | |
| 4 | 270 | 276 | 276 | 279 | 250 | 24 | 1246 | 1243 | 1263 | 1250 | ||
| 5 | 299 | 302 | 290 | 25 | 1280 | 1290 | 1291 | 1332 | 1300 | |||
| 6 | 506 | 503 | 491 | 512 | 520 | 26 | 1326 | 1328 | 1325 | 1356 | 1330 | |
| 7 | 515 | 513 | 516 | 529 | 500 | 27 | 1346 | 1345 | 1338 | 1365 | 1330 | |
| 8 | 518 | 531 | 490 | 28 | 1389 | 1376 | 1414 | 1390 | ||||
| 9 | 529 | 542 | 500 | 29 | 1415 | 1419 | 1406 | 1432 | 1400 | |||
| 10 | 563 | 566 | 570 | 583 | 570 | 30 | 1468 | 1474 | 1466 | 1498 | 1485 | |
| 11 | 600 | 610 | 610 | 617 | 610 | 31 | 1482 | 1481 | 1512 | 1500 | ||
| 12 | 650 | 655 | 655 | 668 | 660 | 32 | 1577 | 1607 | 1580 | |||
| 13 | 677 | 688 | 670 | 33 | 1612 | 1591 | 1635 | 1590 | ||||
| 14 | 717 | 717 | 723 | 710 | 34 | 1625 | 1633 | 1616 | 1660 | 1630 | ||
| 15 | 801 | 802 | 811 | 815 | 800 | 35 | 3041 | 3049 | 3172 | 3070 | ||
| 16 | 847 | 848 | 846 | 858 | 840 | 36 | 3061 | 3057 | 3102 | 3245 | 3160 | |
| 17 | 887 | 885 | 896 | 880 | 37 | 3434 | 3441 | 3432 | 3588 | 3460 | ||
| 18 | 926 | 927 | 931 | 942 | 930 | 38 | 3501 | 3502 | 3497 | 3641 | 3530 | |
| 19 | 957 | 958 | 969 | 979 | 960 | 39 | 3552 | 3555 | 3539 | 3727 | 3570 | |
| 20 | 1005 | 1018 | 1015 | 990 | MAE | 10 (9) | 42 (38) | 16 (14) |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Vibrational Investigation of Nucleobases by Means of Divide and Conquer
Semiclassical Dynamics
Fabio Gabas
Dipartimento di Chimica, Università degli Studi di Milano, via C. Golgi 19, 20133 Milano, Italy
Giovanni Di Liberto
Dipartimento di Chimica, Università degli Studi di Milano, via C. Golgi 19, 20133 Milano, Italy
Michele Ceotto
Dipartimento di Chimica, Università degli Studi di Milano, via C. Golgi 19, 20133 Milano, Italy
Abstract
In this work we report a computational study of the vibrational features of four different nucleobases employing the divide-and-conquer semiclassical initial value representation molecular dynamics method. Calculations are performed on uracil, cytosine, thymine, and adenine. Results show that the overall accuracy with respect to experiments is within 20 wavenumbers, regardless of the dimensionality of the nucleobase. Vibrational estimates are accurate even in the complex case of cytosine, where two relevant conformers are taken into account. These results are promising in the perspective of future studies on more complex systems, such as nucleotides or nucleobase pairs.
I Introduction
The nucleobases adenine (A), cytosine (C), guanine (G), thymine (T) and uracil (U) represent the inner part of nucleotides, that are the building blocks of the nucleic acids, DNA and RNA. More specifically, a nucleotide chain constitutes a single strand that interacts with another strand through the bases. These strands compose the secondary and tertiary structures of the nucleic acids, as for example the famous double helix discovered by Watson and Crick.1 The interaction between nucleobases follows a specific pairing: A and G, classified as purines, interact respectively with T (for DNA) or U (for RNA) and C, called instead pyrimidines. The nucleobases structure is particularly relevant since it is directly linked to their functionality.2 When some of them exist in tautomeric forms, only one conformation is predominant in nature. Also, different tautomers can bring to mispairings between pyrimidines and purines, leading to phenomena known as mutagenesis.1; 3; 4; 2; 5; 6 For these reasons, an accurate investigation of the structure and properties of all the nucleobase conformations is particularly important. Clearly, detailed studies in condensed phase are the final goal to understand these properties.7; 8; 9; 10; 11; 12; 13 However, a preliminary investigation in vacuum is a mandatory step in order to have a spectroscopic clear picture and to be able to discriminate between more complex condensed phase interactions. Specifically, in the gas phase,14 it is possible to study the intrinsic properties of such molecules without any intermolecular interaction. Besides, the deep knowledge of biomolecules and their ionic behavior in vacuum is important also for a better understanding and designing of various spectroscopy techniques, such as mass and vibrational spectroscopy. Thus, gas-phase vibrational spectroscopy can certainly help in the characterization and fundamental understanding of nucleobases.
Over the past decades a lot of experimental work has been done in this direction, starting from the measurement of accurate spectra for adenine, thymine and uracil,15; 16; 17; 18; 19; 20; 21; 22 and going to the more complex spectra of cytosine and guanine, which show more than a single relevant isomer, due to the facile tautomerism.23; 24; 25; 26; 27; 28; 29 These experimental spectra have been often combined with theoretical calculations for a better interpretation. However, given the molecular size of these systems, in most cases the theoretical prediction was provided by harmonic frequencies calculated at the equilibrium geometry. Even if this approach is computationally cheap and relatively immediate to apply, it neglects all possible resonances and anharmonicities of the potential. Recently, very detailed vibrational studies of uracil have been presented using the canonical van Vleck second-order vibrational perturbation theory (CVPT2), the fully automated generalized second-order vibrational perturbation (GVPT2) approach, and the Hierarchical Intertwined Reduced-Rank Block Power Method (HI-RRBPM), obtaining very accurate results compared to the experiment.30; 31; 32 Also, Perturbation Theory (PT2) has been successful for adenine,33 and for the oxo isomer of citosine.34 However, at the best of our knowledge, in the case of thymine a complete anharmonic computational spectrum is still missing.
In this work we present a vibrational spectroscopic study of four nucleobases together with their eventual principal isomers, by means of the semiclassical initial value representation (SCIVR) molecular dynamics.35; 36; 37; 38; 39; 40; 41; 42; 43; 44; 45; 46; 47; 48; 49 SCIVR has been proved in recent years to be very powerful for spectroscopic calculations.50; 51; 52; 53; 54; 55; 56; 57; 58; 59; 60; 61; 62; 63; 64 It does not suffer from Zero Point Energy Leakage (ZPEL)65 and it can be employed for vibrational eigenfunction calculations.66; 67; 68 In particular, the divide-and-conquer semiclassical initial value representation (DC SCIVR) method for vibrational spectroscopy has been employed reliably in several applications, allowing to obtain semiclassical vibrational spectra of variously sized molecules without any relevant loss of accuracy, and with an average deviation of 20 wavenumbers from either exact or experimental results.69; 70 Specifically, DC SCIVR has been applied to study challenging high-dimensional and complex systems such as fullerene, supramolecular glycine-based molecules and water clusters.71; 72; 69
In the present work we calculate power spectra by performing ab-initio molecular dynamics simulations, which has been widely and successfully previously employed for full-dimensional on-the-fly semiclassical applications.73; 53; 55; 54; 72 More specifically, we want to provide not only an extensive vibrational study of nucleobases, but also to open the route to ab initio DC SCIVR calculations on DNA-related molecules. For these goals, our quantum mechanical vibrational estimates are compared with both experiments and other theoretical calculations. These calculations will check and prove DC SCIVR method feasibility and reliability, and provide the confidence for future calculations of increasing dimensionality up to pairs of bases and higher dimensional sequences.74; 75; 76; 77 Here, we investigate four of the five nucleobases, i.e. uracil, cytosine, thymine, and adenine. Since gas-phase spectra of the remaining nucleobase, guanine, are given by a combination of its four most stable conformers and their signals are all within few wavenumbers, guanine is not a good benchmark for testing the accuracy of our method and it has not been investigated.27; 78; 25
The paper is organized as follows. Section (II) recalls the DC SCIVR method for vibrational spectra calculations, together with the computational setup. In Section (III) we present our results about the nucleobases and we compare them to different experiments and other theoretical calculations. Finally, Section (IV) provides some conclusions and future perspectives.
II Divide and Conquer Semiclassical Dynamics
Vibrational power spectra are obtained via Fourier transform of the semiclassical approximation to the survival amplitude
[TABLE]
where is a given reference state, is the Hamiltonian operator and is the quantum mechanical time-evolution operator. We calculate this propagator using the semiclassical approximation.
Semiclassical theory takes advantage of the Feynman path integral representation of the quantum propagator,79 in which a quantum mechanical amplitude, describing the probability for a particle of mass to move from a certain initial state to a final one at time , is calculated as
[TABLE]
where the integral of the differential is given by the summation over all possible paths from to . is the action of each path and is the number of degrees of freedom. When the stationary phase approximation for the path integration is enforced, only classical paths contribute. This approximation leads to the well known van Vleck propagator,80 where the integral is replaced by a sum over all classical paths connecting the starting position to the ending one in an amount of time
[TABLE]
The index is called Morse, or Maslov index, and it accounts for the sign change of the determinant \Biggl{|}\frac{\partial\mathbf{p}\left(0\right)}{\partial\mathbf{q}\left(t\right)}\Biggr{|}. This version of the propagator is quite cumbersome and not practical, since it requires to deal with a root search problem with fixed boundary conditions. This issue has been overcome by the Initial Value Representation trick proposed by Miller,37 where the integrand of Eq.(1) is written as
[TABLE]
In this version of the propagator, the survival amplitude can be obtained by sampling trajectories with different initial conditions and it is amenable to Monte Carlo integration. Later, Heller proposed a very flexible and numerically stable representation of the semiclassical propagator based on coherent states81; 82 of the type
[TABLE]
where, in our simulations, the matrix is chosen to be constant in time and diagonal, with elements equal to the normal mode vibrational frequencies. The wavepacket in Eq.(5) is centered at the classical position and momentum , and it follows the classical trajectory. The expression of the quantum propagator in this representation is the well known Heller-Herman-Kluk-Kay (HHKK) propagator,83; 84; 85
[TABLE]
where is the pre-exponential factor and it is equal to
[TABLE]
and where are the monodromy (or stability) matrix elements defined as , where is calculated at time t and is calculated at time zero.86; 87 The Monte Carlo integration in Eq. (6) usually needs a high number of trajectories for convergence. For systems having more than a few degrees of freedom some filtering techniques have been proposed to speed up the convergence.88; 89; 90; 91 A well established procedure is the time-averaging (TA) filtering, in which the semiclassical integrand can be worked out to be positively definite92 by taking advantage of the so-called separable approximation of the pre-exponential factor, where only the phase is taken into account, i.e. , and \varphi_{t}=\text{phase\left[C_{t}\left(\mathbf{q}\left(0\right),\mathbf{p}\left(0\right)\right)\right]}. Using the propagator in Eq.(6), the time-averaging filter and the separable approximation, one obtains the following formula for the spectral density92
[TABLE]
Eq. (8) has been employed to perform vibrational estimates of small-sized molecules requiring roughly a thousand of classical trajectories per degree of freedom to converge.93; 94; 95; 96 Unfortunately when pre-fitted Potential Energy Surfaces (PES) are not available, the number of required trajectories is still too computational demanding for on-the-fly or direct dynamics approaches. In this event, only few classical trajectories can be afforded. For these reasons in recent years, starting from Eq. (8), in our group the Multiple Coherent State approach (MC SCIVR) has been developed,97; 54; 55; 53 in which by properly choosing the initial conditions of the classical trajectories and the reference state it is possible to regain spectra with few or even one classical trajectory, and retaining the typical semiclassical accuracy of roughly 20 cm. In the single trajectory implementation, the initial conditions are chosen as , where and stand for the coordinates of the reference coherent state . The phase space integral of Eq.(8) reduces to a single trajectory formulation for each spectroscopic peak of the type
[TABLE]
where is usually chosen to be equal to the equilibrium configuration and is set in order to provide a total initial kinetic energy equal to the harmonic zero point energy (ZPE), which is distributed as among mass-scaled vibrational normal modes, where is the harmonic frequency of the -th mode. and are the respective time-evolved quantities. Eq.(9) is our working equation for MC SCIVR calculation. Moreover, in the MC SCIVR approach, the reference state can be tailored to decompose the spectrum mode by mode. For an -th mode, we have
[TABLE]
where the index runs over the number of vibrational degrees of freedom F.54; 97 If the vector is set equal to one, then the zero point energy peak is obtained (together with other even peaks), while by setting only for a certain -th mode, then the -th fundamental excitation will be enhanced, together with the other odd overtones of -th mode.97 This tool is particularly helpful in presence of crowded spectra, where peaks are very close in energy, as it commonly happens by increasing the dimensionality of the molecule under exam. This strategy successfully allowed to recover accurate spectra of molecules as complex as glycine. 73
Unfortunately, MC SCIVR runs out of steam when the system dimensionality overcomes 25-30 degrees of freedom, because of the so-called curse of dimensionality problem. Recently, we have addressed this issue by proposing the DC SCIVR method,69; 70 where full dimensional classical simulations are projected onto sub-dimensional spaces for semiclassical sub-dimensional spectroscopic calculations. The same working formula of Eq.(8) is employed, but formulated in terms of subspace coordinates. More specifically, the spectral density for a M-dimensional subspace is
[TABLE]
where denotes the projected quantities. The full-dimensional spectrum is recovered as a combination of reduced dimensionality ones. are the projected coherent states written as direct product of monodimensional ones and involving only the degrees of freedom in the subspace. is the projected action, and is the phase of the projected pre-exponential factor. This latter term is obtained from the reduced dimensionality monodromy matrix elements upon a singular value decomposition of the full-dimensional ones.98; 99 The projected action functional is written as
[TABLE]
where the kinetic part is trivially projected since it is naturally separable. For the potential part, we assume that the projected potential depend on the degrees of freedom contained in the subspace and the remaining ones are downgraded to parameters. We include a time-dependent external scalar field to ensure that the equation of the projected potential is exact in the limit of a separable system and it accounts for the contribution arising from the degrees of freedom not belonging to the subspace
[TABLE]
[TABLE]
Clearly, the definition of the subspaces is a critical issue within this approach, since it is responsible for the accuracy of the action and the monodromy matrix calculation. As for Eq.(9), Eq.(11) can be reduced to a single trajectory formulation and this formulation will be employed in our calculations. Next, one desires to group in the same subspace the most interacting degrees of freedom. We described several strategies in Ref.70. Here, we employ the one based on the Hessian matrix, since the off-diagonal terms indicate the level of coupling between degrees of freedom. The procedure to separate the full-dimensional space starts by defining a time-averaged Hessian matrix along a test trajectory, where .69; 70 We fix a threshold parameter that is comparable with the normal mode mass-scaled off-diagonal terms of . If , then the level of coupling is considered significant and the degrees of freedom are enrolled in the same subspace. Instead, if , and modes are on different subspaces, unless there exists a third mode such that and/or . The best possible scenario is when there are few subspaces and they are big enough to retain most of the couplings. However, in practice, the threshold choice is often a trade-off between accuracy and feasibility of the semiclassical calculation.
Molecular dynamics simulations are in this work performed using the NWChem package100 at a DFT B3LYP/aug-cc-pvdz level of theory,101 which is a typical setup for semiclassical ab-initio calculations and represents a good trade-off between accuracy and computational overhead. For each conformer we evolve a single trajectory for a total of 25000 atomic time units, starting from , where is the equilibrium configuration and is the momenta vector, setting each momentum to have a total initial energy equal to the harmonic ZPE of the molecule. With the exception of uracil molecule, we calculate the Hessian every two steps and approximate it otherwise.86; 87 This is a typical setup that is often enough to recover MC SCIVR and DC SCIVR vibrational spectra with an average accuracy within 20-25 wavenumbers.73; 96; 53; 102
III Results and Discussion
We start our investigation with the lowest dimension nucleobase, i.e. uracil. For this system we perform both MC SCIVR and DC SCIVR calculations to prove once more the reliability of the DC SCIVR method. Results are compared with experiments and high level VPT2 calculations. Then, we move to cytosine, for which there are two conformers which are spectroscopic relevant, and to thymine and adenine vibrational spectra and compare them to the available experimental results. With the exception of uracil, full-dimensional MC SCIVR calculations can not be afforded for the other nucleobases. In these cases, only DC SCIVR simulations will be performed. The molecular structures of these molecules are reported in Fig. 1.
III.1 Uracil
Uracil is made of one pyrimidine ring resulting into twelve atoms. The molecular structure of the global minimum is reported in panel (a) of Fig. 1. For this molecular system the energy difference between the global minimum (oxo form) and its tautomer (hydroxy form) is around 45 kJ/mol.21 For this reason, we perform our semiclassical study only on the structure reported in panel (a) of Fig. 1, since it is expected to be by far the most representative one. In the past years, this molecule was the subject of several studies, both with experimental and theoretical approaches. Specifically, we will compare our semiclassical vibrational estimates with experimental values103; 104; 105 and GVPT2 and CVPT2 calculated energy levels.31; 30 As a first evaluation of the level of theory, we report in Table 1 the computed harmonic frequencies of the molecule at B3LYP/aug-cc-pvdz level of theory together with those of Ref. 30, calculated using a hybrid CCSD(T)/B3LYP force field, where harmonic CCSD(T) estimates have been corrected using GVPT2 estimates at the level of B3LYP theory for anharmonicities. We can observe a good agreement between these two column values, suggesting that our computational setup is a good harmonic estimate for a possible accuracy and feasibility of the semiclassical simulations. When moving to the semiclassical dynamics results, each uracil fundamental frequency is evaluated by tailoring the reference state of the semiclassical integrand according to the MC SCIVR approach of Eq. (10). Given the above mentioned dimensional limits, uracil represents also a good benchmark for comparison between MC SCIVR and DC SCIVR. The full-dimensional space is partitioned employing the Hessian matrix method.69 A threshold value equal to leads to a 17-dimensional subspace, and all other subspaces are mono-dimensional. In Table 1 the computed MC SCIVR and DC SCIVR energy levels are reported. In the fifth and sixth columns of the same Table, the GVPT2 and CVPT2 estimates are slightly more accurate than the semiclassical ones, probably because of the higher level of ab initio theory employed.
MC SCIVR and DC SCIVR values are very similar and close to the experimental values as well. The MAE obtained by comparison with the experimental results are respectively 22 and 20 wavenumbers, which is pretty much the accuracy of other semiclassical simulations. Actually, also the MAE for the harmonic approximation at the same level of theory is quite similar. We think this good accuracy is accidental, because higher level ab initio harmonic estimates (see Table 1 of Ref.30) provide frequencies which are systematically higher than B3LYP ones. Going back to the semiclassical results, we think that the DC SCIVR MAE is slightly smaller than the more accurate MC SCIVR one, either because of a compensation of errors or because MC SCIVR has been pushed too close to the dimensional limit of the method. These results confirm that the main advantage of the DC SCIVR approach is given by its portability and applicability to higher-dimensional molecules by retaining a comparable accuracy to that one of the MC SCIVR.
Interestingly, MC SCIVR is able to recover some experimental aspects which are very hard to be reproduced, such as the exchange in energy levels between modes 7 and 8. Namely, by employing normal mode or GVPT2 normal mode analysis (see respectively columns three and four in Table 1) or VPT2 approaches (columns five and six of the same table), mode 7 results lower in energy than mode 8, while according to the experimental assignment mode 7 is slightly higher in energy than mode 8. MC SCIVR estimates agree with the experimental picture by locating mode 7 at 552 cm and mode 8 at 544 cm. We believe this peak inversion can be reproduced only by including the relevant anharmonic effects experienced by semiclassical trajectories far from the equilibrium configuration.
Figure 2 reports the spectrum of the 17-dimensional uracil subspace. Each fundamental excitation was obtained by tailoring the reference state \Bigl{|}\chi\Bigr{\rangle} of the semiclassical integrand according to Eq. (10). Vertical dashed lines in the figures are centered at the available experimental levels. The peaks are labeled according to the first column of Table (1) and the values are reported in the same table.
In summary, our results for uracil molecule show that all relevant anharmonic effects can be reproduced and that the full dimensional MC SCIVR calculations lead to accurate results. DC SCIVR results are comparable and enough accurate to pursue other nucleobase investigation, where full-dimensional MC SCIVR calculations cannot be longer afforded.
III.2 Cytosine
The next nucleobase we treat is cytosine, which is constituted by a pyrimidine ring. It differs from uracil by the presence of the amino group in place of an oxygen atom. This molecule is made by 13 atoms, resulting into 33 vibrational degrees of freedom. In addition to the increased dimension, the vibrational spectroscopic investigation gets complicated by the tautomerism between the oxocytosine and the hydroxycytosine. Both forms are spectroscopic relevant, given the very small minimum energy difference of few kJ/mol.23 This difference is 2.1 kJ/mol at our level of DFT theory,21 and in favour of the oxo form examined in the previous section. Thus, we perform molecular dynamics simulations of both tautomers by running two classical trajectories, one starting from the oxocytosine isomer and the other from the hydroxycytosine one (respectively panels (b) and (c) of Fig. 1). We observe that there is no isomeric change along the entire simulation time of 25000 au. For this nucleobase, we observe the presence of a strong coupling between hindered rotations and other vibrational modes. For this reason, the initial kinetic energy of the first seven lowest frequency vibrational modes is set to zero, since rotational contributions would jeopardize the numerical convergence of the spectra. This strategy is similar to what has been done in previous semiclassical calculations,70; 71 and it does not represent a bias since the initial kinetic energy of the trajectory is reduced by only 5% below the harmonic ZPE value, which is obviously in excess with respect to the actual ZPE. The columns of Tables 2 and 3, labeled “Harmonic”, report the harmonic frequencies of both isomers. We observe from the experimental values that most of the fundamentals of the two isomers are very similar in energy and this can be seen already at the harmonic level. A common strategy to discriminate between the two isomers is to look at the region around 3400-3700 wavenumbers. In that region are present for both isomers the symmetric and asymmetric NH2 stretches, around 3500-3600 cm-1. Additionally the oxo form shows a peak around 3450 cm-1 corresponding to the NH stretching, that is missing in hydroxycitosine spectrum that instead shows a signal above 3600 cm-1, due to the OH stretching. This trend is well reproduced by the DC SCIVR results that can effectively discriminate between the two tautomers. Tables 2 and 3 report the DC SCIVR computed energy levels and show the comparison with available experimental data and other theoretical results. The full dimensional vibrational space is divided into subspaces using a threshold value of for Hessian elements. This choice generates for the oxocytosine one twenty dimensional subspace, leaving all the others monodimensional. In the case of hydroxycytosine, the threshold is fixed at and the full-dimensional space is fragmented into one eleven-dimensional, one seven-dimensional, and all remaining monodimensional. We observe that for both isomers the agreement with the experiment is very strict, giving a MAE around 18 cm for both systems , even if there are some frequencies which are occasionally quite off the mark. Harmonic estimates show a double MAE deviation.
Figure 3 shows DC SCIVR spectra for a 20-dimensional subspace of the oxocytosine isomer, while Fig. 4 reports the computed spectra for a 11-dimensional subspaces of the hydroxycytosine isomer. We observe that the number of peaks in Fig.s 3 and 4 is less than the subspace-dimension, since such subspace contains one low index mode (1-7). These figures show neat spectroscopic signals with several overtones reproduced, especially Fig. 4.
So far we have looked at the lowest dimensional couple of nucleobases made by a pyrimidine ring. Our results agree with experiments with an average deviation of 15-20 wavenumbers, in line with previous semiclassical simulations, and are comparable with other theoretical methods. In the next section we look at thymine, the last pyrimidine-based nucleobase.
III.3 Thymine
Thymine is the highest dimension pyrimidine nucleobase. The structure is similar to uracil, where one hydrogen atom is substituted by a methyl group. Because of their resemblance, thymine and uracil show some similarities in biological processes and both link to adenine. The molecule is made of 15 atoms leading to 39 vibrational degrees of freedom. We calculate the vibrational spectra using classical trajectories starting from the global minimum structure, as we have done for uracil in Section III.2, since the hydroxy tautomer minimum is about 44 kJ/mol less stable,20 a value similar to the one between uracil tautomers (45 kJ/mol).21 This is expected given their structural similarity. The geometry of the molecule at equilibrium configuration is reported in Panel (d) of Fig. 1. The Hessian partitioning method leads to a 18-dimensional subspace, a seven-dimensional one and all other are mono-dimensional, when employing a threshold parameter . DC SCIVR frequencies are calculated for each subspace.
Table 4 reports the vibrational frequencies at different level of calculation and compares them with the experimental ones.. In the Table, DC SCIVR values are compared with the experimental energies of gas phase isolated and Ar-tagged thymine. We also report modes 1-3 even if no experimental data are available at the best of our knowledge. From the Table, we observe that the MAE of DC SCIVR estimates is equal to 17 and 21 wavenumbers in comparison with gas phase and Ar-tagged levels respectively, which is less than half the harmonic estimates. These values are in line once again with the typical semiclassical accuracy and are comparable with what was found from previous pyrimidine bases.
Figure 5 shows the computed DC SCIVR 18 dimension spectra and one can notes how the thymine fingerprint region around 1000 wavenumbers is very crowded. However, by using Eq. (10), we are able to selectively report each peak, as showed in Fig. 5. In this way it is possible to resolve and attribute peaks which would be otherwise hardly distinguishable, since they are within few wavenumbers in energy, as in the case of modes 18-20 and 28-30.
III.4 Adenine
Now we move our attention to the remaining nucleobase, adenine. Adenine is a nucleobase made of a purine ring, composed of a pyrimidine condensed with an imidazole. The molecular structure at equilibrium configuration is reported in panel (e) of Fig. 1, showing 15 atoms and resulting into 39 vibrational degrees of freedom, the same as thymine. In this study we focus our attention only on the global minimum structure, since previous works suggested that the other tautomers of the molecule are located around 30 kJ/mol above the global minimum, an amount of energy difference that is close to that of thymine and uracil, rather than cytosine.
The 39 vibrational degree of freedom space is separated into one 23-dimensional, two bi-dimensional and all other are mono-dimensional subspaces, by employing a threshold parameter for the Hessian method equal to . Figure 6 reports the spectrum of the highest dimensional subspace for this molecule.
Once again, our reference state choice according to Eq. (10) helps us to resolve different peaks which are very close in energy, like modes 37, 38 and 39 or 32, 33 and 34. Fig. 6 presents several overtone peaks, in comparison with previous ones. We believe this is due to the presence of the hindered rotation, which gets easily vibrational excited during the dynamics. Nevertheless, the full width at half maximum (FWHM), which denotes the peak definition, is quite small. For a more detailed comparison with experiments, Table 5 reports the computed energy levels and shows the comparison with available theoretical and experimental results, measured in isolated gas phase and Ar matrix. The average deviation of the DC SCIVR results from both experiments is quite small, only 16 and 14 wavenumbers, almost three times more accurate than harmonic estimates. This accuracy is comparable with previous system investigations and with PT2 estimates. Overall, the DC SCIVR spectrum of adenine is a further proof of the invariance of accuracy of our approach with the increase in nucleobase dimensionality. We believe that the investigation of this purine nucleobase strengthens previous considerations about pyrimidine-based nucleobases, in terms of spectroscopic quality and accuracy of the energy levels, when compared with the experiments.
IV Summary and Conclusions
In this paper we have presented a semiclassical investigation of the vibrational features of uracil, cytosine, thymine, and adenine nucleobases. The investigation on uracil has shown that MC SCIVR energy levels are on average around 20 wavenumbers away from experimental levels, a typical value for semiclassical spectroscopic calculations. Then, the DC SCIVR method leads to values very close to the full-dimensional MC SCIVR ones, proving its reliability for the calculation of similar systems. Moving to higher dimensional nucleobases, DC SCIVR energy levels of cytosine retain their typical accuracy with respect to the experimental results. Despite the presence of more than one comparable minimum, the method still reproduces the spectra of different isomers retaining the standard accuracy of semiclassical simulations. Then, we focus on the last pyrimidine molecule, thymine, which is the highest dimensional nucleobase of this type. We have found that DC SCIVR retains its accuracy despite the increased dimensionality. Similar considerations hold for the adenine case, where the MAE is around 15 wavenumbers and comparable with PT2 estimates. Overall, we always find the accuracy of the DC SCIVR method to be comparable to other state of the art theoretical spectroscopy methods.
These outcomes are promising for a future exploitation of the method. Since the accuracy is seemingly insensitive to the increase in the molecule dimensionality, we will exploit DC SCIVR also to study more complex systems, like nucleotides and nucleobase pairs. This will possibly pave the way toward the assessment of important structural features of nucleoacids that lead to the formation of secondary and tertiary structures.
Acknowledgments
Authors thank dr. Riccardo Conte for useful discussions. Authors acknowledge support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No [647107] – SEMICOMPLEX – ERC-2014-CoG). Authors acknowledge also CINECA (Italian Supercomputing Center) for providing high perfomance computational resources under the IscraB grant (QUASP). All authors thank Università degli Studi di Milano for further computational time at CINECA.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Watson and Crick (1953) J. D. Watson and F. H. Crick, Nature 171 , 964 (1953).
- 2Gabelica (2016) V. Gabelica, Nucleic Acids in the Gas Phase (Springer, 2016).
- 3Rein et al. (1983) R. Rein, M. Shibata, R. Garduno-Juarez, and T. Kieber-Emmons, Structure and Dynamics: Nucleic Acids and Proteins (1983) pp. 269–288.
- 4Danilov and Kventsel (1971) V. Danilov and G. Kventsel, Electronic representations in the point mutation theory (Naukova Dumka: Kiev, 1971).
- 5Colominas, Luque, and Orozco (1996) C. Colominas, F. J. Luque, and M. Orozco, J. Am. Chem. Soc. 118 , 6811 (1996).
- 6Wells (2007) R. D. Wells, Trends Biochem. Sci. 32 , 271 (2007).
- 7Gaigeot and Sprik (2003) M.-P. Gaigeot and M. Sprik, J. Phys. Chem. B 107 , 10344 (2003).
- 8Gaigeot and Sprik (2004) M.-P. Gaigeot and M. Sprik, J. Phys. Chem. B 108 , 7458 (2004).
