Adaptive QM/MM for Molecular Dynamics Simulations: 5. On the Energy-Conserved Permuted Adaptive-Partitioning Schemes

In combined quantum-mechanical/molecular-mechanical (QM/MM) dynamics simulations, the adaptive-partitioning (AP) schemes reclassify atoms on-the-fly as QM or MM in a smooth manner. This yields a mobile QM subsystem with contents that are continuously updated as needed. Here, we tailor the Hamiltonian adaptive many-body correction (HAMBC) proposed by Boreboom et al. [J. Chem. Theory Comput. 2016, 12, 3441] to the permuted AP (PAP) scheme. The treatments lead to the HAMBC-PAP method (HPAP), which both conserves energy and produces accurate solvation structures in the test of “water-in-water” model system.


Introduction
Molecular dynamics (MD) simulations of diffusive systems, such as the diffusion of a solute (a solvated ion or molecule) through solvent, has been a challenging task for multiscale methods, especially for combined quantum-mechanics/molecular-mechanics (QM/MM) methods . In conventional QM/MM methodology, atoms are designated as QM or MM at the beginning of a simulation and do not change these identities throughout a simulation. This causes difficulties when solvent molecules are exchanged between the solute's solvation shells and the bulk solution, which may occur frequently. Adaptive QM/MM mitigates these problems by reclassifying the atoms as QM or MM on-the-fly based on their positions, assuring that the solute and its solvation shells are always described at the QM level of theory [25,. As a result, the QM and MM partitions in adaptive QM/MM are dynamically updated as needed, in contrast to the static partitions in conventional QM/MM.
One adaptive QM/MM algorithm is the permuted adaptive partitioning (PAP) scheme [29,32], which uses distance-based criteria for the QM and MM partitioning ( Figure 1). In PAP, the QM zone, also called the active zone, consists of the solute and all molecules within a preset cutoff distance r min from the solute. A group-based prescription is often adopted, where a whole molecule is treated as an entity, and the distance from the solute r is measured using the center of mass or a representative atom of the entire molecule. The description for whole molecules can also be applied to molecular fragments, such as functional groups [32]. The MM zone, also known as the environmental zone, Figure 1. Adaptive QM/MM exemplified by a "water-in-water" model. One selected water molecule (oxygen atom labeled by O′) serves as the active-zone center. This molecule and its immediate surrounding water molecules within a distance ( < min ) are treated at the QM level of theory, while those at remote distances ( > max ) at the MM level. The water molecules at intermediate distances ( max ≥ ≥ min ) are in the buffer zone and have mixed QM and MM characters. The energy of the system and the gradients of all (QM, buffer, and MM) atoms are smoothly interpolated when molecules migrate into, across, or out of the buffer zone.
A challenge in PAP (and other distance-based adaptive QM/MM methods) concerns the gradients due to the smoothing functions employed in the interpolation procedure, which, if not negligible, may cause artefacts in the MD simulations [30,35,42,46]. These forces, which are sometimes called transition forces, are proportional to the difference between the QM and MM energies at the current geometry. More specifically, the energy difference is the energy released (or absorbed) when a buffer group is switched from MM to QM while holding the QM or MM classifications of the other groups as well as all atomic coordinates fixed [42]. These transition forces are therefore associated with the difference in chemical potential between the QM and MM descriptions of the given buffer group. These transition forces drive the molecules moving towards the QM or MM zones, even in the absence of the interpolated gradients between the QM and MM potential derivatives, which are considered the "real" or "physical" forces.
In principle, the effects due to these transition forces can be eliminated or minimized by carefully aligning the QM and MM potentials [29]. However, it is difficult to align multi-dimensional potentials, and a simple and general algorithm to for potential alignment has not yet been developed. An alternative solution is the modified PAP (mPAP) scheme, where external forces are applied to cancel out these artificial forcs [35]. Mathematically, the transition forces are simply deleted. Conceptually, this means that the chemical potentials are equalized at every step for the system [46]. It has been demonstrated that mPAP yields reasonably accurate structures and dynamics in MD simulations [35,40,43]. The downside is that, because of the involvement of the external forces, the scheme no longer has a Hamiltonian formulation and therefore cannot be used for studies where Hamiltonian systems are required [43].
Recently, Boreboom et al. [42] proposed the Hamiltonian adaptive many-body correction (HAMBC) for sorted adaptive-partitioning (SAP) QM/MM simulations. Inspired by the works of the molecular-mechanics/course-grained (MM/CG) community, especially the Hamiltonian adaptive resolution scheme (H-AdResS) by Potestio et al. [53] the HAMBC method includes per-molecule- Figure 1. Adaptive QM/MM exemplified by a "water-in-water" model. One selected water molecule (oxygen atom labeled by O ) serves as the active-zone center. This molecule and its immediate surrounding water molecules within a distance (r < r min ) are treated at the QM level of theory, while those at remote distances (r > r max ) at the MM level. The water molecules at intermediate distances (r max ≥ r ≥ r min ) are in the buffer zone and have mixed QM and MM characters. The energy of the system and the gradients of all (QM, buffer, and MM) atoms are smoothly interpolated when molecules migrate into, across, or out of the buffer zone.
A challenge in PAP (and other distance-based adaptive QM/MM methods) concerns the gradients due to the smoothing functions employed in the interpolation procedure, which, if not negligible, may cause artefacts in the MD simulations [30,35,42,46]. These forces, which are sometimes called transition forces, are proportional to the difference between the QM and MM energies at the current geometry. More specifically, the energy difference is the energy released (or absorbed) when a buffer group is switched from MM to QM while holding the QM or MM classifications of the other groups as well as all atomic coordinates fixed [42]. These transition forces are therefore associated with the difference in chemical potential between the QM and MM descriptions of the given buffer group. These transition forces drive the molecules moving towards the QM or MM zones, even in the absence of the interpolated gradients between the QM and MM potential derivatives, which are considered the "real" or "physical" forces.
In principle, the effects due to these transition forces can be eliminated or minimized by carefully aligning the QM and MM potentials [29]. However, it is difficult to align multi-dimensional potentials, and a simple and general algorithm to for potential alignment has not yet been developed. An alternative solution is the modified PAP (mPAP) scheme, where external forces are applied to cancel out these artificial forcs [35]. Mathematically, the transition forces are simply deleted. Conceptually, this means that the chemical potentials are equalized at every step for the system [46]. It has been demonstrated that mPAP yields reasonably accurate structures and dynamics in MD simulations [35,40,43]. The downside is that, because of the involvement of the external forces, the scheme no longer has a Hamiltonian formulation and therefore cannot be used for studies where Hamiltonian systems are required [43].
Recently, Boreboom et al. [42] proposed the Hamiltonian adaptive many-body correction (HAMBC) for sorted adaptive-partitioning (SAP) QM/MM simulations. Inspired by the works of the molecular-mechanics/course-grained (MM/CG) community, especially the Hamiltonian adaptive resolution scheme (H-AdResS) by Potestio et al. [53] the HAMBC method includes per-molecule-based correction terms to the SAP QM/MM Hamiltonian. By design, the gradients of the correction terms should cancel out the average transition forces due to the smoothing functions; the cancellation is not necessarily exact at any single time step. The HAMBC was demonstrated in test calculations of a "water-in-water" model using dual MM levels, where a selected water molecule as the solute and its solvation shell are treated by one MM force field model and the bulk solvent by another MM force field model [42]. (Due to their high efficiencies, dual-MM test calculations have frequently been employed in testing adaptive QM/MM schemes, e.g., in the development of the original PAP scheme [29].) It is encouraging to find that HAMBC was able to produce correct solvation structures for the selected solute water while conserving energy in SAP simulations [42].
In this paper, we report the tailoring of the HAMBC treatment to the much more sophisticated PAP method. In the HAMBC by Boreboom et al. [42] the per-molecule correction term is a function of the fractional "QM character" for a solvent molecule, which is the sum of the weights of the contributing partitions that describe this solvent molecule at the QM level. Because in general there is no analytical function for the correction term, the correction must be calculated through thermodynamic integration over the selected variable. In PAP, it is more convenient to use the value of the smoothing function than the QM character for the given buffer group. We have previously shown that this QM character is equivalent to the value of the smoothing function for a given buffer group in a fully expanded PAP potential [46]. However, the many-body expansion of the PAP potential is often truncated to reduce computational costs, and in a truncated PAP potential, the value of the smoothing function no longer equals the QM character. In this work, we demonstrate that the correction can be taken as a function of the value of the smoothing function even when the PAP potential is truncated.

The PAP Algorithm
In PAP, the QM/MM potential is expressed using a many-body expansion: [29] Or more compactly, [32] Here, V A is the QM/MM energy of the partitions with no buffer group treated at the QM level, V A i with the i-th buffer group treated at the QM level, V A i,j with the i-th and j-th buffer groups treated at the QM level, . . . V A 1,2,··· ,N with all N buffer groups at the QM level, P i is the smoothing function for the i-th buffer group, and the weights σ 0 , σ i , σ i,j . . . are assigned to the partitions V A , respectively. Note that N can vary from one time step to another. We have chosen the smoothing function P i to be a fifth-order spline: where α i is a reduced distance for the i-th buffer group, where the distance r i = |r i | = |x i − x A | is between the buffer group at position x i and the solute at position x A , and r max ≥ r i ≥ r min . The smoothing function P i is continuous and differentiable over the domain [0, 1], and corresponds to all possible r i that a buffer group can take in the buffer zone. The PAP scheme is a Hamiltonian algorithm and conserves energy. According to Equation (1), a buffer group is treated by QM in some partitions and by MM in the others, in line with its dual QM-MM characteristics. This contrasts with the groups in the QM and MM zones, which remain as QM and MM, respectively, in all possible partitions. (Note that the QM and MM zones are also called the active and environmental zones, respectively.) All derivatives of the PAP potential energy with respect to the coordinates vary smoothly up to the same order for which P i varies continuously. The full expansion of the PAP potential requires 2 N calculations. However, the potential is typically truncated and only the terms from a subset of lower-orders are calculated to increase computational efficiency.
If we denote the included partitions by S, be it a subset or the total of all the partitions at a given time step, the PAP potential can be written as: The derivative with respect to any atomic Cartesian coordinate component x t is: The first term (F phys = − ∑ s∈S σ s ∂V ∂x t ), the negative of the sum of the smoothed gradients, represents the interpolated physical forces between the QM and MM forces. The second term corresponds to the gradients due to the smoothing functions and is the so-called transition forces. The transition forces are associated with the difference in chemical potentials between the QM and MM regions [42]. These transition forces are pairwise, acting on both the solute molecule at the QM-zone center and the buffer groups, and they cause structural artifacts if non-negligible. This "transition-force problem" is universal in distance-based adaptive QM/MM algorithms and is not limited to PAP [46]. Various treatments have been implemented to eliminate the problem or minimize its effects. For example, in the non-Hamiltonian mPAP method [35], these transition forces are deleted, and as a result, the remaining forces acting on the atoms no longer correspond to the potential defined in Equation (1). The mPAP scheme has been shown to yield very accurate structural and dynamics properties in a number of tests [35,40,41].

HAMBC Expression for PAP
To minimize the effects due to the transition forces while maintaining energy conservation, Boreboom et al. [42] recently proposed the HAMBC for SAP QM/MM, which also suffers from the transition force problem. In HAMBC-SAP, an energy correction term is added to the SAP potential for every buffer group. The correction term is assumed to depend only on the type of the buffer group (e.g., water versus ethanol) and the so-called "QM character," which is the sum of the weights of the contributing partitions that describe this buffer group at the QM level. For PAP, it would be more convenient to select the smoothing function P i than to choose the QM character. Both P i and the QM character are metrics related to the buffer group s position within the buffer zone; the QM character is equivalent to P i in a fully expanded PAP potential but differs otherwise [46]. The theory and implementations for HAMBC-PAP (HPAP for short) are described below.

A Mean-Field Treatment of the Individual Group Corrections
As in Ref. [42], we assume here that the total correction is a sum of individual group corrections over all N buffer groups and that the individual i-th group correction (H C i (r i )) only depends on the type of species of the group and on the group s distance r i to the active-zone center. Because P i decreases monotonously from 1 to 0 as r i increases from r min to r max , the correction term H C i (r i ) can also be expressed as a function of the dimensionless P i , i.e., H C i (P i ). That the corrections to the buffer groups are independent of each other implies that the interactions between the groups are treated in a "mean-field" manner as the correlations between the groups are neglected. Thus, many-body effects are treated in an average fashion. For simplicity, we only consider one species, the solvent. Under this circumstance, the new HPAP potential is: The subscript i in H C i (P i ) is dropped because all corrections are identical for the same species. The corresponding forces are given by: The correction force due to the derivative of the individual correction energy is designed such that, at any given distance r i that separates the solute molecule and the i-th buffer group, it should cancel the average transition forces acting on the buffer group: The requirement is enforced for any given r i and x t . Therefore, if follows that The relevant x t are the coordinates of the buffer group. Based on Newton's Third Law of Motion, such cancellation will automatically be realized at the active-zone center.
We compute the individual correction derivative as the average energy difference between a pair of partitions where the only variation is that the i-th group at r i is treated at QM or MM: Here, the energies of the partition pair are denoted V i=QM (r i ) and V i=MM (r i ), respectively, and indicates the corresponding average. As usual, the zeros of the potentials V i=QM and V i=MM are chosen such that these energies represent the net interaction energies [46]. At any given time step of the simulation, the number of partitions depends on the truncation order of mPAP potential, and usually there are multiple such pairs of partitions for the i-th group at r i , each with different buffer groups treated by QM. Thus, the average is over the partition pairs for the buffer group at r i in a sampled geometry and over all sampled geometries while keeping the buffer group at r i in the mPAP simulations. By varying r i continuously, one obtains the complete profile of ∆E i . Again, because P i decreases monotonously from 1 to 0 as r i increases from r min to r max , the average energy difference ∆E i , which depends solely on r i , can also be expressed as a function of the dimensionless P i .
First, we consider one partition pair for a sampled geometry at a given time step. Let us assume that the partition of V i=QM (r i ) has a QM subsystem consists of the active-zone "A", the i-th group, and (m − 1) other groups selected from the N buffer groups. Accordingly, the other partition of V i=MM (r i ) will feature a QM subsystem consisting of the active-zone "A" and the other (m − 1) buffer groups, since the i-th group is treated by MM. These two partitions correspond to the m-th and (m − 1)-th order terms in the mPAP potential, respectively. For the sake of brevity, we refer to this partition pair as of the m-th order. Without losing generality, we label these m buffer groups by 1, 2 . . . m. The energy difference for this partition pair is Next, we combine all such partition pairs corresponding to a given m-th order term in the mPAP over partition pairs for a given m and then over m for m = 1, 2, . . . , p. This gives the average energy difference for a given sampled geometry: We then average ∆E i over all sampled geometries in the simulations where the i-th group is located at r i . This gives ∆E i for the corresponding P i . The complete curve of ∆E i (P i ) is obtained by varying P i continuously from 0 to 1. For better statistics, we also average over all buffer groups of the same type in the simulations.
In general, there is no analytic form for H C (P i ), which must be obtained through thermodynamic integration: where P i is the dummy variable for integration. The final correction is applied to each group in the system according to the following scheme: We note that H C (0) = H C (1) in general. To see how many-body interactions are incorporated into the energy difference for a partition pair, we use the standard formula of many-body expansion for a system made of n monomers: where ε a is the energy of an isolated monomer, ε ab is the energy of a dimer minus the energies of the monomers it comprises (i.e., it is the pairwise interaction energy), etc. Applying Equation (14) to V A 1,2,··· ,i−1,i,i+1,··· ,m (r i ) and V A 1,2,··· ,i−1,i+1,··· ,m (r i ), and after canceling identical terms, one has where Here ε i=QM and ε i=MM are the QM and MM energies of the i-th group at the QM and MM levels, respectively; ε j,i=QM is the pairwise interaction energy between groups i and j, both treated at the QM level; ε j=QM; i=MM is the pairwise interaction energy between group j described at the QM level and group i at the MM level; ε j,k,i=QM is the three-body contribution by groups i, j, and k treated at the QM level; ε j,k=QM;i=MM is the three-body contribution by groups j and k described at the QM level and group i at the MM level . . . Terms involving the active-zone "A" are defined similarly. The many-body interactions where the i-th group treated by MM depend on the specific QM/MM embedding model. Note that these many-body interactions never need to be explicitly evaluated; they are prescribed here merely to assist understanding of the many-body interactions in ∆E (m) i . The above analysis indicates that ∆E (m) i is a sum of the differences between the many-body terms in the partition pair involving the i-th group in the many-body expansions up to the m-th order.

Effects Due to Truncation in mPAP Hamiltonian
If the mPAP potential is truncated at a low order p and if N p, which is the case in our current simulations, the average is dominated by the p-th order ∆E (p) i , which has the largest number of partitions: Moreover, because all groups are the same type of molecules, That is, ∆E i depends on the truncation order in the mPAP potential. A special case is dual-MM, where effective two-body potentials are employed, the third and higher order terms are 0 in Equation (14). Consequently, ∆ε Aji , ∆ε jki , and higher-order contributions vanish, leading to

Simulation Details
An Open-MP-based parallel version of HPAP method has been implemented in our in-house code, QMMM [54]. The QMMM code calls an MM program such as NAMD [55] for MM calculations and a QM program such as MNDO [56] for QM calculations, synthesizes the energies and gradients from both, and propagates the trajectory. Here, to reduce the computational expense from QM/MM calculations, we carry out dual-MM (MM /MM) simulations, as we have done previously in the development of the original PAP algorithm [29] and as Boreboom et al. [42] did in their HAMBC study. Since we are only focusing on the adaptive-boundary treatments, the use of dual-MM test calculations suffices.
The model system for all calculations is a 30.25 × 30.25 × 30.25 Å 3 water box that contains 915 water molecules (density = 0.99 g/mL). The water box was prepared by equilibration for 10 ns under the NVT ensemble at the single-MM level using the TIP3P/Fs [57,58] water model. For the adaptive-partitioning simulations, each water molecule is an adaptive partitioning group, with the oxygen atom designated as the representative atom for the group. One water oxygen was chosen to be the active-zone center for the simulation, and the buffer and active zone radii were set to r max = 6.4 Å and r min = 5.5 Å respectively. All adaptive-partitioning potentials were truncated at the 2nd order. For all calculations, periodic boundary conditions with the minimum-image convention were adopted. The cutoff for non-bonded interactions was set to 12 Å with smoothing switched on at 11 Å. All simulations are under constant volume with a time step of 0.5 fs. If constant temperatures were required, a Nosé-Hoover [59] thermostat of 298.15 K was coupled to the model system with a coupling coefficient of 40 fs.
First, to obtain the correction term H C , the model was simulated at constant temperature at the MM /MM level using the mPAP scheme, where MM = SPC/Fw [58] and MM = TIP3P/Fs [57,58] (the force field parameters are compared in Table 1). Five 20-ps trajectories were propagated. Only the second half 10 ps of every trajectory was utilized to determine the correction term H C , during which the energies of the partitions were recorded. Note that multiple partitions and thus multiple energy differences are available at one single time step, unless there is no or only one buffer group. The combined 50-ps trajectories for the H C calculations resulted in a total of 16,791,809 energy differences (∆E (1) i and ∆E (2) i ). Figure 2 shows the density distribution of ∆E i (P i ). We found that the 50 ps combined simulation time was sufficient to converge the correction term in the present work (more details in the Results section). These energy differences ∆E i (P i ) were added to 100 bins equally spaced over the domain [0, 1] according to the P i value of the buffer group. The average of each bin was then taken and used as input for linear regression to obtain the function ∆E(P i ). We have dropped the subscript i, because ∆E i (P i ) is identical for all buffer groups here. The linear regression was motivated by the approximately linear nature of the ∆E(P i ) curve (more details in the Discussion section). The analytical integral of the regression line was used to obtain H C (P i ). Both ∆E(P i ) and H C (P i ) were discretized into 10,000 evenly spaced points between the domain [0, 1] and are stored as a look-up table read in by the QMMM source code. The correction for a group was chosen by looking up the ∆E(P i ) and H C (P i ) values in the table for the next largest P i value.

HAMBC Correction Term
Figure 2a plots the energy difference over the thermodynamic variable ∆ ( ) obtained from the combined 50-ps trajectories, together with the linear regression results. The almost perfect linear fit is incidental due to the similarity in some parameters of the two employed force fields (see Section 4 for detailed discussion). In general, the linearity is very approximate or does not hold. Nevertheless, by taking advantage of the analytical representation of 〈∆ ( )〉 from the linear regression, we estimated the correction term C ( ), which is depicted in Figure 2b. To explore the length of simulation needed to generate the HPAP correction, we calculated the root mean squared deviation (RMSD) of ∆ ( ) and C ( ) as functions of simulation time, taking as reference the final curves from the accumulated 50-ps simulations ( Figure 3). The RMSD values of ∆ ( ) and C ( ) were calculated over the 100 bins that are equally distributed bins along the domain [0, 1]. It can be seen that both RMSDs drop consistently after ~15 ps. The function ∆ ( ) converged to below 0.01 kcal mol −1 after 30 ps of combined simulations. Because integration reduces the fluctuations in ∆ ( ), the RMSD values of C ( ) are even smaller than those of ∆ ( ). A necessary requirement for the correction term is that the correction force corr must, on average, cancel out the transition force trans . This fact is exemplified in the distribution of the difference between these two forces (∆ = trans − corr ) in the final HPAP simulations (Figure 4). It can be seen that ∆ is distributed approximately normally around 0 kcal·mol −1 Å −1 across the entire Second, to examine the solvation structures obtained through the HAMBC correction, we carried out adaptive-partitioning simulations at constant temperature using three schemes: PAP, mPAP, and HPAP. For each scheme, five trajectories were propagated using the final geometries and velocities from the five mPAP simulations used to compute H C . Each trajectory was propagated for 10 ps. The combined 50 ps trajectories were employed to calculate the pairwise radial distribution function for a given scheme.
Finally, to examine the degree of energy conservation, we also performed NVE simulations for all three schemes. One trajectory for each method was propagated for 25 ps with 0.5 fs time steps.

HAMBC Correction Term
Figure 2a plots the energy difference over the thermodynamic variable ∆E(P i ) obtained from the combined 50-ps trajectories, together with the linear regression results. The almost perfect linear fit is incidental due to the similarity in some parameters of the two employed force fields (see Section 4 for detailed discussion). In general, the linearity is very approximate or does not hold. Nevertheless, by taking advantage of the analytical representation of ∆E(P i ) from the linear regression, we estimated the correction term H C (P i ), which is depicted in Figure 2b.
To explore the length of simulation needed to generate the HPAP correction, we calculated the root mean squared deviation (RMSD) of ∆E(P i ) and H C (P i ) as functions of simulation time, taking as reference the final curves from the accumulated 50-ps simulations (Figure 3). The RMSD values of ∆E(P i ) and H C (P i ) were calculated over the 100 bins that are equally distributed bins along the P i domain [0, 1]. It can be seen that both RMSDs drop consistently after~15 ps. The function ∆E(P i ) converged to below 0.01 kcal mol −1 after 30 ps of combined simulations. Because integration reduces the fluctuations in ∆E(P i ), the RMSD values of H C (P i ) are even smaller than those of ∆E(P i ).
correction term C ( ) computed by integration using the analytical representation of 〈∆ ( )〉.
To explore the length of simulation needed to generate the HPAP correction, we calculated the root mean squared deviation (RMSD) of ∆ ( ) and C ( ) as functions of simulation time, taking as reference the final curves from the accumulated 50-ps simulations (Figure 3). The RMSD values of ∆ ( ) and C ( ) were calculated over the 100 bins that are equally distributed bins along the domain [0, 1]. It can be seen that both RMSDs drop consistently after ~15 ps. The function ∆ ( ) converged to below 0.01 kcal mol −1 after 30 ps of combined simulations. Because integration reduces the fluctuations in ∆ ( ), the RMSD values of C ( ) are even smaller than those of ∆ ( ). A necessary requirement for the correction term is that the correction force corr must, on average, cancel out the transition force trans . This fact is exemplified in the distribution of the difference between these two forces (∆ = trans − corr ) in the final HPAP simulations (Figure 4). It can be seen that ∆ is distributed approximately normally around 0 kcal·mol −1 Å −1 across the entire A necessary requirement for the correction term is that the correction force F corr must, on average, cancel out the transition force F trans . This fact is exemplified in the distribution of the difference between these two forces (∆F = F trans − F corr ) in the final HPAP simulations (Figure 4). It can be seen that ∆F is distributed approximately normally around 0 kcal·mol −1 Å −1 across the entire domain of P i . The calculated average ∆F over all P i is 0.0007 kcal·mol −1 Å −1 , with a standard deviation of 0.4 kcal·mol −1 Å −1 , in excellent agreement with the cancellation requirement.

Solvation Structures
We examined the solvation structure around the solute water by computing the pairwise radial distribution function (RDF) O'O ( ) between its oxygen (O′, serving as the center of the active zone) and the surrounding oxygen (O) atoms for the original PAP, mPAP, and HPAP simulations under constant temperature (Figure 5a). The original PAP results show an erroneous dip around = 5.95 Å which corresponds to the center of the buffer zone. This is caused by the transition forces pushing water out of the buffer zone. However, this artifact is eliminated by inclusion of the correction term in HPAP, for which the RDF closely matches the mPAP reference data, indicating that the HPAP method is able to produce accurate solvation structures. The results here are in line with what were found by Boreboom et al. [42] although different adaptive schemes are employed.
For comparison, we also present in Figure 5b the RDF computed from the mPAP and the two single-level MM simulations. All three RDF curves are overall quite similar, although the TIP3P/Fs water is slightly less structural than SPC/Fw. The mPAP RDF better resembles the SPC/Fw curve in the active-zone, as it is supposed to be. Interestingly, despite the similarity between SPC/Fw and TIP/Fs, a significantly distorted RDF was produced by PAP in the buffer-zone region, as observed in Figure 5a. This demonstrates the sensitivity of the RDF to the boundary treatment in the adaptive

Solvation Structures
We examined the solvation structure around the solute water by computing the pairwise radial distribution function (RDF) g O'O (r) between its oxygen (O , serving as the center of the active zone) and the surrounding oxygen (O) atoms for the original PAP, mPAP, and HPAP simulations under constant temperature (Figure 5a). The original PAP results show an erroneous dip around r = 5.95 Å which corresponds to the center of the buffer zone. This is caused by the transition forces pushing water out of the buffer zone. However, this artifact is eliminated by inclusion of the correction term in HPAP, for which the RDF closely matches the mPAP reference data, indicating that the HPAP method is able to produce accurate solvation structures. The results here are in line with what were found by Boreboom et al. [42] although different adaptive schemes are employed. single-level MM simulations. All three RDF curves are overall quite similar, although the TIP3P/Fs water is slightly less structural than SPC/Fw. The mPAP RDF better resembles the SPC/Fw curve in the active-zone, as it is supposed to be. Interestingly, despite the similarity between SPC/Fw and TIP/Fs, a significantly distorted RDF was produced by PAP in the buffer-zone region, as observed in Figure 5a. This demonstrates the sensitivity of the RDF to the boundary treatment in the adaptive simulations and underscores the importance of proper treatments implemented in mPAP and HPAP. For comparison, we also present in Figure 5b the RDF computed from the mPAP and the two single-level MM simulations. All three RDF curves are overall quite similar, although the TIP3P/Fs water is slightly less structural than SPC/Fw. The mPAP RDF better resembles the SPC/Fw curve in the active-zone, as it is supposed to be. Interestingly, despite the similarity between SPC/Fw and TIP/Fs, a significantly distorted RDF was produced by PAP in the buffer-zone region, as observed in Figure 5a. This demonstrates the sensitivity of the RDF to the boundary treatment in the adaptive simulations and underscores the importance of proper treatments implemented in mPAP and HPAP.
Next, we checked the degree of energy conservation in the NVE simulations (Figure 6a

Discussion
It is interesting to note that the correction 〈Δ ( )〉 is approximately a linear function of in the present work. To understand the origin of this approximate linearity, let us consider a given i-th buffer group. When the description of this buffer groups is switched between the two employed MM force fields, SPC/Fw and TIP3P/Fs, there will be changes in the non-bonded (van der Waals and electrostatic) interactions through which this buffer group interacts with active-zone groups, environmental-zone groups, and the other buffer groups. Also changing are the intramolecular bonded (O-H bond and H-O-H angle) interactions within this buffer group. The decomposition of 〈Δ ( )〉 is depicted in Figure 7a according to

Discussion
It is interesting to note that the correction ∆E(P i ) is approximately a linear function of P i in the present work. To understand the origin of this approximate linearity, let us consider a given i-th buffer group. When the description of this buffer groups is switched between the two employed MM force fields, SPC/Fw and TIP3P/Fs, there will be changes in the non-bonded (van der Waals and electrostatic) interactions through which this buffer group interacts with active-zone groups, environmental-zone groups, and the other buffer groups. Also changing are the intramolecular bonded (O-H bond and H-O-H angle) interactions within this buffer group. The decomposition of ∆E i (P i ) is depicted in Figure 7a according to where i denotes the i-th buffer group.  Of course, the above analysis can only be conducted when the interactions are pairwise, which is true here for the dual-MM simulations. However, because pairwise interactions are usually predominant (e.g., accounting for ~80% of total interaction energies in water [60]), the approximate linearity may still hold when models with higher-order many-body potentials are employed, albeit to a less extent, unless the two employed potentials differ significantly from each other. In reality, the potentials should agree with each other reasonably well in the buffer region, otherwise it is likely that at least one potential is very inaccurate and should not be used at all.
We note that while the forces by the correction term cancel the average transition forces, the cancellation is not exact at every step. Inexact transient cancellations lead to "residue" forces, whose effects on the simulation may or may not lead to erroneous solvation structures, which probably varies from case to case. While further investigations will be needed to explore this in the future, it is conceivable that narrow, symmetric distributions of the residue forces can help to minimize their effects, which is the case in this work. Therefore, it is prudent to match the potentials as closely as possible in the buffer zone. At this point, we note that Jiang et al. [61] are developing MM water models specifically designed for adaptive QM/MM simulations.
Overall, the results presented here demonstrate that the HPAP can both yield accurate solvation structures and conserve energy in NVE simulations. The progress thus fills a gap in the PAP algorithms. The successful applications of the HAMBC treatment to both SAP by Boreboom et al. [42] and PAP in this work suggest that other distance-based adaptive QM/MM schemes may also benefit from this treatment. We expect that the new HPAP algorithm will be useful to many applications where simulations of Hamiltonian systems are required. Future studies are especially encouraged to investigate the cases where multiple types of buffer groups (e.g., water and ions) are present and to explore the treatments for inhomogeneous systems (e.g., ion channels). The bonded energy terms are the dominant contributors to ∆E i . This is not surprising because of the similarity in the nonbonded interaction parameters (Table 1). Both the O-H bond and H-O-H angle energies are sizeable, but the O-H bond energy is overall more significant. The difference in the O-H bond energies between the two descriptions are where x is the instantaneous O-H bond length, and k 1 and k 2 are the force constants parameters and x 01 and x 02 the equilibrium geometric parameters of the two water models, respectively. If k 1 = k 2 = k, as it is for the two water models employed here, one has This means that the energy difference will vary linearly with respect to the instantaneous bond length. Figure 7b suggests smooth structural changes of the water molecule in the buffer zone between the TIP3P/Fs model at P i = 0 and the SPC/Fw model at P i = 1 and confirms that the average instantaneous O-H bond length in the simulations is indeed approximately linear with respect to P i . The situation is similar in the H-O-H angle energy (Figure 7c), where k 1 ≈ k 2 = k, and the linearity also roughly holds. As a result, ∆E(P i ) is approximately linear with respect to P i .
Of course, the above analysis can only be conducted when the interactions are pairwise, which is true here for the dual-MM simulations. However, because pairwise interactions are usually predominant (e.g., accounting for~80% of total interaction energies in water [60]), the approximate linearity may still hold when models with higher-order many-body potentials are employed, albeit to a less extent, unless the two employed potentials differ significantly from each other. In reality, the potentials should agree with each other reasonably well in the buffer region, otherwise it is likely that at least one potential is very inaccurate and should not be used at all.
We note that while the forces by the correction term cancel the average transition forces, the cancellation is not exact at every step. Inexact transient cancellations lead to "residue" forces, whose effects on the simulation may or may not lead to erroneous solvation structures, which probably varies from case to case. While further investigations will be needed to explore this in the future, it is conceivable that narrow, symmetric distributions of the residue forces can help to minimize their effects, which is the case in this work. Therefore, it is prudent to match the potentials as closely as possible in the buffer zone. At this point, we note that Jiang et al. [61] are developing MM water models specifically designed for adaptive QM/MM simulations.
Overall, the results presented here demonstrate that the HPAP can both yield accurate solvation structures and conserve energy in NVE simulations. The progress thus fills a gap in the PAP algorithms. The successful applications of the HAMBC treatment to both SAP by Boreboom et al. [42] and PAP in this work suggest that other distance-based adaptive QM/MM schemes may also benefit from this treatment. We expect that the new HPAP algorithm will be useful to many applications where simulations of Hamiltonian systems are required. Future studies are especially encouraged to investigate the cases where multiple types of buffer groups (e.g., water and ions) are present and to explore the treatments for inhomogeneous systems (e.g., ion channels).