Hands-On Session 4: Calculating Free Energy Surfaces Using QM/MM-Δ Machine Learning Potentials
Learning Objectives
Select appropriate QM region and reaction coordinate definitions for your mechanistic question
Prepare the parameters and topology file for QM/MM simulations
Identify differences in the mdin file for running QM/MM versus MM, and how to apply a ΔMLP
Benchmark the difference in speeds for running semi-empirical, ΔMLP, and DFT level QM/MM
Perform umbrella sampling of a 1-dimensional reaction with AMBER
Use the NDFES program to produce a free energy profile with error bars
Interpret the free energy profile in the context of reaction mechanism
Verify that the ΔMLP free energy profile matches the DFT level
Demonstrate the importance of phase space overlap in umbrella window spacing for generating robust free energy profiles
Relevant literature
Tutorial
Build the QM system - 1D methyl transfer in MTR1
The necessary input and subsequent outputs for this tutorial are availble in
Caution
Insert download link
Copy the directory to your working directory
cp -r /PATH2Files/HandsOnSession4 ./
There will be times when a calculation has been performed for you and you will do analysis. In these cases, where indicated, you will copy the outputs to your working directory and analyze them.
Load the amber module:
{{amberload}}
You will also periodically need to use xmgrace to plot your results. Load the xmgrace module:
{{xmgraceload}}
The motivation for using umbrella sampling is that, without restraints, a system undergoing dynamics will very rarely cross a reaction barrier. Even if it does, it will be very sparsely sampled and take much longer than we can afford to sample. Therefore, we apply a series of harmonic biasing potentials, also referred to as an umbrella potential, to force the system over a free energy barrier in the space of the selected reaction coordinate. In this way, we can thoroughly sample the breaking and formation of bonds, or other processes like a conformational change. In order to construct a potential of mean force, we must track the value of the reaction coordinate throughout our simulations so that we can later remove the impact of the bias and compute what the distribution of sampling would look like had we not applied the restraint. This tutorial will show you how to perform and analyze QM/MM simulations of a methyl transfer reaction in the MTR1 ribozyme using 1 reaction coordinate.
The active site is shown in Figure 1, and includes the methyl donor ligand (O6 methylated guanine), the nucleophillic target A63, and C10 protonated at the N3 position.
First, we must decide which atoms will be treated quantum mechanically (aka the QM region). The rest of the atoms will be treated with the molecular mechanical force field. In this case, we are simulating methyl transfer from O6 methylated guanine (O6mG) to the N1 of A63, so we want to include both of these residues in the QM region. The positively charged C10 residue is directly hydrogen bonding to the leaving group, so we will include it as well. In the next tutorial this residue will also be involved in the reaction, so we will proactively include it now. For A63 and C10, we will only include the nucleobases since the backbones are not directly involved in the reaction. Additionally, U45 is important for holding the ligand in the active site, but since it is not directly involved in the reaction it will remain in the MM region. Based on this selection, the total charge of the QM region is +1 because C10 is in a non-standard protonation state.
Based on this QM region selection, we need to make some modifications to the parameter and topology files. These include: * remove connection atom charges at the QM/MM interface * redistribute MM charges so that the QM region within each residues is integer * remove Ewald atoms from TIP4P-Ew MM waters to be included in the QM region * remove any dummy atoms if present * add Lennard Jones parameters to QM atoms that lack then in MM (ie H’s)
Navigate to the mm2qmmm directory.
cd mm2qmmm
ls
mm.parm7 mm.rst7 mm2qmmm.py
The equilibrated mm.parm7 and mm.rst7 files have been provied for you along with the script mm2qmmm.py. This script will convert the mm parameter and topology files for use with QM/MM. Display the scripts options:
python mm2qmmm.py -h
usage: mm2qmmm.py [-h] -p PARM -c CRD -q QMCHARGE -m QMMASK -o OPARM -r ORST
Converts MM parameter/restart files to versions appropriate for QM/MM.
It removes connection atom charges, and redistributes MM charges so
that the QM region within each residue is an integer. It removes
M-sites from QM waters and removes dummy particles used in pKa
calculations. It adds LJ parameters to QM atoms that lack LJ
parameters in the MM representation.
optional arguments:
-h, --help show this help message and exit
-p PARM, --parm PARM MM parameter file
-c CRD, --crd CRD MM restart file
-q QMCHARGE, --qmcharge QMCHARGE
QM charge
-m QMMASK, --qmmask QMMASK
QM mask based on the MM parameter file representation
-o OPARM, --oparm OPARM
QMMM output parameter file
-r ORST, --orst ORST QMMM output restart file
-R, --hmr (optional) if present, then repartition H-masses in the QM region and set the timestep to 2 fs
The QM mask corresponding to the bases of C9 and A62 and the O6mG cofactor is “:69|@272-282,290-291,1975-1988”, and the net QM charge is +1 due to the protonated C residue. We will also apply hydrogen mass repartitioning. This is required to safely run QM/MM with a 2 fs timestep. Run mm2qmmm.py:
python mm2qmmm.py -p mm.parm7 -c mm.rst7 -q 1 -m ":69|@272-282,290-291,1975-1988" -o qmmm.parm7 -r qmmm.rst7 --hmr
CHARGE GROUP: :9
CHARGE GROUP: :62
CHARGE GROUP: :69
BEFORE charge adjustments:
The net charge of grp :9 is -0.00000
The QM charge within grp :9 is 0.90650
The MM charge within grp :9 is -0.90650
AFTER charge adjustments:
The net charge of grp :9 is -0.00000
The QM charge within grp :9 is 1.00000
The MM charge within grp :9 is -1.00000
BEFORE charge adjustments:
The net charge of grp :62 is -1.00000
The QM charge within grp :62 is -0.12410
The MM charge within grp :62 is -0.87590
AFTER charge adjustments:
The net charge of grp :62 is -1.00000
The QM charge within grp :62 is 0.00000
The MM charge within grp :62 is -1.00000
BEFORE charge adjustments:
The net charge of grp :69 is -0.00200
The QM charge within grp :69 is -0.00200
The MM charge within grp :69 is 0.00000
AFTER charge adjustments:
The net charge of grp :69 is 0.00000
The QM charge within grp :69 is 0.00000
The MM charge within grp :69 is 0.00000
The MM charges of the whole QM region is 1.00000
The expected net QM charge is 1.00000
New LJ type for QM type N* using old parameters (rmin= 1.8240 eps= 0.1700): @272,278,290,1975,1978,1981,1984,1987
New LJ type for QM type C4 using old parameters (rmin= 1.9080 eps= 0.0860): @273,275,277,281,1976,1979,1980,1985,1988
New LJ type for QM type H4 using old parameters (rmin= 1.4090 eps= 0.0150): @274
New LJ type for QM type HA using old parameters (rmin= 1.4590 eps= 0.0150): @276
New LJ type for QM type H using old parameters (rmin= 0.6000 eps= 0.0157): @279,280,291,1982,1983
New LJ type for QM type O using old parameters (rmin= 1.6612 eps= 0.2100): @282
New LJ type for QM type H5 using old parameters (rmin= 1.3590 eps= 0.0150): @1977,1986
Adjusting nonbond pair: 7 ( N*) - 1 ( HO) (rmin= 1.8240 eps= 0.0771)
Adjusting nonbond pair: 8 ( C4) - 1 ( HO) (rmin= 1.9080 eps= 0.0549)
Adjusting nonbond pair: 9 ( H4) - 1 ( HO) (rmin= 1.4090 eps= 0.0229)
Adjusting nonbond pair: 10 ( HA) - 1 ( HO) (rmin= 1.4590 eps= 0.0229)
Adjusting nonbond pair: 11 ( H) - 1 ( HO) (rmin= 0.6000 eps= 0.0234)
Adjusting nonbond pair: 12 ( O) - 1 ( HO) (rmin= 1.6612 eps= 0.0857)
Adjusting nonbond pair: 14 ( H5) - 1 ( HO) (rmin= 1.3590 eps= 0.0229)
Adjusting nonbond pair: 15 ( os) - 1 ( HO) (rmin= 1.7713 eps= 0.0504)
Adjusting nonbond pair: 16 ( nc) - 1 ( HO) (rmin= 1.8993 eps= 0.0574)
Adjusting nonbond pair: 17 ( na) - 1 ( HO) (rmin= 1.7992 eps= 0.0845)
Adjusting nonbond pair: 18 ( nh) - 1 ( HO) (rmin= 1.7903 eps= 0.0867)
Adjusting nonbond pair: 19 ( ca) - 1 ( HO) (rmin= 1.8606 eps= 0.0588)
Adjusting nonbond pair: 20 ( c3) - 1 ( HO) (rmin= 1.9069 eps= 0.0614)
Adjusting nonbond pair: 21 ( h5) - 1 ( HO) (rmin= 1.3735 eps= 0.0237)
Adjusting nonbond pair: 22 ( h1) - 1 ( HO) (rmin= 1.3593 eps= 0.0270)
Adjusting nonbond pair: 23 ( hn) - 1 ( HO) (rmin= 0.6210 eps= 0.0187)
You should now have qmmm.parm7 and qmmm.rst7 files.
Preparing the inputs
Navigate back to the base directory and copy the parm7 file to the template directory. Then copy the rst7 file to the directory called start.
cd ..
cp mm2qmmm/qmmm.parm7 template
cp mm2qmmm/qmmm.rst7 start/reactant.rst7
You have been provided a template mdin file (template/short.mdin). This mdin file is valid for QM/MM simulations in general, and is not specific to umbrella sampling. Let’s take a look at short.mdin:
short
&cntrl
! IO =======================================
irest = 0 ! 0 = start, 1 = restart
ntx = 1 ! 1 = start, 5 = restart
ntxo = 1 ! read/write rst as formatted file
iwrap = 1 ! wrap crds to unit cell
ioutfm = 1 ! write mdcrd as netcdf
imin = 0
ntmin = 1
ntpr = 20
ntwr = 20
ntwx = 0
ntwf = 0
! DYNAMICS =================================
nstlim = 250
dt = 0.002 ! ps/step
ntb = 1 ! 1=NVT periodic, 2=NPT periodic, 0=no box
! TEMPERATURE ==============================
temp0 = 298 ! target temp
gamma_ln = 5.0 ! Langevin collision freq
ntt = 3 ! thermostat (3=Langevin)
! PRESSURE ================================
ntp = 0 ! 0=no scaling, 1=isotropic, 2=anisotropic
! SHAKE ====================================
ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained
noshakemask = ":69|@272-282,290-291,1975-1988" ! do not shake these
ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
! MISC =====================================
cut = 10.0
ifqnt = 1
ig = -1
nmropt = 1
/
&ewald
dsum_tol = 1.e-6
/
&qmmm
qm_theory = 'DFTB3'
qmmask = ':69|@272-282,290-291,1975-1988'
qmcharge = 1
spin = 1
qmshake = 0
qm_ewald = 1
qmmm_switch = 1
scfconv = 1.e-10
verbosity = 0
tight_p_conv = 1
diag_routine = 0
pseudo_diag = 1
dftb_maxiter = 100
/
&wt type = 'DUMPFREQ', istep1 = 10, /
&wt type='END' /
DISANG=img.disang
DUMPAVE=img.dumpave
LISTOUT=POUT
LISTIN=POUT
The timestep (dt) is set to 0.002 ps, or 2 fs. We will only perform short simulations of 0.5 ps to start in order to equilibrate the windows, therfore nstlim is set to 250.
We will be using the DFTB3 semi-empirical Hamiltonian, and the total charge of the QM region is +1 due to the protonation of cytosine. When determining the total charge of a nucleic acid QM region, be sure to consider any negatively charged phosphates that may be included. The QM region is defined by the mask ‘:69|@272-282,290-291,1975-1988’ which includes the entire ligand (:69), the nucleobase of C10 (@272-282,290-291) and the nucleobase of A63 (@1975-1988).
The noshakemask contains the same mask as the QM region, thus we will not shake QM atoms as these bonds involving hydrogen may fluctuate outside of equilibrium MM values when bonds are made/broken. SHAKE constraints are inherently dependent on the MM force field, thus MM constraints should not be combined with our QM description.
Since we want to apply nmr restraints, nmropt is set to 1, and we will output restraint values to the dumpave file every 10 steps. The green highlighted img.disang and img.dumpave are place holders that will be replaced to indicate the appropriate umbrella window number in order to apply restraints.
Let’s review how nmr restraints work in AMBER. When nmropt=1 is set in the mdin file, restraints will be read from the specified DISANG file. These are different than coordinate restraints set when using ntr=1, which effectively freezes the atoms in place. Using nmr restraints allows the atom position to fluctuate subject to a specified biasing potential relating it to another atom or atoms. The values of the restrained properties for a given step are output to the specified DUMPAVE file with a frequency of istep1.
Next we need to select the reaction coordinates. We choose the reaction coordinate to be the difference in distance between the atoms where the bond will break, and between the atoms where the bond will form. This ensures the reaction coordinate value to increase going from reactant to product making it more intuitive. In MTR1, describing methyl transfer between O6mG:O6 and A63:N1 will yield the reaction coordinate (RC): RC = |RO6mG:O6 - RCm| - | RA63:N1 - RCm |. We will evaluate the reaction going from the reactant with RC at -2.5 Å, to product with RC at 2.5 Å. You have been provided template/img.disang containing the restraint definitions.
Take a look at img.disang:
#methylation: |GN-C5|-|A62-C5|
&rst iat= 2200, 2189, 2200, 1984,
r1=-999, r2=RC1, r3=RC1, r4=999, rk2=100, rk3=100, rstwt=1.0,-1.0 /
#Max distances for H-bonds
## 1 ##
&rst iat= 1403, 2207,
r1=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /
## 2 ##
&rst iat= 1405, 2192,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
## 3 ##
&rst iat= 1407, 2206,
r1=-999, r2=-999, r3=2.00, r4=999,rk2=0, rk3=100 /
## 4 ##
&rst iat= 2205, 282,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
## 5 ##
&rst iat= 280, 2189,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
## 6 ##
&rst iat= 2190, 1982,
r2=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /
## 7 ##
&rst iat= 291, 2193,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
For now, we need to understand more about the restraints used in this example. The first restraint defines the reaction coordinate. The line “&rst iat= 2200, 2189, 2200, 1984,” indicates that we are restraining the difference in distance between the methyl carbon (2200) to ligand oxygen (2189) and the methyl carbon (2200) to the N1 of adenine (1984). These distances are shown in Figure 1. The presence of “rstwt=1.0,-1.0” indicates this should be a difference of distances. In theory, the two distances could be seperate reaction coordinates, but combining them reduces the dimensionality while still describing the same event. The values of r1, r2, r3, and r4 define the shape of the biasing potential. The r2 and r3 values define the flat region of the potential. If r2 is less than r3, then there is no penalty for being within those bounds. When applying an umbrella potential, we set r2=r3 to sample directly on the desired value. The img.disang file is a template, where RC1 will ultimately be replaced for each umbrella window. If more reaction coordinates were present, the placeholder RC2 and so on would be used. The r1 and r4 values define the bounds of the linear response region, ie the penalty will progressively increase as the restrained property approaches r1 and r4. By setting r1=-999 and r4=999 we have effectively created a parabola with infinite bounds. The values rk2 and rk3 are the weights of the restraints. These effectively define the steepness of the parabola, thus controlling the penalty for going outside the r2 to r3 region. For umbrella sampling we want to set this to a high number to force the sampling of high energy regions. The higher the free energy at an umbrella center position, the more the observed value (reported in the dumpave file) will attempt to move away from this center and fight against the biasing potential. It is this concept that helps us construct a PMF.
The rest of the restraints do not define reaction coordinates, and they will be held constant in all of the umbrella windows. These are distance restraints on the active site hydrogen bonds to the ligand to ensure it stays in the pocket. These are different from the harmonic potential used for the reaction coordinate because r2=-999 and rk2=0. This defines a half-harmonic biasing potential where any distance less than 3.0 feels no penalty. If the values were reversed (ie. r3=999 and rk2=100), then the two atoms would be pushed away from each other. Now that we have a comprehensive understand of the nmr restraints, we will focus on setting up an umbrella sampling simulation.
Now we will generate the equil directory based on the inputs provided. This will be where we sequentially generate the umbrella windows starting from the reactant structure. It is important to slowly increment the reaction coordinate starting from a neighboring structure because each time you change the restraint, you are effectively changing the force field and risk large spikes in energy that could cause unintended bonds to break. We will use the ndfes program to write the inputs for this step.
You have been provided a reactant and product.disang file where the reaction coordinate is set to -2.5 and 2.5 respectively. These will define the end points, and a total of 32 umbrella windows will be generated to linearly interpolate through them.
Navigate to the start directory and run the following command.
cd start
ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2 --dry-run
The –min flags indicate the minima we want to use as control points (reactant and product). The option –disang indicates the template disang file with a placeholder, –mdin indicates the template mdin file with a placeholder, –nsim determines the number of umbrella windows, and –pad means the file names will be padded with a leading zero. The –dry-run flag will just print the output without creating the simulation directory called equil. The output should look like this:
1 -2.500 ../equil/img01.disang
2 -2.339 ../equil/img02.disang
3 -2.177 ../equil/img03.disang
4 -2.016 ../equil/img04.disang
5 -1.855 ../equil/img05.disang
6 -1.694 ../equil/img06.disang
7 -1.532 ../equil/img07.disang
8 -1.371 ../equil/img08.disang
9 -1.210 ../equil/img09.disang
10 -1.048 ../equil/img10.disang
11 -0.887 ../equil/img11.disang
12 -0.726 ../equil/img12.disang
13 -0.565 ../equil/img13.disang
14 -0.403 ../equil/img14.disang
15 -0.242 ../equil/img15.disang
16 -0.081 ../equil/img16.disang
17 0.081 ../equil/img17.disang
18 0.242 ../equil/img18.disang
19 0.403 ../equil/img19.disang
20 0.565 ../equil/img20.disang
21 0.726 ../equil/img21.disang
22 0.887 ../equil/img22.disang
23 1.048 ../equil/img23.disang
24 1.210 ../equil/img24.disang
25 1.371 ../equil/img25.disang
26 1.532 ../equil/img26.disang
27 1.694 ../equil/img27.disang
28 1.855 ../equil/img28.disang
29 2.016 ../equil/img29.disang
30 2.177 ../equil/img30.disang
31 2.339 ../equil/img31.disang
32 2.500 ../equil/img32.disang
dmax 0.161
dmax is the spacing between umbrella windows. In general, if the spacing is too large (>0.2), consider adding more windows. For our purposes this is sufficiently small.
Run the command again without the –dry-run flag to create the equil directory:
ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2
Equilibrating the umbrella windows
List the contents of the equil directory:
cd ../equil
ls
eq01fwd.sh img04.disang img07.mdin img11.disang img14.mdin img18.disang img21.mdin img25.disang img28.mdin img32.disang
img01.disang img04.mdin img08.disang img11.mdin img15.disang img18.mdin img22.disang img25.mdin img29.disang img32.mdin
img01.mdin img05.disang img08.mdin img12.disang img15.mdin img19.disang img22.mdin img26.disang img29.mdin init01.rst7
img02.disang img05.mdin img09.disang img12.mdin img16.disang img19.mdin img23.disang img26.mdin img30.disang
img02.mdin img06.disang img09.mdin img13.disang img16.mdin img20.disang img23.mdin img27.disang img30.mdin
img03.disang img06.mdin img10.disang img13.mdin img17.disang img20.mdin img24.disang img27.mdin img31.disang
img03.mdin img07.disang img10.mdin img14.disang img17.mdin img21.disang img24.mdin img28.disang img31.mdin
You will see that the eq01fwd.sh script was generated for you in the ouputs directory.
Take a look at eq01fwd.sh:
#!/bin/bash
set -e
set -u
#
# You can create a slurm script and run these
# commands in the following way:
#
# export LAUNCH="srun sander.MPI"
# export PARM="path/to/parm7"
# bash eq01fwd.sh
#
if [ "${LAUNCH}" == "" ]; then
echo 'bash variable LAUNCH is undefined. Defaulting to: export LAUNCH="sander"'
export LAUNCH="sander"
fi
if [ "${PARM}" == "" ]; then
echo 'bash variable PARM is undefined. Please: export PARM="/path/to/parm7"'
exit 1
else
if [ ! -e "${PARM}" ]; then
echo "File not found: ${PARM}"
exit 1
fi
fi
if [ ! -e "init01.rst7" ]; then
echo "File not found: init01.rst7"
exit 1
fi
${LAUNCH} -O -p ${PARM} -i img01.mdin -o img01.mdout -c init01.rst7 -r img01.rst7 -x img01.nc -inf img01.mdinfo
${LAUNCH} -O -p ${PARM} -i img02.mdin -o img02.mdout -c img01.rst7 -r img02.rst7 -x img02.nc -inf img02.mdinfo
${LAUNCH} -O -p ${PARM} -i img03.mdin -o img03.mdout -c img02.rst7 -r img03.rst7 -x img03.nc -inf img03.mdinfo
${LAUNCH} -O -p ${PARM} -i img04.mdin -o img04.mdout -c img03.rst7 -r img04.rst7 -x img04.nc -inf img04.mdinfo
${LAUNCH} -O -p ${PARM} -i img05.mdin -o img05.mdout -c img04.rst7 -r img05.rst7 -x img05.nc -inf img05.mdinfo
${LAUNCH} -O -p ${PARM} -i img06.mdin -o img06.mdout -c img05.rst7 -r img06.rst7 -x img06.nc -inf img06.mdinfo
${LAUNCH} -O -p ${PARM} -i img07.mdin -o img07.mdout -c img06.rst7 -r img07.rst7 -x img07.nc -inf img07.mdinfo
${LAUNCH} -O -p ${PARM} -i img08.mdin -o img08.mdout -c img07.rst7 -r img08.rst7 -x img08.nc -inf img08.mdinfo
${LAUNCH} -O -p ${PARM} -i img09.mdin -o img09.mdout -c img08.rst7 -r img09.rst7 -x img09.nc -inf img09.mdinfo
${LAUNCH} -O -p ${PARM} -i img10.mdin -o img10.mdout -c img09.rst7 -r img10.rst7 -x img10.nc -inf img10.mdinfo
${LAUNCH} -O -p ${PARM} -i img11.mdin -o img11.mdout -c img10.rst7 -r img11.rst7 -x img11.nc -inf img11.mdinfo
${LAUNCH} -O -p ${PARM} -i img12.mdin -o img12.mdout -c img11.rst7 -r img12.rst7 -x img12.nc -inf img12.mdinfo
${LAUNCH} -O -p ${PARM} -i img13.mdin -o img13.mdout -c img12.rst7 -r img13.rst7 -x img13.nc -inf img13.mdinfo
${LAUNCH} -O -p ${PARM} -i img14.mdin -o img14.mdout -c img13.rst7 -r img14.rst7 -x img14.nc -inf img14.mdinfo
${LAUNCH} -O -p ${PARM} -i img15.mdin -o img15.mdout -c img14.rst7 -r img15.rst7 -x img15.nc -inf img15.mdinfo
${LAUNCH} -O -p ${PARM} -i img16.mdin -o img16.mdout -c img15.rst7 -r img16.rst7 -x img16.nc -inf img16.mdinfo
${LAUNCH} -O -p ${PARM} -i img17.mdin -o img17.mdout -c img16.rst7 -r img17.rst7 -x img17.nc -inf img17.mdinfo
${LAUNCH} -O -p ${PARM} -i img18.mdin -o img18.mdout -c img17.rst7 -r img18.rst7 -x img18.nc -inf img18.mdinfo
${LAUNCH} -O -p ${PARM} -i img19.mdin -o img19.mdout -c img18.rst7 -r img19.rst7 -x img19.nc -inf img19.mdinfo
${LAUNCH} -O -p ${PARM} -i img20.mdin -o img20.mdout -c img19.rst7 -r img20.rst7 -x img20.nc -inf img20.mdinfo
${LAUNCH} -O -p ${PARM} -i img21.mdin -o img21.mdout -c img20.rst7 -r img21.rst7 -x img21.nc -inf img21.mdinfo
${LAUNCH} -O -p ${PARM} -i img22.mdin -o img22.mdout -c img21.rst7 -r img22.rst7 -x img22.nc -inf img22.mdinfo
${LAUNCH} -O -p ${PARM} -i img23.mdin -o img23.mdout -c img22.rst7 -r img23.rst7 -x img23.nc -inf img23.mdinfo
${LAUNCH} -O -p ${PARM} -i img24.mdin -o img24.mdout -c img23.rst7 -r img24.rst7 -x img24.nc -inf img24.mdinfo
${LAUNCH} -O -p ${PARM} -i img25.mdin -o img25.mdout -c img24.rst7 -r img25.rst7 -x img25.nc -inf img25.mdinfo
${LAUNCH} -O -p ${PARM} -i img26.mdin -o img26.mdout -c img25.rst7 -r img26.rst7 -x img26.nc -inf img26.mdinfo
${LAUNCH} -O -p ${PARM} -i img27.mdin -o img27.mdout -c img26.rst7 -r img27.rst7 -x img27.nc -inf img27.mdinfo
${LAUNCH} -O -p ${PARM} -i img28.mdin -o img28.mdout -c img27.rst7 -r img28.rst7 -x img28.nc -inf img28.mdinfo
${LAUNCH} -O -p ${PARM} -i img29.mdin -o img29.mdout -c img28.rst7 -r img29.rst7 -x img29.nc -inf img29.mdinfo
${LAUNCH} -O -p ${PARM} -i img30.mdin -o img30.mdout -c img29.rst7 -r img30.rst7 -x img30.nc -inf img30.mdinfo
${LAUNCH} -O -p ${PARM} -i img31.mdin -o img31.mdout -c img30.rst7 -r img31.rst7 -x img31.nc -inf img31.mdinfo
${LAUNCH} -O -p ${PARM} -i img32.mdin -o img32.mdout -c img31.rst7 -r img32.rst7 -x img32.nc -inf img32.mdinfo
This script would start from img01 and iteratively generate the umbrella windows using the output of the previous window as the starting structure. It is important to increment the starting structure slowly to avoid errors. The serial nature of this process makes it too slow to compute during the course of this workshop, so these simulations have been conducted for you.
Running umbrella sampling
After generating the windows, it it best practice to exclude the first ~2 ps of production sampling from analysis to allow for proper equilibration time. The equilibration region can be checked using ndfes-CheckEquil.py. However, under the time constraints of the exercise, you will not be able to perform long enough sampling to gather equilibrium statistics, so you will only perform enough sampling to generate a PMF.
Now we will perform umbrella sampling using DFTB3 on all of the umbrella windows in parallel. For the sake of of this exercise, the windows have been divided into 8 groups of 4 in directories called t1 through t8. Normally, one would run all of the windows together, but this will allow each participant to perform 3 ps of sampling in ~8 minutes. This tutorial will focus on t1 as an example, but any of the 8 could be selected.
Navigate to the DFTB3 directory and list the contents of the t1 directory:
cd DFTB3
ls t1
analyze_1-4.sh img01.mdin img02.mdin img03.mdin img04.mdin init02.rst7 init04.rst7 template.groupfile
img01.disang img02.disang img03.disang img04.disang init01.rst7 init03.rst7 run_1-4.slurm
The init{01..04} restart files are copies of the final img{01..04} restart files generated in the equil directory. In the mdin files, nstlim has been increased to 1500 so that with a 2 fs timestep we will perform 3 ps of sampling. It is typically desirable to run longer simulations when more time is available; however, this will be enough sampling to generate and analyze a PMF as a demonstration.
You have also been provided a slurm script called run_1-4.slurm.
Take a look at run_1-4.slurm:
#!/bin/bash
#SBATCH --job-name="sim.1-4"
#SBATCH --output="%a.slurmout"
#SBATCH --error="%a.slurmerr"
#SBATCH --partition={{CPUpartition}}
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=32
#SBATCH --mem=64GB
#SBATCH --cpus-per-task=1
#SBATCH --requeue
#SBATCH --export=ALL
#SBATCH -t 0-00:30:00
#SBATCH --account={{account}}
{{amberload}}
export LAUNCH="srun --mpi=pmi2 -K1 -N1 -n32 -c1 --exclusive sander.MPI"
set -e
set -u
$LAUNCH -ng 4 -groupfile template.groupfile
wait
sleep 1
Each window will be allotted 8 cores. Now that the jobs are being run in parallel, we will provide a groupfile as input to sander.MPI.
Take a look at the groupfile:
-O -p ../../template/qmmm.parm7 -c init01.rst7 -i img01.mdin -o img01.mdout -r img01.rst7 -x img01.nc -inf img01.mdinfo
-O -p ../../template/qmmm.parm7 -c init02.rst7 -i img02.mdin -o img02.mdout -r img02.rst7 -x img02.nc -inf img02.mdinfo
-O -p ../../template/qmmm.parm7 -c init03.rst7 -i img03.mdin -o img03.mdout -r img03.rst7 -x img03.nc -inf img03.mdinfo
-O -p ../../template/qmmm.parm7 -c init04.rst7 -i img04.mdin -o img04.mdout -r img04.rst7 -x img04.nc -inf img04.mdinfo
This is similar to executing multiple backgrounded (&) sander.MPI processes, but it also gives us the capability to run replica exchange if desired.
Navigate to your trial directory and submit the job:
cd t1
sbatch run_1-4.slurm
Applying a Δ machine learning potential
Now we will re-run the 4 window simulation applying a Δ machine learning potential (ΔMLP). This is a potential trained to correct a base model to a target level of theory. In this case the ΔMLP is trained to correct the DFTB3 potential to replicate the energies and forces obtained at the PBE0/6-31G* level. Specifically, the potential is a model fine-tuned for this reaction from a foundatinal MACE based ΔMLP for nucleic acid enzyme reactions. This process will be the suject of a later tutorial, and for now we will just appy the model.
In order to apply the ΔMLP, you will need a file containing the model parameters. In the template directory you have been provided a file called graph.000.pth. This is a python pickled object that contains the trained model parameters. The .pth extension means this is a model trained with PyTorch. You have also been provided a template mdin file called long_mlp.mdin. Take a look at this file:
MACE
&cntrl
! IO =======================================
irest = 0 ! 0 = start, 1 = restart
ntx = 1 ! 1 = start, 5 = restart
ntxo = 1 ! read/write rst as formatted file
iwrap = 1 ! wrap crds to unit cell
ioutfm = 1 ! write mdcrd as netcdf
imin = 0
ntmin = 1
ntpr = 20
ntwr = 20
ntwx = 0
ntwf = 0
! DYNAMICS =================================
nstlim = 1500
dt = 0.002 ! ps/step
ntb = 1 ! 1=NVT periodic, 2=NPT periodic, 0=no box
! TEMPERATURE ==============================
temp0 = 298 ! target temp
gamma_ln = 5.0 ! Langevin collision freq
ntt = 3 ! thermostat (3=Langevin)
! PRESSURE ================================
ntp = 0 ! 0=no scaling, 1=isotropic, 2=anisotropic
! SHAKE ====================================
ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained
noshakemask = ":69|@272-282,290-291,1975-1988" ! do not shake these
ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
! MISC =====================================
cut = 10.0
ifqnt = 1
ig = -1
nmropt = 1
/
&ewald
dsum_tol = 1.e-6
/
&ewald
dsum_tol = 1.e-6
/
&qmmm
qm_theory = 'DFTB3'
qmmask = ':69|@272-282,290-291,1975-1988'
qmcharge = 1
spin = 1
qmshake = 0
qm_ewald = 1
qmmm_switch = 1
scfconv = 1.e-10
verbosity = 0
tight_p_conv = 1
diag_routine = 0
pseudo_diag = 1
dftb_maxiter = 100
/
&wt type = 'DUMPFREQ', istep1 = 5, /
&wt type='END' /
DISANG=img.disang
DUMPAVE=img.dumpave
LISTOUT=POUT
LISTIN=POUT
&dprc
idprc=1
mask=":69|@272-282,290-291,1975-1988"
rcut = 6.0
intrafile=""
interfile(1)="../../template/graph.000.pth"
/
The input files are the same except for the additional dprc namelist. Because the base model is DFTB3, we must still specify DFTB3 as the QM theory. A ΔMLP should not be applied on top of a QM method it was not trained to correct. DPRC stands for Deep Potential Range Correction, meaning the correction is applied for QM-QM interactions and QM-MM interactions within an rcut radius of 6 angstroms, where the correction smoothly vanishes. The mask should match the qmmask, and interfile should point to the model file. The dprc correction is considered an “interfile” because is corrects QM-QM and QM-MM interactions, whereas an “intrafile” would be a model only correcting the internal QM potential. We leave this option blank. Multiple interfiles can be specified, in which case the deviation between the models is reported in the mdout. This is useful when training models with active learning, but for this application we will only use 1 model file.
Navigate to the dMLP directory and take a look at the run script for this set of 4 simulations:
cd dMLP/t1
cat run_1-4_MACE.slurm
#!/bin/bash
#SBATCH --job-name="sim.1-4.MACE"
#SBATCH --output="%a.slurmout"
#SBATCH --error="%a.slurmerr"
#SBATCH --partition={{GPUpartition}}
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-gpu=4
#SBATCH --mem=256G
#SBATCH --export=ALL
#SBATCH -t 0-00:30:00
#SBATCH --account={{account}}
{{amberload}}
export LAUNCH="srun --mpi=pmi2 -K1 -N1 -n4 -c1 --exclusive sander.MPI"
set -e
set -u
$LAUNCH -ng 4 -groupfile template.groupfile
wait
sleep 1
The ΔMLP inference is accelerate by GPUs, so we must request a GPU node. We will use 1 sander task per window because the underlying DFTB3 simulation is still performed by the CPU, and there are less CPUs present within the GPU node than a large CPU node.
Submit the job:
sbatch run_1-4_MACE.slurm
Using an ab initio DFT potential with QUICK
Now we will use QUICK to perform a short simulation for a single window with the PBE0 Hamiltonian and 6-31G* basis set. Navigate to the PBE0 directory where you have been provided the required inputs:
cd PBE0
ls
img01.disang img01.mdin init01.rst7 run_PBE0.slurm
For PBE0 we will only run a single window for 2 steps to get timing info because ab initio DFT is more expensive than semi-empirical or ΔMLP level simulations. Let’s take a look at img01.mdin:
PBE0
&cntrl
! IO =======================================
irest = 0 ! 0 = start, 1 = restart
ntx = 1 ! 1 = start, 5 = restart
ntxo = 1 ! read/write rst as formatted file
iwrap = 1 ! wrap crds to unit cell
ioutfm = 1 ! write mdcrd as netcdf
imin = 0
ntmin = 1
ntpr = 1
ntwr = 1
ntwx = 1
ntwf = 0
! DYNAMICS =================================
nstlim = 1 ! number of time steps
dt = 0.002 ! ps/step
ntb = 1 ! 1=NVT periodic, 2=NPT periodic, 0=no box
! TEMPERATURE ==============================
temp0 = 298 ! target temp
gamma_ln = 5.0 ! Langevin collision freq
ntt = 3 ! thermostat (3=Langevin)
! PRESSURE ================================
ntp = 0 ! 0=no scaling, 1=isotropic, 2=anisotropic
! SHAKE ====================================
ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained
noshakemask = ":69|@272-282,290-291,1975-1988" ! do not shake these
ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
! MISC =====================================
cut = 10.0
ifqnt = 1
ig = -1
nmropt = 1
/
&wt
type='DUMPFREQ', istep1=1
&end
&wt
type='END',
&end
DISANG=img01.disang
DUMPAVE=img01.dumpave
&ewald
dsum_tol = 1.e-6
/
&qmmm
qm_theory = 'quick'
qmmask = ':69|@272-282,290-291,1975-1988'
qmcharge = 1
spin = 1
qmmm_int = 1
qm_ewald = 1
qmshake = 0
itrmax = 50
scfconv = 1e-07
verbosity = 0
/
&quick
method = 'PBE0'
basis = '6-31G*'
/
Now we get the qm_theory to quick, which points to the &quick namelist where we specify the method and basis set. Now take a look at run_PBE0.slurm:
#!/bin/bash
#SBATCH --job-name="sim.PBE0"
#SBATCH --output="PBE0.slurmout"
#SBATCH --error="PBE0.slurmerr"
#SBATCH --partition=gpu-debug
#SBATCH --gpus=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-gpu=1
#SBATCH --mem=16G
#SBATCH --cpus-per-task=1
#SBATCH --export=ALL
#SBATCH -t 0-00:30:00
#SBATCH --account=rut149
{{amberload}}
top=`pwd`
export LAUNCH="srun --mpi=pmi2 -n1 --exclusive"
export EXE="sander.MPI"
parm=../template/qmmm.parm7
time ${LAUNCH} ${EXE} -O -p ${parm} -c init01.rst7 -i img01.mdin -o img01.mdout -r img01.rst7 -x img01.nc -inf img01.mdinfo
QUICK is a GPU accelerated QM program, therefore we will request a GPU node, and 4 parallel tasks. We will also time the job to get a sense of the speed. Submit the job, which should finish within 20 minutes. During that time, feel free do the analysis in the next section while it runs.
sbatch run_PBE0.slurm
When the job is done, take a look at the timing:
Generating free energy profiles with NDFES
Now we will analyze the free energy profiles generated from the DFTB3 and ΔMLP simulations using the N-Dimensional Free Energy Surface (NDFES) program, which is available in AMBER tools. We will combine all of the umbrella sampling trials to construct a combined PMF. You have been provided the output of all of the trials that must be copied to the working directory.
Copy the remaining trial directories to your working directory.
Create a directory called combined_analysis.
List the contents of your working directory. It should be structured as follows:
cp -r INSERTPATH/QMMM_DFTB3_MTR1/1D/ouputs/t{2..8} ./
mkdir combined_analysis
ls
combined_analysis t1 t2 t3 t4 t5 t6 t7 t8
The structure files have been removed from the output directories for the sake of file size, but these would have been generated by the simulations and are not required for analysis. We want to combine the metafiles from all of the trial directories.
Run the following command to generate metafile.all:
cd combined_analysis
ndfes-CombineMetafiles.py -o metafile.all ../t{1..8}/analysis/metafile
Run the following command to generate metafile.all.chk:
ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.all.chk metafile.all
This will generate metafile.all.chk. Finally, we will generate the path file for the entire PMF.
Run the following command to generate the path file:
ndfes-path_omp --chk metafile.all.chk --ipath metafile.all --npathpts 32 --nsplpts 400 --opath path_all
The ndfes commands are also given in combined_analysis.sh, but for the sake of understanding the role of each program they have been run one at a time.
Plot the PMF with xmgrace:
xmgrace -block path_all.rbf.0.dat -bxy 3:4
Figure 3. PMF as a function of reaction coordinate value from simulation of all 32 umbrella windows.
Now we will repeat the analysis for the ΔMLP simulations to compare the our DFTB3 result.
cd ../dMLP
cp -r INSERTPATH/QMMM_dMLP_MTR1/1D/ouputs/t{2..8} ./
mkdir combined_analysis
cd combined_analysis
ndfes-CombineMetafiles.py -o metafile.all ../t{1..8}/analysis/metafile
ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.all.chk metafile.all
ndfes-path_omp --chk metafile.all.chk --ipath metafile.all --npathpts 32 --nsplpts 400 --opath path_all
xmgrace -block path_all.rbf.0.dat -bxy 3:4 -block ../../DFTB3/combined_analysis/path_all.rbf.0.dat
Warning
INSERT XMGRACE IMAGE Here Say some stuff about how they don’t match…hopefully
Comparison with DFT
Now let’s look back at the PBE0 run to get a sense of the speed-up afforded by the ΔMLP model. Navigate to the PBE0 directory, and show the timing info at the bottom of the slurmerr file:
cd ../../PBE0
tail -n5 PBE0.slurmerr
Now we can see the full PMF. From this we see that the free energy barrier for this reaction is ~30 kcal/mol, which is quite high. In 1 dimension it is very clear what the path is from reactants to products, and we can plot the free energy as a function of a single coordinate. However, if more reaction coordinates are involved, the path from reactant to product is less clear. In the following sections you will see how the use of more reaction coordinates can better model this reaction.