QM/MM Umbrella sampling using path collective variables
================================================================================
| Timothy J. Giese :sup:`1`, and Darrin M.
York\ :sup:`1`
| :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute
for Quantitative Biomedicine and Department of Chemistry and Chemical
Biology, Rutgers University, Piscataway, NJ 08854, USA
Learning objectives
-------------------
- Use colvars restraints within Sander
- Convert Amber NMR Restraint DISANG file to colvars CONFIG format
- Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables
- Run QM/MM PathCV umbrella sampling using colvars implementation within Sander
- Analyze the PathCV umbrella sampling using ndfes
Relevant literature
-------------------
- `Collective variables library for molecular simulation and analysis programs `__
- `From a to b in free energy space `__
- `Defining an optimal metric for the path collective variables `__
Tutorial
--------
In this Tutorial, you will learn how to activate and run restrained dynamics in Sander using the colvars library.
.. contents::
:local:
:depth: 4
.. start-tutorial
Introduction
------------
The colvars library implements many novel collective variable definitions, including `arithmetic path collective variables `__
"Arithmetic" path collective variables express a sequence of primitive collective variables defining a pathway into 2 variables: s and v, where "s" is the progress along the path and "v" describes the direction perpendicular to the path. The colvars config file format defines this type of collective variable as
`aspathcv `__
and
`azpathcv `__
Path collective variables are useful to resample along a low-level minimum free energy path using a high-level Hamiltonian.
Use colvars restraints within Sander
------------------------------------
The official version of AmberTools does not have the colvars library.
It is only available within internally modified York Group versions of AmberTools.
There are 2 versions available on Amarel.
.. code-block:: bash
[user@amarel] module use /projects/community/modulefiles
[user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/modulefiles
[user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/23jun25/modulefiles
#
# Now load the version of AmberTools with colvars
#
[user@amarel] module load 23jun25/ambertools/libcolvars
#
# It is also available in the HFDF version of AmberTools, but
# this should only be used with ab initio qm/mm simulations
#
[user@amarel] module load 23jun25/ambertools/hfdf
Whereas the standard NMR restraint functionality is specified in the mdin file by setting nmropt=1 in the &cntrl section, and setting the DISANG and DUMPAVE variables, the colvars functionality is specified in a separate namelist, shown below.
.. code-block::
Title: Umbrella sampling example
&cntrl
[...simulation options...]
!
! Do not activate NMR restraints
!
nmropt = 0
/
&colvars
icolvars=1
output="img.cv.dumpave"
configfile="img.cv.disang"
/
- icolvars: integer, default=0; Activates colvar restraints if set to 1
- configfile: string, default=""; File that defines the variables and restraints
- output: string, default=""; The output file containing the observed collective variable values.
Convert Amber NMR Restraint DISANG file to colvars CONFIG format
----------------------------------------------------------------
One often applies harmonic and half harmonic restraints to
simple, primitive collective variables, such as
bond lengths, angles, dihedrals, and R12 coordinates.
In this case, I have prepared a ndfes script to transform
NMR restraint DISANG files into the colvars CONFIG file format.
This script is not available within the main branch of
FE-Toolkit; instead one needs to checkout and install the
FE-Toolkit "devel" branch. The colvars-related scripts
are not installed into the ${prefix}/bin/ directory;
instead they are located within ndfes/examples/colvars/.
.. code-block::
python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-disang2colvars.py --help
usage: ndfes-disang2colvars.py [-h] -i INP -o OUT [--printFreq PRINTFREQ] [--version]
Convert amber nmr disang file to colvars format
options:
-h, --help show this help message and exit
-i INP, --inp INP The input amber disang file
-o OUT, --out OUT The output colvars disang file
--printFreq PRINTFREQ
Collective variable print frequency, default=1
--version show program's version number and exit
The --printFreq option will set the value of colvarsTrajFrequency within
the output CONFIG file. This value replaces the DUMPFREQ option within the mdin file.
Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables
-------------------------------------------------------------------------------------------------------------
The ndfes-ndpath2pathcv.py script will read a metafile and write a series of colvars CONFIG files.
.. code-block::
python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py --help
usage: ndfes-ndpath2pathcv.py [-h] -d DISANG -i INP [-P PREFIX] [-s SFC] [-z ZFC] [--zscale ZSCALE] [--ninterp NINTERP] [--nimg NIMG] [--lam LAM] [--pad PAD] [--printFreq PRINTFREQ] [--version]
options:
-h, --help show this help message and exit
-d DISANG, --disang DISANG
The input template disang file
-i INP, --inp INP The input path. Either path-xml, metafile, or a txt file
-P PREFIX, --prefix PREFIX
The prefix for all output files. Default: img
-s SFC, --sfc SFC The force constant for the s-variable. If none, then estimated. Default: None
-z ZFC, --zfc ZFC The force constant for the z-variable. If none, then estimated. Default: None
--zscale ZSCALE If the force constants are estimated, then scale the z-coordinate force constant by zscale. Default: 1.
--ninterp NINTERP The number of points used to define the path. If none, then the input points are used. Default: None
--nimg NIMG The number of simulation images. If none, then the size is the same as the number of points in the input path. Default: None
--lam LAM The value of lambda. If none, then set to (nimg-1)^2. Default: None
--pad PAD Increase the path length on each end by linearly extending the endpoints. Instead of s=[0,1], the extended path goes from [pad,1-pad]. Default: 0.
--printFreq PRINTFREQ
Collective variable print frequency, default=1
--version show program's version number and exit
- The simulation will likely require a 1 fs timestep to avoid instabilities.
- Although the --inp option can support multiple file format, I recommend using a metafile. The metafile should contain NIMG states that uniformly discretize a path.
- If the "s" and "z" collective variables are not explicitly provided with the --sfc and --zfc options, then the script will estimate appropriate force constants based on the force constants provided within the metafile.
- In terms of "best practices", it is unclear what force constant (if any) should be used in the z-direction. The estimated force constant is likely only reasonable for the s-direction. One can use the --zscale option so the z-direction force constant is the scaled value of the s-direction. I have tried --zscale=0.25; however, even that may be too large.
- The path collective variable definition depends on parameter called "lambda". If the --lam value is not explictly provided, then its value is calculated as the inverse of mean square distance between consecutive images.8
- The domain of the s-coordinate is [0,1]; therefore, the path's endstates may not be well-sampled and you won't necessarily observe a "minimum" at the endstates. The --pad option allows you to linearly extend the path.
Example:
.. code-block::
[user@cluster] ls
finalpath.metafile MakeMdin.sh template.disang template.mdin
The final path has 32 images
.. code-block::
[user@cluster] cat finalpath.metafile
0 298.00 mbar/dftb3/dumpaves/t001/img001.dumpave -0.895238 1.00000000000000e+02 -1.709650 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img002.dumpave -0.751972 1.00000000000000e+02 -1.786090 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img003.dumpave -0.586146 1.00000000000000e+02 -1.792105 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img004.dumpave -0.420187 1.00000000000000e+02 -1.795297 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img005.dumpave -0.254189 1.00000000000000e+02 -1.795170 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img006.dumpave -0.088197 1.00000000000000e+02 -1.793741 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img007.dumpave 0.077782 1.00000000000000e+02 -1.791245 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img008.dumpave 0.243769 1.00000000000000e+02 -1.789650 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img009.dumpave 0.409763 1.00000000000000e+02 -1.790691 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img010.dumpave 0.575704 1.00000000000000e+02 -1.786797 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img011.dumpave 0.741337 1.00000000000000e+02 -1.775957 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img012.dumpave 0.896588 1.00000000000000e+02 -1.734241 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img013.dumpave 0.931105 1.00000000000000e+02 -1.575666 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img014.dumpave 0.933179 1.00000000000000e+02 -1.409691 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img015.dumpave 0.935337 1.00000000000000e+02 -1.243708 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img016.dumpave 0.940270 1.00000000000000e+02 -1.077785 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img017.dumpave 0.944688 1.00000000000000e+02 -0.911847 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img018.dumpave 0.946230 1.00000000000000e+02 -0.745857 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img019.dumpave 0.947346 1.00000000000000e+02 -0.579864 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img020.dumpave 0.948720 1.00000000000000e+02 -0.413872 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img021.dumpave 0.947236 1.00000000000000e+02 -0.247883 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img022.dumpave 0.945023 1.00000000000000e+02 -0.081900 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img023.dumpave 0.944293 1.00000000000000e+02 0.084097 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img024.dumpave 0.943048 1.00000000000000e+02 0.250090 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img025.dumpave 0.940888 1.00000000000000e+02 0.416074 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img026.dumpave 0.939383 1.00000000000000e+02 0.582064 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img027.dumpave 0.939631 1.00000000000000e+02 0.748062 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img028.dumpave 0.939564 1.00000000000000e+02 0.914060 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img029.dumpave 0.944007 1.00000000000000e+02 1.079977 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img030.dumpave 0.953367 1.00000000000000e+02 1.245709 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img031.dumpave 0.962581 1.00000000000000e+02 1.411439 1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img032.dumpave 0.980713 1.00000000000000e+02 1.576434 1.00000000000000e+02
The script figures out the dimensionality from the metafile, and it then assumes the first NDIM reaction coordinates in the disang file are the used to define the path. The remaining restraints are added to the config file without modification.
.. code-block::
[user@computer] cat template.disang
&rst
iat = 291, 290, 291, 2193
rstwt = 1.00, -1.00
r2 = RC1, r3 = RC1
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 100.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 2200, 2189, 2200, 1984
rstwt = 1.00, -1.00
r2 = RC2, r3 = RC2
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 100.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 1403, 2207
r2 = -999.0000000000000, r3 = 3.0000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 1405, 2192
r2 = -999.0000000000000, r3 = 2.2000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 1407, 2206
r2 = -999.0000000000000, r3 = 2.0000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 2205, 282
r2 = -999.0000000000000, r3 = 2.2000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 280, 2189
r2 = -999.0000000000000, r3 = 2.2000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
&rst
iat = 2190, 1982
r2 = -999.0000000000000, r3 = 3.0000000000000
r1 = -999.0000000000000, r4 = 999.0000000000000
rk2 = 0.0000000000000, rk3 = 100.0000000000000
&end
The mdin file has a &colvars namelist and nmropt=0
.. code-block::
[user@cluster] cat template.mdin
title
&cntrl
! IO =======================================
irest = 1 ! 0 = start, 1 = restart
ntx = 5 ! 1 = start, 5 = restart
ntxo = 1 ! read/write rst as formatted file
iwrap = 1 ! wrap crds to unit cell
ioutfm = 1 ! write mdcrd as netcdf
imin = 0
ntmin = 1
ntpr = 10
ntwr = 10000
ntwx = 0
ntwv = 0
ntwf = 0
! DYNAMICS =================================
nstlim = 25000
!
! Use a 1 fs timestep, not 2 fs
!
dt = 0.001 ! ps/step
ntb = 1 ! 1=NVT periodic, 2=NPT periodic, 0=no box
! TEMPERATURE ==============================
temp0 = 298 ! target temp
!
! The HFDF code has to use ntt=3 via the leapfrog integrator
!
! gamma_ln = 5.0 ! Langevin collision freq
! ntt = 3 ! thermostat (3=Langevin)
!
! The libcovars version of AmberTools can also use the
! "middle scheme" integrator
!
ntt = 0
ischeme = 1
ithermostat = 1
therm_par = 5.0
! PRESSURE ================================
ntp = 0 ! 0=no scaling, 1=isotropic, 2=anisotropic
! SHAKE ====================================
ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained
noshakemask = ":69|@272-282,290-291,1975-1988" ! do not shake these
ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
! MISC =====================================
cut = 10.0
ifqnt = 1
ig = -1
nmropt = 0
/
&ewald
dsum_tol = 1.e-6
/
&colvars
icolvars=1
input=""
output="imgXXX.cv.dumpave"
configfile="imgXXX.cv.disang"
/
&qmmm
qm_theory = 'DFTB3'
qmmask = ':69|@272-282,290-291,1975-1988'
qmcharge = 1
spin = 1
qmshake = 0
qm_ewald = 1
qmmm_switch = 1
scfconv = 1.e-10
verbosity = 0
tight_p_conv = 1
diag_routine = 0
pseudo_diag = 1
dftb_maxiter = 100
/
The 32 input files are copied from the template.
.. code-block::
[user@computer] cat MakeMdin.sh
#!/bin/bash
set -e
set -u
for i in {1..32}; do
img=$(printf "%03i" ${i})
sed -e "s/XXX/${img}/" template.mdin > img${img}.mdin
done
Prepare the colvars config files.
.. code-block::
[user@computer] python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py -i finalpath.metafile -d template.disang --zscale=0.25 --pad=0.1
[user@computer] ls
finalpath.metafile img007.cv.disang img014.cv.disang img021.cv.disang img028.cv.disang MakeMdin.sh
img001.cv.disang img008.cv.disang img015.cv.disang img022.cv.disang img029.cv.disang template.disang
img002.cv.disang img009.cv.disang img016.cv.disang img023.cv.disang img030.cv.disang template.mdin
img003.cv.disang img010.cv.disang img017.cv.disang img024.cv.disang img031.cv.disang
img004.cv.disang img011.cv.disang img018.cv.disang img025.cv.disang img032.cv.disang
img005.cv.disang img012.cv.disang img019.cv.disang img026.cv.disang img.origpath.dat
img006.cv.disang img013.cv.disang img020.cv.disang img027.cv.disang img.pathCV.dat
The "img.pathCV.dat" file is read by colvars to evaluate the path collective variables.
The "img.origpath.dat" are the images within the metafile expressed in terms of the path collective variables.
The origpath file is not used by sander nor colvars; you would normally run the simulations, evaluate a 2D free energy surface, and then interpolar the free energy values.
Run QM/MM PathCV umbrella sampling using colvars implementation within Sander
-----------------------------------------------------------------------------
Create the mdin files.
.. code-block::
[user@computer] bash ./MakeMdin.sh
finalpath.metafile img006.mdin img012.mdin img018.mdin img024.mdin img030.mdin
img001.cv.disang img007.cv.disang img013.cv.disang img019.cv.disang img025.cv.disang img031.cv.disang
img001.mdin img007.mdin img013.mdin img019.mdin img025.mdin img031.mdin
img002.cv.disang img008.cv.disang img014.cv.disang img020.cv.disang img026.cv.disang img032.cv.disang
img002.mdin img008.mdin img014.mdin img020.mdin img026.mdin img032.mdin
img003.cv.disang img009.cv.disang img015.cv.disang img021.cv.disang img027.cv.disang img.origpath.dat
img003.mdin img009.mdin img015.mdin img021.mdin img027.mdin img.pathCV.dat
img004.cv.disang img010.cv.disang img016.cv.disang img022.cv.disang img028.cv.disang MakeMdin.sh
img004.mdin img010.mdin img016.mdin img022.mdin img028.mdin template.disang
img005.cv.disang img011.cv.disang img017.cv.disang img023.cv.disang img029.cv.disang template.mdin
img005.mdin img011.mdin img017.mdin img023.mdin img029.mdin
img006.cv.disang img012.cv.disang img018.cv.disang img024.cv.disang img030.cv.disang
Submit the simulations.
.. code-block::
[user@computer] cat sub.slurm
#!/bin/bash
#SBATCH --job-name="sim"
#SBATCH --output="slurm/sim.%a.slurmout"
#SBATCH --error="slurm/sim.%a.slurmerr"
#SBATCH --partition=main
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --requeue
#SBATCH --export=ALL
#SBATCH -t 12:00:00
#SBATCH --array=1-32
#SBATCH --exclude=halk[0001-0106]
set -u
#
# Setup the environment to use sander and ndfes
#
module load 23jun25/ambertools/libcolvars
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
export GFORTRAN_UNBUFFERED_ALL=Y
#
# The parm7 file and the number of simulations per string
#
PARM=qmmmhmr.parm7
SIMIDX=${SLURM_ARRAY_TASK_ID}
function ifneeded() {
local mdout="$1"
local NEEDSIM=0
if [ ! -e "${mdout}" ]; then
NEEDSIM=1
else
if grep -q "Convergence could not be achieved" ${mdout}; then
NEEDSIM=1
elif grep -q " 5. TIMINGS" ${mdout}; then
continue
else
NEEDSIM=1
fi
fi
echo ${NEEDSIM}
}
needed=$(ifneeded ${base}.mdout)
if [ "${needed}" == "1" ]; then
mdin="${base}.mdin"
crd=init${base#img}.rst7
p="${PARM}"
echo ${PWD}
echo "sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo"
err=0
for f in ${mdin} ${crd} ${p}; do
if [ ! -e "${f}" ]; then
echo "File not found ${f}"
err=1
fi
done
if [ "${err}" == "1" ]; then
exit 1
fi
mpirun -n 4 sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo
fi
Analyze the PathCV umbrella sampling using ndfes
------------------------------------------------
To calculate thr 2D free energy surface, you should be able to use the ndfes-PrepareAmberData.py and ndfes programs.
The FE-Toolkit devel branch has a modified version of ndfes-PrepareAmberData.py that can read the colvars config files.
Use the ndfes-path script to calculate the free energy along the path within img.origpath.dat, or use the
Example2D.py plotting example with the --ipath=img.origpath.dat option.