QM/MM Umbrella sampling using path collective variables ================================================================================ | Timothy J. Giese :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- - Use colvars restraints within Sander - Convert Amber NMR Restraint DISANG file to colvars CONFIG format - Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables - Run QM/MM PathCV umbrella sampling using colvars implementation within Sander - Analyze the PathCV umbrella sampling using ndfes Relevant literature ------------------- - `Collective variables library for molecular simulation and analysis programs `__ - `From a to b in free energy space `__ - `Defining an optimal metric for the path collective variables `__ Tutorial -------- In this Tutorial, you will learn how to activate and run restrained dynamics in Sander using the colvars library. .. contents:: :local: :depth: 4 .. start-tutorial Introduction ------------ The colvars library implements many novel collective variable definitions, including `arithmetic path collective variables `__ "Arithmetic" path collective variables express a sequence of primitive collective variables defining a pathway into 2 variables: s and v, where "s" is the progress along the path and "v" describes the direction perpendicular to the path. The colvars config file format defines this type of collective variable as `aspathcv `__ and `azpathcv `__ Path collective variables are useful to resample along a low-level minimum free energy path using a high-level Hamiltonian. Use colvars restraints within Sander ------------------------------------ The official version of AmberTools does not have the colvars library. It is only available within internally modified York Group versions of AmberTools. There are 2 versions available on Amarel. .. code-block:: bash [user@amarel] module use /projects/community/modulefiles [user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/modulefiles [user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/23jun25/modulefiles # # Now load the version of AmberTools with colvars # [user@amarel] module load 23jun25/ambertools/libcolvars # # It is also available in the HFDF version of AmberTools, but # this should only be used with ab initio qm/mm simulations # [user@amarel] module load 23jun25/ambertools/hfdf Whereas the standard NMR restraint functionality is specified in the mdin file by setting nmropt=1 in the &cntrl section, and setting the DISANG and DUMPAVE variables, the colvars functionality is specified in a separate namelist, shown below. .. code-block:: Title: Umbrella sampling example &cntrl [...simulation options...] ! ! Do not activate NMR restraints ! nmropt = 0 / &colvars icolvars=1 output="img.cv.dumpave" configfile="img.cv.disang" / - icolvars: integer, default=0; Activates colvar restraints if set to 1 - configfile: string, default=""; File that defines the variables and restraints - output: string, default=""; The output file containing the observed collective variable values. Convert Amber NMR Restraint DISANG file to colvars CONFIG format ---------------------------------------------------------------- One often applies harmonic and half harmonic restraints to simple, primitive collective variables, such as bond lengths, angles, dihedrals, and R12 coordinates. In this case, I have prepared a ndfes script to transform NMR restraint DISANG files into the colvars CONFIG file format. This script is not available within the main branch of FE-Toolkit; instead one needs to checkout and install the FE-Toolkit "devel" branch. The colvars-related scripts are not installed into the ${prefix}/bin/ directory; instead they are located within ndfes/examples/colvars/. .. code-block:: python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-disang2colvars.py --help usage: ndfes-disang2colvars.py [-h] -i INP -o OUT [--printFreq PRINTFREQ] [--version] Convert amber nmr disang file to colvars format options: -h, --help show this help message and exit -i INP, --inp INP The input amber disang file -o OUT, --out OUT The output colvars disang file --printFreq PRINTFREQ Collective variable print frequency, default=1 --version show program's version number and exit The --printFreq option will set the value of colvarsTrajFrequency within the output CONFIG file. This value replaces the DUMPFREQ option within the mdin file. Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables ------------------------------------------------------------------------------------------------------------- The ndfes-ndpath2pathcv.py script will read a metafile and write a series of colvars CONFIG files. .. code-block:: python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py --help usage: ndfes-ndpath2pathcv.py [-h] -d DISANG -i INP [-P PREFIX] [-s SFC] [-z ZFC] [--zscale ZSCALE] [--ninterp NINTERP] [--nimg NIMG] [--lam LAM] [--pad PAD] [--printFreq PRINTFREQ] [--version] options: -h, --help show this help message and exit -d DISANG, --disang DISANG The input template disang file -i INP, --inp INP The input path. Either path-xml, metafile, or a txt file -P PREFIX, --prefix PREFIX The prefix for all output files. Default: img -s SFC, --sfc SFC The force constant for the s-variable. If none, then estimated. Default: None -z ZFC, --zfc ZFC The force constant for the z-variable. If none, then estimated. Default: None --zscale ZSCALE If the force constants are estimated, then scale the z-coordinate force constant by zscale. Default: 1. --ninterp NINTERP The number of points used to define the path. If none, then the input points are used. Default: None --nimg NIMG The number of simulation images. If none, then the size is the same as the number of points in the input path. Default: None --lam LAM The value of lambda. If none, then set to (nimg-1)^2. Default: None --pad PAD Increase the path length on each end by linearly extending the endpoints. Instead of s=[0,1], the extended path goes from [pad,1-pad]. Default: 0. --printFreq PRINTFREQ Collective variable print frequency, default=1 --version show program's version number and exit - The simulation will likely require a 1 fs timestep to avoid instabilities. - Although the --inp option can support multiple file format, I recommend using a metafile. The metafile should contain NIMG states that uniformly discretize a path. - If the "s" and "z" collective variables are not explicitly provided with the --sfc and --zfc options, then the script will estimate appropriate force constants based on the force constants provided within the metafile. - In terms of "best practices", it is unclear what force constant (if any) should be used in the z-direction. The estimated force constant is likely only reasonable for the s-direction. One can use the --zscale option so the z-direction force constant is the scaled value of the s-direction. I have tried --zscale=0.25; however, even that may be too large. - The path collective variable definition depends on parameter called "lambda". If the --lam value is not explictly provided, then its value is calculated as the inverse of mean square distance between consecutive images.8 - The domain of the s-coordinate is [0,1]; therefore, the path's endstates may not be well-sampled and you won't necessarily observe a "minimum" at the endstates. The --pad option allows you to linearly extend the path. Example: .. code-block:: [user@cluster] ls finalpath.metafile MakeMdin.sh template.disang template.mdin The final path has 32 images .. code-block:: [user@cluster] cat finalpath.metafile 0 298.00 mbar/dftb3/dumpaves/t001/img001.dumpave -0.895238 1.00000000000000e+02 -1.709650 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img002.dumpave -0.751972 1.00000000000000e+02 -1.786090 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img003.dumpave -0.586146 1.00000000000000e+02 -1.792105 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img004.dumpave -0.420187 1.00000000000000e+02 -1.795297 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img005.dumpave -0.254189 1.00000000000000e+02 -1.795170 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img006.dumpave -0.088197 1.00000000000000e+02 -1.793741 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img007.dumpave 0.077782 1.00000000000000e+02 -1.791245 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img008.dumpave 0.243769 1.00000000000000e+02 -1.789650 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img009.dumpave 0.409763 1.00000000000000e+02 -1.790691 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img010.dumpave 0.575704 1.00000000000000e+02 -1.786797 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img011.dumpave 0.741337 1.00000000000000e+02 -1.775957 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img012.dumpave 0.896588 1.00000000000000e+02 -1.734241 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img013.dumpave 0.931105 1.00000000000000e+02 -1.575666 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img014.dumpave 0.933179 1.00000000000000e+02 -1.409691 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img015.dumpave 0.935337 1.00000000000000e+02 -1.243708 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img016.dumpave 0.940270 1.00000000000000e+02 -1.077785 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img017.dumpave 0.944688 1.00000000000000e+02 -0.911847 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img018.dumpave 0.946230 1.00000000000000e+02 -0.745857 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img019.dumpave 0.947346 1.00000000000000e+02 -0.579864 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img020.dumpave 0.948720 1.00000000000000e+02 -0.413872 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img021.dumpave 0.947236 1.00000000000000e+02 -0.247883 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img022.dumpave 0.945023 1.00000000000000e+02 -0.081900 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img023.dumpave 0.944293 1.00000000000000e+02 0.084097 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img024.dumpave 0.943048 1.00000000000000e+02 0.250090 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img025.dumpave 0.940888 1.00000000000000e+02 0.416074 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img026.dumpave 0.939383 1.00000000000000e+02 0.582064 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img027.dumpave 0.939631 1.00000000000000e+02 0.748062 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img028.dumpave 0.939564 1.00000000000000e+02 0.914060 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img029.dumpave 0.944007 1.00000000000000e+02 1.079977 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img030.dumpave 0.953367 1.00000000000000e+02 1.245709 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img031.dumpave 0.962581 1.00000000000000e+02 1.411439 1.00000000000000e+02 0 298.00 mbar/dftb3/dumpaves/t001/img032.dumpave 0.980713 1.00000000000000e+02 1.576434 1.00000000000000e+02 The script figures out the dimensionality from the metafile, and it then assumes the first NDIM reaction coordinates in the disang file are the used to define the path. The remaining restraints are added to the config file without modification. .. code-block:: [user@computer] cat template.disang &rst iat = 291, 290, 291, 2193 rstwt = 1.00, -1.00 r2 = RC1, r3 = RC1 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 100.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 2200, 2189, 2200, 1984 rstwt = 1.00, -1.00 r2 = RC2, r3 = RC2 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 100.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 1403, 2207 r2 = -999.0000000000000, r3 = 3.0000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 1405, 2192 r2 = -999.0000000000000, r3 = 2.2000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 1407, 2206 r2 = -999.0000000000000, r3 = 2.0000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 2205, 282 r2 = -999.0000000000000, r3 = 2.2000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 280, 2189 r2 = -999.0000000000000, r3 = 2.2000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end &rst iat = 2190, 1982 r2 = -999.0000000000000, r3 = 3.0000000000000 r1 = -999.0000000000000, r4 = 999.0000000000000 rk2 = 0.0000000000000, rk3 = 100.0000000000000 &end The mdin file has a &colvars namelist and nmropt=0 .. code-block:: [user@cluster] cat template.mdin title &cntrl ! IO ======================================= irest = 1 ! 0 = start, 1 = restart ntx = 5 ! 1 = start, 5 = restart ntxo = 1 ! read/write rst as formatted file iwrap = 1 ! wrap crds to unit cell ioutfm = 1 ! write mdcrd as netcdf imin = 0 ntmin = 1 ntpr = 10 ntwr = 10000 ntwx = 0 ntwv = 0 ntwf = 0 ! DYNAMICS ================================= nstlim = 25000 ! ! Use a 1 fs timestep, not 2 fs ! dt = 0.001 ! ps/step ntb = 1 ! 1=NVT periodic, 2=NPT periodic, 0=no box ! TEMPERATURE ============================== temp0 = 298 ! target temp ! ! The HFDF code has to use ntt=3 via the leapfrog integrator ! ! gamma_ln = 5.0 ! Langevin collision freq ! ntt = 3 ! thermostat (3=Langevin) ! ! The libcovars version of AmberTools can also use the ! "middle scheme" integrator ! ntt = 0 ischeme = 1 ithermostat = 1 therm_par = 5.0 ! PRESSURE ================================ ntp = 0 ! 0=no scaling, 1=isotropic, 2=anisotropic ! SHAKE ==================================== ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained noshakemask = ":69|@272-282,290-291,1975-1988" ! do not shake these ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E ! MISC ===================================== cut = 10.0 ifqnt = 1 ig = -1 nmropt = 0 / &ewald dsum_tol = 1.e-6 / &colvars icolvars=1 input="" output="imgXXX.cv.dumpave" configfile="imgXXX.cv.disang" / &qmmm qm_theory = 'DFTB3' qmmask = ':69|@272-282,290-291,1975-1988' qmcharge = 1 spin = 1 qmshake = 0 qm_ewald = 1 qmmm_switch = 1 scfconv = 1.e-10 verbosity = 0 tight_p_conv = 1 diag_routine = 0 pseudo_diag = 1 dftb_maxiter = 100 / The 32 input files are copied from the template. .. code-block:: [user@computer] cat MakeMdin.sh #!/bin/bash set -e set -u for i in {1..32}; do img=$(printf "%03i" ${i}) sed -e "s/XXX/${img}/" template.mdin > img${img}.mdin done Prepare the colvars config files. .. code-block:: [user@computer] python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py -i finalpath.metafile -d template.disang --zscale=0.25 --pad=0.1 [user@computer] ls finalpath.metafile img007.cv.disang img014.cv.disang img021.cv.disang img028.cv.disang MakeMdin.sh img001.cv.disang img008.cv.disang img015.cv.disang img022.cv.disang img029.cv.disang template.disang img002.cv.disang img009.cv.disang img016.cv.disang img023.cv.disang img030.cv.disang template.mdin img003.cv.disang img010.cv.disang img017.cv.disang img024.cv.disang img031.cv.disang img004.cv.disang img011.cv.disang img018.cv.disang img025.cv.disang img032.cv.disang img005.cv.disang img012.cv.disang img019.cv.disang img026.cv.disang img.origpath.dat img006.cv.disang img013.cv.disang img020.cv.disang img027.cv.disang img.pathCV.dat The "img.pathCV.dat" file is read by colvars to evaluate the path collective variables. The "img.origpath.dat" are the images within the metafile expressed in terms of the path collective variables. The origpath file is not used by sander nor colvars; you would normally run the simulations, evaluate a 2D free energy surface, and then interpolar the free energy values. Run QM/MM PathCV umbrella sampling using colvars implementation within Sander ----------------------------------------------------------------------------- Create the mdin files. .. code-block:: [user@computer] bash ./MakeMdin.sh finalpath.metafile img006.mdin img012.mdin img018.mdin img024.mdin img030.mdin img001.cv.disang img007.cv.disang img013.cv.disang img019.cv.disang img025.cv.disang img031.cv.disang img001.mdin img007.mdin img013.mdin img019.mdin img025.mdin img031.mdin img002.cv.disang img008.cv.disang img014.cv.disang img020.cv.disang img026.cv.disang img032.cv.disang img002.mdin img008.mdin img014.mdin img020.mdin img026.mdin img032.mdin img003.cv.disang img009.cv.disang img015.cv.disang img021.cv.disang img027.cv.disang img.origpath.dat img003.mdin img009.mdin img015.mdin img021.mdin img027.mdin img.pathCV.dat img004.cv.disang img010.cv.disang img016.cv.disang img022.cv.disang img028.cv.disang MakeMdin.sh img004.mdin img010.mdin img016.mdin img022.mdin img028.mdin template.disang img005.cv.disang img011.cv.disang img017.cv.disang img023.cv.disang img029.cv.disang template.mdin img005.mdin img011.mdin img017.mdin img023.mdin img029.mdin img006.cv.disang img012.cv.disang img018.cv.disang img024.cv.disang img030.cv.disang Submit the simulations. .. code-block:: [user@computer] cat sub.slurm #!/bin/bash #SBATCH --job-name="sim" #SBATCH --output="slurm/sim.%a.slurmout" #SBATCH --error="slurm/sim.%a.slurmerr" #SBATCH --partition=main #SBATCH --ntasks=4 #SBATCH --cpus-per-task=1 #SBATCH --mem=4G #SBATCH --requeue #SBATCH --export=ALL #SBATCH -t 12:00:00 #SBATCH --array=1-32 #SBATCH --exclude=halk[0001-0106] set -u # # Setup the environment to use sander and ndfes # module load 23jun25/ambertools/libcolvars export OPENBLAS_NUM_THREADS=1 export OMP_NUM_THREADS=1 export GFORTRAN_UNBUFFERED_ALL=Y # # The parm7 file and the number of simulations per string # PARM=qmmmhmr.parm7 SIMIDX=${SLURM_ARRAY_TASK_ID} function ifneeded() { local mdout="$1" local NEEDSIM=0 if [ ! -e "${mdout}" ]; then NEEDSIM=1 else if grep -q "Convergence could not be achieved" ${mdout}; then NEEDSIM=1 elif grep -q " 5. TIMINGS" ${mdout}; then continue else NEEDSIM=1 fi fi echo ${NEEDSIM} } needed=$(ifneeded ${base}.mdout) if [ "${needed}" == "1" ]; then mdin="${base}.mdin" crd=init${base#img}.rst7 p="${PARM}" echo ${PWD} echo "sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo" err=0 for f in ${mdin} ${crd} ${p}; do if [ ! -e "${f}" ]; then echo "File not found ${f}" err=1 fi done if [ "${err}" == "1" ]; then exit 1 fi mpirun -n 4 sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo fi Analyze the PathCV umbrella sampling using ndfes ------------------------------------------------ To calculate thr 2D free energy surface, you should be able to use the ndfes-PrepareAmberData.py and ndfes programs. The FE-Toolkit devel branch has a modified version of ndfes-PrepareAmberData.py that can read the colvars config files. Use the ndfes-path script to calculate the free energy along the path within img.origpath.dat, or use the Example2D.py plotting example with the --ipath=img.origpath.dat option.