QM/MM Umbrella sampling using path collective variables

Timothy J. Giese 1, and Darrin M. York1
1Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

  • Use colvars restraints within Sander

  • Convert Amber NMR Restraint DISANG file to colvars CONFIG format

  • Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables

  • Run QM/MM PathCV umbrella sampling using colvars implementation within Sander

  • Analyze the PathCV umbrella sampling using ndfes

Relevant literature

Tutorial

In this Tutorial, you will learn how to activate and run restrained dynamics in Sander using the colvars library.

Introduction

The colvars library implements many novel collective variable definitions, including arithmetic path collective variables “Arithmetic” path collective variables express a sequence of primitive collective variables defining a pathway into 2 variables: s and v, where “s” is the progress along the path and “v” describes the direction perpendicular to the path. The colvars config file format defines this type of collective variable as aspathcv and azpathcv Path collective variables are useful to resample along a low-level minimum free energy path using a high-level Hamiltonian.

Use colvars restraints within Sander

The official version of AmberTools does not have the colvars library. It is only available within internally modified York Group versions of AmberTools. There are 2 versions available on Amarel.

[user@amarel] module use /projects/community/modulefiles
[user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/modulefiles
[user@amarel] module use /projectsp/f_lbsr/YorkGroup/software/23jun25/modulefiles
#
# Now load the version of AmberTools with colvars
#
[user@amarel] module load 23jun25/ambertools/libcolvars
#
# It is also available in the HFDF version of AmberTools, but
# this should only be used with ab initio qm/mm simulations
#
[user@amarel] module load 23jun25/ambertools/hfdf

Whereas the standard NMR restraint functionality is specified in the mdin file by setting nmropt=1 in the &cntrl section, and setting the DISANG and DUMPAVE variables, the colvars functionality is specified in a separate namelist, shown below.

Title: Umbrella sampling example
&cntrl
   [...simulation options...]
   !
   ! Do not activate NMR restraints
   !
   nmropt = 0
/

&colvars
   icolvars=1
   output="img.cv.dumpave"
   configfile="img.cv.disang"
/
  • icolvars: integer, default=0; Activates colvar restraints if set to 1

  • configfile: string, default=””; File that defines the variables and restraints

  • output: string, default=””; The output file containing the observed collective variable values.

Convert Amber NMR Restraint DISANG file to colvars CONFIG format

One often applies harmonic and half harmonic restraints to simple, primitive collective variables, such as bond lengths, angles, dihedrals, and R12 coordinates. In this case, I have prepared a ndfes script to transform NMR restraint DISANG files into the colvars CONFIG file format. This script is not available within the main branch of FE-Toolkit; instead one needs to checkout and install the FE-Toolkit “devel” branch. The colvars-related scripts are not installed into the ${prefix}/bin/ directory; instead they are located within ndfes/examples/colvars/.

python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-disang2colvars.py --help
usage: ndfes-disang2colvars.py [-h] -i INP -o OUT [--printFreq PRINTFREQ] [--version]

Convert amber nmr disang file to colvars format

options:
   -h, --help            show this help message and exit
   -i INP, --inp INP     The input amber disang file
   -o OUT, --out OUT     The output colvars disang file
   --printFreq PRINTFREQ
                         Collective variable print frequency, default=1
   --version             show program's version number and exit

The –printFreq option will set the value of colvarsTrajFrequency within the output CONFIG file. This value replaces the DUMPFREQ option within the mdin file.

Convert ndfes Metafile to a colvars CONFIG format that biases the the surface using Path Collective Variables

The ndfes-ndpath2pathcv.py script will read a metafile and write a series of colvars CONFIG files.

python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py --help
usage: ndfes-ndpath2pathcv.py [-h] -d DISANG -i INP [-P PREFIX] [-s SFC] [-z ZFC] [--zscale ZSCALE] [--ninterp NINTERP] [--nimg NIMG] [--lam LAM] [--pad PAD] [--printFreq PRINTFREQ] [--version]

options:
   -h, --help            show this help message and exit
   -d DISANG, --disang DISANG
                         The input template disang file
   -i INP, --inp INP     The input path. Either path-xml, metafile, or a txt file
   -P PREFIX, --prefix PREFIX
                         The prefix for all output files. Default: img
   -s SFC, --sfc SFC     The force constant for the s-variable. If none, then estimated. Default: None
   -z ZFC, --zfc ZFC     The force constant for the z-variable. If none, then estimated. Default: None
   --zscale ZSCALE       If the force constants are estimated, then scale the z-coordinate force constant by zscale. Default: 1.
   --ninterp NINTERP     The number of points used to define the path. If none, then the input points are used. Default: None
   --nimg NIMG           The number of simulation images. If none, then the size is the same as the number of points in the input path. Default: None
   --lam LAM             The value of lambda. If none, then set to (nimg-1)^2. Default: None
   --pad PAD             Increase the path length on each end by linearly extending the endpoints. Instead of s=[0,1], the extended path goes from [pad,1-pad]. Default: 0.
   --printFreq PRINTFREQ
                         Collective variable print frequency, default=1
   --version             show program's version number and exit
  • The simulation will likely require a 1 fs timestep to avoid instabilities.

  • Although the –inp option can support multiple file format, I recommend using a metafile. The metafile should contain NIMG states that uniformly discretize a path.

  • If the “s” and “z” collective variables are not explicitly provided with the –sfc and –zfc options, then the script will estimate appropriate force constants based on the force constants provided within the metafile.

  • In terms of “best practices”, it is unclear what force constant (if any) should be used in the z-direction. The estimated force constant is likely only reasonable for the s-direction. One can use the –zscale option so the z-direction force constant is the scaled value of the s-direction. I have tried –zscale=0.25; however, even that may be too large.

  • The path collective variable definition depends on parameter called “lambda”. If the –lam value is not explictly provided, then its value is calculated as the inverse of mean square distance between consecutive images.8

  • The domain of the s-coordinate is [0,1]; therefore, the path’s endstates may not be well-sampled and you won’t necessarily observe a “minimum” at the endstates. The –pad option allows you to linearly extend the path.

Example:

[user@cluster] ls
finalpath.metafile  MakeMdin.sh  template.disang  template.mdin

The final path has 32 images

[user@cluster] cat finalpath.metafile
0 298.00 mbar/dftb3/dumpaves/t001/img001.dumpave   -0.895238    1.00000000000000e+02   -1.709650    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img002.dumpave   -0.751972    1.00000000000000e+02   -1.786090    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img003.dumpave   -0.586146    1.00000000000000e+02   -1.792105    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img004.dumpave   -0.420187    1.00000000000000e+02   -1.795297    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img005.dumpave   -0.254189    1.00000000000000e+02   -1.795170    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img006.dumpave   -0.088197    1.00000000000000e+02   -1.793741    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img007.dumpave    0.077782    1.00000000000000e+02   -1.791245    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img008.dumpave    0.243769    1.00000000000000e+02   -1.789650    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img009.dumpave    0.409763    1.00000000000000e+02   -1.790691    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img010.dumpave    0.575704    1.00000000000000e+02   -1.786797    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img011.dumpave    0.741337    1.00000000000000e+02   -1.775957    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img012.dumpave    0.896588    1.00000000000000e+02   -1.734241    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img013.dumpave    0.931105    1.00000000000000e+02   -1.575666    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img014.dumpave    0.933179    1.00000000000000e+02   -1.409691    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img015.dumpave    0.935337    1.00000000000000e+02   -1.243708    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img016.dumpave    0.940270    1.00000000000000e+02   -1.077785    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img017.dumpave    0.944688    1.00000000000000e+02   -0.911847    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img018.dumpave    0.946230    1.00000000000000e+02   -0.745857    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img019.dumpave    0.947346    1.00000000000000e+02   -0.579864    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img020.dumpave    0.948720    1.00000000000000e+02   -0.413872    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img021.dumpave    0.947236    1.00000000000000e+02   -0.247883    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img022.dumpave    0.945023    1.00000000000000e+02   -0.081900    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img023.dumpave    0.944293    1.00000000000000e+02    0.084097    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img024.dumpave    0.943048    1.00000000000000e+02    0.250090    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img025.dumpave    0.940888    1.00000000000000e+02    0.416074    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img026.dumpave    0.939383    1.00000000000000e+02    0.582064    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img027.dumpave    0.939631    1.00000000000000e+02    0.748062    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img028.dumpave    0.939564    1.00000000000000e+02    0.914060    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img029.dumpave    0.944007    1.00000000000000e+02    1.079977    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img030.dumpave    0.953367    1.00000000000000e+02    1.245709    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img031.dumpave    0.962581    1.00000000000000e+02    1.411439    1.00000000000000e+02
0 298.00 mbar/dftb3/dumpaves/t001/img032.dumpave    0.980713    1.00000000000000e+02    1.576434    1.00000000000000e+02

The script figures out the dimensionality from the metafile, and it then assumes the first NDIM reaction coordinates in the disang file are the used to define the path. The remaining restraints are added to the config file without modification.

[user@computer] cat template.disang
&rst
   iat = 291, 290, 291, 2193
   rstwt = 1.00, -1.00
   r2  =   RC1,  r3 =   RC1
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =  100.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 2200, 2189, 2200, 1984
   rstwt = 1.00, -1.00
   r2  =   RC2,  r3 =   RC2
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =  100.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 1403, 2207
   r2  = -999.0000000000000,  r3 =    3.0000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 1405, 2192
   r2  = -999.0000000000000,  r3 =    2.2000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 1407, 2206
   r2  = -999.0000000000000,  r3 =    2.0000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 2205, 282
   r2  = -999.0000000000000,  r3 =    2.2000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 280, 2189
   r2  = -999.0000000000000,  r3 =    2.2000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

&rst
   iat = 2190, 1982
   r2  = -999.0000000000000,  r3 =    3.0000000000000
   r1  = -999.0000000000000,  r4 =  999.0000000000000
   rk2 =    0.0000000000000, rk3 =  100.0000000000000
&end

The mdin file has a &colvars namelist and nmropt=0

 [user@cluster] cat template.mdin
 title
 &cntrl
 ! IO =======================================
    irest = 1       ! 0 = start, 1 = restart
      ntx = 5       ! 1 = start, 5 = restart
     ntxo = 1       ! read/write rst as formatted file
    iwrap = 1       ! wrap crds to unit cell
   ioutfm = 1       ! write mdcrd as netcdf
     imin = 0
    ntmin = 1
     ntpr = 10
     ntwr = 10000
     ntwx = 0
     ntwv = 0
     ntwf = 0
 ! DYNAMICS =================================
   nstlim = 25000
 !
 ! Use a 1 fs timestep, not 2 fs
 !
       dt = 0.001   ! ps/step
      ntb = 1       ! 1=NVT periodic, 2=NPT periodic, 0=no box
 ! TEMPERATURE ==============================
    temp0 = 298     ! target temp
 !
 ! The HFDF code has to use ntt=3 via the leapfrog integrator
 !
 !   gamma_ln = 5.0     ! Langevin collision freq
 !   ntt = 3       ! thermostat (3=Langevin)
 !
 ! The libcovars version of AmberTools can also use the
 ! "middle scheme" integrator
 !
 ntt = 0
 ischeme = 1
 ithermostat = 1
 therm_par = 5.0
 ! PRESSURE  ================================
      ntp = 0       ! 0=no scaling, 1=isotropic, 2=anisotropic
 ! SHAKE ====================================
      ntc = 2       ! 1=no shake, 2=HX constrained, 3=all constrained
      noshakemask = ":69|@272-282,290-291,1975-1988"    ! do not shake these
      ntf = 1       ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
 ! MISC =====================================
      cut = 10.0
    ifqnt = 1
       ig = -1
   nmropt = 0
 /


 &ewald
    dsum_tol = 1.e-6
 /

 &colvars
    icolvars=1
    input=""
    output="imgXXX.cv.dumpave"
    configfile="imgXXX.cv.disang"
 /

 &qmmm
  qm_theory   = 'DFTB3'
      qmmask  = ':69|@272-282,290-291,1975-1988'
    qmcharge  = 1
        spin  = 1
     qmshake  = 0
    qm_ewald  = 1
 qmmm_switch  = 1
     scfconv  = 1.e-10
   verbosity  = 0
tight_p_conv  = 1
diag_routine  = 0
 pseudo_diag  = 1
dftb_maxiter  = 100
 /

The 32 input files are copied from the template.

[user@computer] cat MakeMdin.sh
#!/bin/bash
set -e
set -u

for i in {1..32}; do
   img=$(printf "%03i" ${i})
   sed -e "s/XXX/${img}/" template.mdin > img${img}.mdin
done

Prepare the colvars config files.

[user@computer] python3 ${FETKHOME}/ndfes/examples/colvars/ndfes-ndpath2pathcv.py -i finalpath.metafile -d template.disang --zscale=0.25 --pad=0.1
[user@computer] ls
finalpath.metafile  img007.cv.disang  img014.cv.disang  img021.cv.disang  img028.cv.disang  MakeMdin.sh
img001.cv.disang    img008.cv.disang  img015.cv.disang  img022.cv.disang  img029.cv.disang  template.disang
img002.cv.disang    img009.cv.disang  img016.cv.disang  img023.cv.disang  img030.cv.disang  template.mdin
img003.cv.disang    img010.cv.disang  img017.cv.disang  img024.cv.disang  img031.cv.disang
img004.cv.disang    img011.cv.disang  img018.cv.disang  img025.cv.disang  img032.cv.disang
img005.cv.disang    img012.cv.disang  img019.cv.disang  img026.cv.disang  img.origpath.dat
img006.cv.disang    img013.cv.disang  img020.cv.disang  img027.cv.disang  img.pathCV.dat

The “img.pathCV.dat” file is read by colvars to evaluate the path collective variables. The “img.origpath.dat” are the images within the metafile expressed in terms of the path collective variables. The origpath file is not used by sander nor colvars; you would normally run the simulations, evaluate a 2D free energy surface, and then interpolar the free energy values.

Run QM/MM PathCV umbrella sampling using colvars implementation within Sander

Create the mdin files.

[user@computer] bash ./MakeMdin.sh
finalpath.metafile  img006.mdin       img012.mdin       img018.mdin       img024.mdin       img030.mdin
img001.cv.disang    img007.cv.disang  img013.cv.disang  img019.cv.disang  img025.cv.disang  img031.cv.disang
img001.mdin         img007.mdin       img013.mdin       img019.mdin       img025.mdin       img031.mdin
img002.cv.disang    img008.cv.disang  img014.cv.disang  img020.cv.disang  img026.cv.disang  img032.cv.disang
img002.mdin         img008.mdin       img014.mdin       img020.mdin       img026.mdin       img032.mdin
img003.cv.disang    img009.cv.disang  img015.cv.disang  img021.cv.disang  img027.cv.disang  img.origpath.dat
img003.mdin         img009.mdin       img015.mdin       img021.mdin       img027.mdin       img.pathCV.dat
img004.cv.disang    img010.cv.disang  img016.cv.disang  img022.cv.disang  img028.cv.disang  MakeMdin.sh
img004.mdin         img010.mdin       img016.mdin       img022.mdin       img028.mdin       template.disang
img005.cv.disang    img011.cv.disang  img017.cv.disang  img023.cv.disang  img029.cv.disang  template.mdin
img005.mdin         img011.mdin       img017.mdin       img023.mdin       img029.mdin
img006.cv.disang    img012.cv.disang  img018.cv.disang  img024.cv.disang  img030.cv.disang

Submit the simulations.

[user@computer] cat sub.slurm
#!/bin/bash
#SBATCH --job-name="sim"
#SBATCH --output="slurm/sim.%a.slurmout"
#SBATCH --error="slurm/sim.%a.slurmerr"
#SBATCH --partition=main
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --requeue
#SBATCH --export=ALL
#SBATCH -t 12:00:00
#SBATCH --array=1-32
#SBATCH --exclude=halk[0001-0106]

set -u

#
# Setup the environment to use sander and ndfes
#

module load 23jun25/ambertools/libcolvars

export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
export GFORTRAN_UNBUFFERED_ALL=Y

#
# The parm7 file and the number of simulations per string
#

PARM=qmmmhmr.parm7
SIMIDX=${SLURM_ARRAY_TASK_ID}

function ifneeded() {
   local mdout="$1"
   local NEEDSIM=0
   if [ ! -e "${mdout}" ]; then
      NEEDSIM=1
   else
      if grep -q "Convergence could not be achieved" ${mdout}; then
         NEEDSIM=1
      elif grep -q "   5.  TIMINGS" ${mdout}; then
         continue
      else
         NEEDSIM=1
      fi
   fi
   echo ${NEEDSIM}
}

needed=$(ifneeded ${base}.mdout)
if [ "${needed}" == "1" ]; then
   mdin="${base}.mdin"
   crd=init${base#img}.rst7
   p="${PARM}"
   echo ${PWD}
   echo "sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo"
   err=0
   for f in ${mdin} ${crd} ${p}; do
      if [ ! -e "${f}" ]; then
         echo "File not found ${f}"
         err=1
      fi
   done
   if [ "${err}" == "1" ]; then
      exit 1
   fi
   mpirun -n 4 sander.MPI -O -p ${p} -i ${mdin} -o ${base}.mdout -c ${crd} -r ${base}.rst7 -x ${base}.nc -inf ${base}.mdinfo
fi

Analyze the PathCV umbrella sampling using ndfes

To calculate thr 2D free energy surface, you should be able to use the ndfes-PrepareAmberData.py and ndfes programs. The FE-Toolkit devel branch has a modified version of ndfes-PrepareAmberData.py that can read the colvars config files. Use the ndfes-path script to calculate the free energy along the path within img.origpath.dat, or use the Example2D.py plotting example with the –ipath=img.origpath.dat option.