Calculate a bookend correction from nonequilibrium sampling

Tutorial Timothy J. Giese¹, and Darrin M. York¹

¹Laboratory for Biomolecular Simulation Research, Institute

for Quantitative Biomedicine and Department of Chemistry and Chemical

Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

Use sander to perform MM-to-QM/MM nonequilibrium simulations
Analyze the simulations to calculate a free energy correction

Relevant literature

See section “26.7. Nonequilibrium sampling and Jarzynski’s equation” in the Amber manual for details on performing nonequilibrium alchemical free energy simulations with sander. Amber manual
Jarzynski, C. (1997), “Nonequilibrium equality for free energy differences”, Phys. Rev. Lett., 78 (14): 2690, doi: 10.1103/PhysRevLett.78.2690
Jarzynski, C. (1997), “Equilibrium free-energy differences from nonequilibrium measurements: A master-equation approach”, Phys. Rev. E, 56 (5): 5018, doi: 10.1103/PhysRevE.56.5018

Tutorial

Introduction 

This tutorial is almost the same as Calculate a bookend correction from equilibrium sampling, which should be considered a mandatory prerequisite. In brief, we presume that we calculated an absolute binding free energy with a molecular mechanical force field, and we seek to correct the hydration free energy with a “bookend correction”. The correction is an application of the “indirect method” for calculating free energies from a reference potential.

\[\begin{equation}\label{E:bookend} \Delta G_{\text{QM/MM}} = \Delta G_{\text{MM}} + \Delta G_{\text{MM-to-QM/MM},\text{tgt}} - \Delta G_{\text{MM-to-QM/MM},\text{ref}} \end{equation}\]

where “tgt” and “ref” denote the target and reference environments.

The present tutorial calculates the correction by analyzing nonequilibrium sampling with Jarzynski’s equality. Nonequilibrium sampling in this context means that the alchemical coordinate is time-dependent λ(t). Specifically, λ uniformly scales from λmin-to-λmax (typically 0-to-1) during the course of a simulation – a so-called “switching simulation”.

A brief overview of the procedure is: * Perform a MM simulation and periodically save the coordinates and velocities to the trajectory file * For each frame, perform a short MM-to-QM/MM switching simulation * Each switching simulation produces a single “work value” * Calculate the free enery from the collection of work values using Jarzinski’s equality.

The functional form of Jarzinski’s equality is quite similar to exponential averaging encountered in equilibrium simulation.

\[\begin{equation} \Delta G_{\text{MM-to-QM/MM}} = - \frac{1}{\beta} \text{ln}\left\langle \text{exp}(-\beta W) \right\rangle_{\text{MM}} \end{equation}\]

The angle brackets represent an average over many switching simulations whose starting coordinates are drawn from the MM ensemble (the ensemble of the λ=0 state). Each switching simulation produces a single “work value”, W.

\[\begin{equation} W = \int_{t=0}^{\tau} \frac{\partial H(\mathbf{r}(t);\lambda(t))}{\partial\lambda} \frac{\partial\lambda}{\partial t} dt \end{equation}\]

In practice, the time integration is discretized τ = ∆t×nstlim, where ∆t is the time step and nstlim is the number of steps. If one assumes that the masses are independent of λ, then H(r(t);λ(t)) can be replaced with U(r(t);λ(t)), and the work can be expressed by the following summation. The time steps are indexed from 0 to nstlim-1, such that the elapsed time at step n is t=n∆t.

\[\begin{equation} W = \sum_{n=0}^{\text{nstlim}-1} \frac{\partial U(\mathbf{r}(n\Delta t);\lambda(n\Delta t))}{\partial\lambda} \left[ \lambda(n\Delta t+\Delta t) - \lambda(n\Delta t) \right] \end{equation}\]

The derivative is the thermodynamic gradient normally used to evaluate the free energy from thermodynamic integration; however, it only contributes to the nonequilibrum work when λ changes between consecutive time steps. In sander, the time dependence of λ is controlled with the dynlmb and ntave variables. Specifically, the value of λ is incremented by dynlmb once every ntave steps. If nstlim is an integer multiple of ntave, then λ will be incremented a total of Nincr = nstlim/ntave times, and each increment increases λ by dynlam = (λmax−λmin)/Nincr. One can rewrite the summation to include only the subset of samples contributing a nonzero value to the work (i.e., the samples whose time step is a multiple of ntave)

\[\begin{equation} W = \frac{\lambda_{\text{max}}-\lambda_{\text{min}}}{N_{\text{incr}}} \sum_{m=0}^{\text{nincr}} \frac{\partial U(\mathbf{r}(m\times\text{ntave}\times\Delta t);\lambda(m\times\text{ntave}\times\Delta t))}{\partial\lambda} \end{equation}\]

The MM (λ=0) simulation 

The switching simulations need an ensemble of MM starting configurations. You often need 500-or-more starting structures. When running the MM simulation, you should save both the coordinates and velocities to the trajectory file. You can do this by setting ntwv=-1 and ntwx=N, where N is the number of steps between writes to the trajectory file. It’s often better to run the MM simulation for longer and write to the trajectory file infrequently so your trajectory contains statistically independent samples and your trajectory adequately samples the MM landscape.

Each switching simulation is completely independent; therefore, if you had access to large number of processors, you could you cpptraj to write each frame of the trajectory file as a restart file. You could then use each restart file to initiate a switching simulation in a trivially-parallelized manner. On the other hand, this strategy produces a excessive number of files, which is inconvenient to manage.

The following example shows how to write a series of restart files from a trajectory file using cpptraj.

[user@computer] cpptraj
> parm mm.parm7
> trajin mm.nc
> trajout frame.rst7 restart keepext
> run
> quit

[user@computer] ls frame.*.rst7
frame.1.rst7  frame.2.rst7  frame.3.rst7  frame.4.rst7 [...]

Alternatively, sander allows you to read a trajectory file (set imin=6 and specify the trajectory file on the command line with the -y option). By setting imin=6, sander will run a simulation departing from each frame of the trajectory file, and all the outputs are effectively concatenated into a single file. To parallelize this strategy, use cpptraj to split the trajectory file into several smaller trajectory files.

[user@computer] cpptraj
> parm mm.parm7
> trajin mm.nc
> trajout frameblk1.nc netcdf onlyframes 1-100
> trajout frameblk2.nc netcdf onlyframes 101-200
> trajout frameblk3.nc netcdf onlyframes 201-300
> trajout frameblk4.nc netcdf onlyframes 301-400
> run
> quit

[user@computer] ls frame.*.nc
frameblk1.nc frameblk2.nc frameblk3.nc frameblk4.nc

Preparing the switching simulation files 

The bookend free energy simulations are simple, linear transformations U(λ)=U(0)+λ[U(1)-U(0)] performed with sander. The free energy infrastructure in sander is different from pmemd. In general, sander uses a groupfile to read the λ=0 and λ=1 states from 2 cooresponding sets of mdin, rst7, and parm7 files.

The groupfile should contain 2 lines. The first line is the MM (λ=0) state. The second is the QM/MM (λ=1) state. The groupfile will look something like this:

-O -p mm.parm7   -c init.rst7 -i mm.mdin -o mm.mdout -r mm.rst7 -x tmpmm.nc -inf mm.mdinfo -y mm.nc
-O -p qmmm.parm7 -c init.rst7 -i qm.mdin -o qm.mdout -r qm.rst7 -x tmpqm.nc -inf qm.mdinfo -y mm.nc

The mdin file should activate the free energy infrastructure (ifce=1). There are 2 mdin files for each value of clambda (the MM and QM/MM states). An example MM template mdin file is shown below. The QM/MM mdin file should be the same thing as the MM mdin file; however, one sets ifqnt=1 rather than ifqnt=0.

Switching simulation; assumes the input MM trajectory has velocities saved with ntwv=-1
&cntrl
! IO =======================================
      irest = 1       ! 0 = start, 1 = restart
        ntx = 5       ! 1 = start, 5 = restart
       ntxo = 1       ! read/write rst as formatted file
      iwrap = 1       ! wrap crds to unit cell
     ioutfm = 1       ! write mdcrd as netcdf
       ntpr = 1      ! mdout print freq
       ntwx = 0       ! mdcrd print freq
       ntwr = 500     ! rst print freq
       ntwv = 0
! DYNAMICS =================================
! - - ATTN: a 2 fs timestep only works if the parm7 was modified with hydrogen mass repartitioning - -
       imin = 6       ! Read input trajectory file and run dynamics
         dt = 0.001   ! ps/step
     nstlim = 500     ! number of time steps (per simulation)
        ntb = 1       ! 1=periodic box
! TEMPERATURE ==============================
      temp0 = 298     ! target temp
!   gamma_ln = 5.0     ! Langevin collision freq
!        ntt = 3       ! thermostat (3=Langevin)
! Use middle scheme rather than the default leapfrog integrator
! so the middle scheme people dont complain. Supposedly its more stable.
        ntt = 0
    ischeme = 1       ! middle scheme
ithermostat = 1       ! Langevin
therm_par   = 5.0     ! Langevin collision freq
! PRESSURE  ================================
! If we really wanted to run NPT, we'd need to use the Monte Carlo barostat because
! the QM/MM infrastructure doesn't contribute to the virial
        ntp = 0        !0=off 1=isotropic scaling
!       taup = 2.0     ! pressure relaxation time
!      pres0 = 1.013   ! pressure (bar), 1.013 bar/atm
!   barostat = 1       ! barostat (1=Berendsen, 2=MC)
! SHAKE ====================================
        ntc = 2       ! 1=no shake, 2=HX constrained, 3=all constrained
        ntf = 1       ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
noshakemask=':1'
! MISC =====================================
        cut = 10
         ig = -1
      ifqnt = 0
! TI =======================================
       icfe = 1 ! interpret groupfile as being a FE simulation; first group is MM, second is QM/MM
    clambda = 0 ! Manually set lambda. 0=MM, 1=QM/MM
     dynlmb = 0
      ntave = 1 ! Change lambda every step
 ijarzynski = 1 ! Setup other input variables automatically and report the final work value
/

&qmmm
 qm_theory   = 'AM1D'
     qmmask  = ':1'
   qmcharge  = 0
       spin  = 1
    qmshake  = 0
   qm_ewald  = 1
qmmm_switch  = 1
  verbosity  = 0
/

Running the simulations 

You should equilibrate each lambda state and perform multiple trials of production. The command should be something like: OMP_NUM_THREADS=1 mpirun -n 2 sander.MPI -ng 2 -groupfile switch.groupfile. Replace mpirun with srun, if appropriate. Use more processors (a multiple of 2), if necessary. Most semiempirical methods do not scale well beyond 4 cores, so the most number of cores you’d likely ever use is 8.

Analyzing the results 

The mdout files generated with the provided template mdin file will contain a line reading “Final work value:” at the end of each switching simulation. If you set imin=6 and read a trajectory with the -y option, then there will be multiple occurances of “Final work value:”. You can simply grep these lines out of the mdout files for analysis. Alternatively, you can use the edgembar-bookend2dats.py script (a part of FE-ToolKit) to read your mdout files, and write the MBAR energies and DVDL values and write the necessary efep_*.dat data files. These data files are the format used by edgembar to calculate the free energy, as discussed in Analyzing ABFE Calculations (Tyk2)

The calculation of the free energies from Jarzynski’s equality can use the exact same infrastructure as computing the free energy from exponential averaging. You need generate 2 files per trial: efep_0.0_0.0.dat and efep_0.0_1.0.dat. The file format efep_A_B.dat means that the file contains the energy of state B for the samples drawn from state A. The efep_0.0_0.0.dat file contains 2 columns: the first column is a float or integer representing the time (or time step), and the second column should be 0.0. The efep_0.0_1.0.dat file also contains 2 columns, and it should have the same number of rows, but the energy values are the work values extracted from the mdout file.

An example using edgembar-bookend2dats.py follows. The example reads the MM mdout files (it could be the QM/MM mdout files – it doesn’t matter), and writes the data files to the LIG~LIGqm/environment/t01 directory.

edgembar-bookend2dats.py -o LIG~LIGqm/aq/t01 $(ls t01/aq/mm_*.mdout)
edgembar-bookend2dats.py -o LIG~LIGqm/gas/t01 $(ls t01/gas/mm_*.mdout)

One can then create edgembar XML inputfiles and run the edgembar program to produce the free energy value.

Calculate a bookend correction from nonequilibrium sampling

Learning objectives

Relevant literature

Tutorial

Introduction

The MM (λ=0) simulation

Preparing the switching simulation files

Running the simulations

Analyzing the results

Introduction 

The MM (λ=0) simulation 

Preparing the switching simulation files 

Running the simulations 

Analyzing the results 