.. _eqbookend:


Calculate a bookend correction from equilibrium sampling
========================================================


| Tutorial Timothy J. Giese\ :sup:`1`,  and Darrin M. York\ :sup:`1`
| :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute
|   for Quantitative Biomedicine and Department of Chemistry and Chemical
|   Biology, Rutgers University, Piscataway, NJ 08854, USA


Learning objectives
-------------------

.. start-learning-objectives

- Use sander to perform MM-to-QM/MM equilibrium simulations 
- Analyze the simulations to calculate a free energy correction

  
Relevant literature
-------------------

- See section "26.1.1. Thermodynamic integration using Sander" in the
  Amber manual for details on performing equilibrium alchemical free
  energy simulations with sander.
  `Amber manual <https://ambermd.org/Manuals.php>`__
- Giese, T.J.; York, D. M. "Development of a Robust Indirect Approach for MM → QM Free Energy Calculations That Combines Force-Matched Reference Potential and Bennett's Acceptance Ratio Methods" J. Chem. Theory Comput. 15(10), 5543-5562 (2019).
  
Tutorial
--------

.. contents::
   :local:
   :depth: 4


Introduction
~~~~~~~~~~~~~~~~~~~~~~~
Suppose you want to calculate a change in free energy via an
alchemical pathway  using an ab initio or semiempirical method
rather than a molecular mechanical force field.

Unlike a molecular mechanical force field, one can't directly
sample quantum mechanical methods in an alchemical pathway
(which may require fractional nuclear charges and fractional
numbers of electrons as atoms appear or disappear).

To overcome this limitation, one uses an "indirect" reference
potential approach, whereby the free energy is calculated with
the low-level force field that is fast enough to sample the
transformation for an extended period of time with multiple trials,
and the high-level QM contribution to the free energy is calculated
as a correction to the physical end-states.
A depiction of the bookend corrections is shown in the following figure.


.. figure:: Cycles.png
   :scale: 100 %
   :alt: Absolute solvation (hydration) free energies calculated with MM and QM/MM potentials

   The left-side of the figure shows a thermodynamic cycle for calculating an absolute solvation free energy using a MM force field. The equation below it is the mathematical representation of the cycle.  The "cycle" performs real-to-dummy state transformations in two environments: aqueous phase and gas phase. The direction of the arrows are chosen so the lambda=0 and lambda=1 states are the real and dummy states, respectively.
   The right-side shows how to calculate the hydration free energy using a QM/MM method.  The complicated alchemical sampling is performed with the MM force field and the MM-to-QM/MM transformations correct the real-state free energies.  These "bookend" corrections can be run with far less sampling because there are no chemical or alchemical changes.


The bookend transformations transform the potential energy from
a MM force field to a QM/MM potential. A bookend transformation is required for each "real state".  Mathematically, it is given by the following equation

.. math::
   \begin{equation}\label{E:bookend}
   \Delta G_{\text{QM/MM}} = \Delta G_{\text{MM}} + \Delta G_{\text{MM-to-QM/MM},\text{tgt}} - \Delta G_{\text{MM-to-QM/MM},\text{ref}}
   \end{equation}

   
where "tgt" and "ref" denote the target and reference environments.


Preparing the input files
~~~~~~~~~~~~~~~~~~~~~~~~~
The bookend free energy simulations are simple, linear transformations U(λ)=U(0)+λ[U(1)-U(0)] performed with sander.
The free energy infrastructure in sander is different from pmemd.
In general, sander uses a groupfile to read the  λ=0 and λ=1 states from 2 cooresponding sets of mdin, rst7, and parm7 files.


The groupfile should contain 2 lines. The first line is the MM (λ=0) state. The second is the QM/MM (λ=1) state.
A separate groupfile is used for each value of clambda.  It is often sufficient to use 5 lambda values: 0.00, 0.20, 0.40, 0.60, 0.80, 1.00.
Notice that both states read the exact same input coordinates.

.. code-block::

   -O -p mm.parm7   -c init.rst7 -i mm_0.00.mdin -o mm_0.00.mdout -r mm_0.00.rst7 -x mm_0.00.nc -inf mm_0.00.mdinfo
   -O -p qmmm.parm7 -c init.rst7 -i qm_0.00.mdin -o qm_0.00.mdout -r qm_0.00.rst7 -x qm_0.00.nc -inf qm_0.00.mdinfo


The mdin file should activate the free energy infrastructure (ifce=1).
There are 2 mdin files for each value of clambda (the MM and QM/MM states).
An example MM template mdin file is shown below.
The QM/MM mdin file should be the same thing as the MM mdin file;
however, one sets ifqnt=1 rather than ifqnt=0.
Furthermore, the MM and QM/MM simulations both don't need to wite a trajectory file
because they both encounter the same sets of coordinates, so one should
ntwx=0 in the QM/MM mdin file while setting ntwx to the desired print frequency
in the MM mdin file.

.. code-block::

   Equilibrium TI analysis
   &cntrl
   ! IO =======================================
         irest = 1       ! 0 = start, 1 = restart
           ntx = 5       ! 1 = start, 5 = restart
          ntxo = 1       ! read/write rst as formatted file
         iwrap = 1       ! wrap crds to unit cell
        ioutfm = 1       ! write mdcrd as netcdf
          ntpr = 50      ! mdout print freq
          ntwx = 0       ! mdcrd print freq
          ntwr = 500     ! rst print freq
          ntwv = 0
   ! DYNAMICS =================================
   ! - - ATTN: a 2 fs timestep only works if the parm7 was modified with hydrogen mass repartitioning - -
          imin = 0       ! Run dynamics
            dt = 0.001   ! ps/step
        nstlim = 10000   ! number of time steps (per simulation)
           ntb = 1       ! 1=periodic box
   ! TEMPERATURE ==============================
         temp0 = 298     ! target temp
   !   gamma_ln = 5.0     ! Langevin collision freq
   !        ntt = 3       ! thermostat (3=Langevin)
   ! Use middle scheme rather than the default leapfrog integrator
   ! so the middle scheme people dont complain. Supposedly its more stable.
           ntt = 0
       ischeme = 1       ! middle scheme
   ithermostat = 1       ! Langevin
   therm_par   = 5.0     ! Langevin collision freq
   ! PRESSURE  ================================
   ! If we really wanted to run NPT, we'd need to use the Monte Carlo barostat because
   ! the QM/MM infrastructure doesn't contribute to the virial
           ntp = 0        !0=off 1=isotropic scaling
   !       taup = 2.0     ! pressure relaxation time
   !      pres0 = 1.013   ! pressure (bar), 1.013 bar/atm
   !   barostat = 1       ! barostat (1=Berendsen, 2=MC)
   ! SHAKE ====================================
           ntc = 2       ! 1=no shake, 2=HX constrained, 3=all constrained
           ntf = 1       ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
   noshakemask=':1'
   ! MISC =====================================
           cut = 10
            ig = -1
         ifqnt = 0
   ! TI =======================================
          icfe = 1 ! interpret groupfile as being a FE simulation; first group is MM, second is QM/MM
       clambda = 0.00 ! Manually set lambda. 0=MM, 1=QM/MM
        ifmbar = 1 ! print the lam=0 and lam=1 energies
   bar_intervall = 50
   bar_l_min   = 0
   bar_l_max   = 1
   bar_l_incr  = 1
        dynlmb = 0 
         ntave = 0
   /

   &qmmm
    qm_theory   = 'AM1D'
        qmmask  = ':1'
      qmcharge  = 0
          spin  = 1
       qmshake  = 0
      qm_ewald  = 1
   qmmm_switch  = 1
     verbosity  = 0
   /


Running the simulations
~~~~~~~~~~~~~~~~~~~~~~~

You should equilibrate each lambda state and perform multiple trials
of production.  The command should be something like:
``OMP_NUM_THREADS=1 mpirun -n 2 sander.MPI -ng 2 -groupfile bookend_0.00.groupfile``.
Replace mpirun with srun, if appropriate.  Use more processors (a multiple
of 2), if necessary.  Most semiempirical methods do not scale well beyond
4 cores, so the most number of cores you'd likely ever use is 8.


Analyzing the results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The mdout files generated with the provided template mdin file
will contain the values of U(0) and U(1) in the MBAR energy analysis output
that you can use to analyze the free energy.
The potential energy of an arbitrary λ state is directly calculated from
these values: U = λ U(0) + (1-λ) U(1).
The "DVDL" values are similarly calculated: dU(λ)/dλ=U(1)-U(0).


As a convenience, the edgembar-bookend2dats.py script (a part of FE-ToolKit)
reads your mdout files, calculates the appropriate
MBAR energies and DVDL values and writes a series of
`efep_*.dat and dvdl_*.dat data files. <https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/edgecalc.html#RawData>`__
These data files are the format used by
`edgembar <https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/>`__
to calculate the free energy,
as discussed in :ref:`Tyk2Analysis`

An example using edgembar-bookend2dats.py follows.
The example reads the MM mdout files (it could be the QM/MM mdout files -- it doesn't matter),
and writes the data files to the LIG~LIGqm/*environment*/t01 directory.

.. code-block::

   edgembar-bookend2dats.py -o LIG~LIGqm/aq/t01 $(ls t01/aq/mm_*.mdout)
   edgembar-bookend2dats.py -o LIG~LIGqm/gas/t01 $(ls t01/gas/mm_*.mdout)


One could prepare edgembar XML input files to correct MM results with the bookend simulation
data by introducing them as an additional
`stage <https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/introduction.html#Stage>`__;
however, in the above example, I call the MM ligand "LIG" and the QM/MM ligand "LIGqm".
If it were included as an additional stage, then the reported
`edge free energy <https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/introduction.html#Edge>`__
of each ligand is the QM/MM-corrected value. If it is included as a separately-named ligand, then
one can see the MM and boookend-correction as separate free energy values in the
`graph analysis <https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/graphana.html>`__.


As specific examples, one way to setup your directory structure for use with edgembar is
shown below, where the "LIG~null" is the transformation from a real ligand to a dummy state,
the environment is either "aq" or "gas", the stage is either "unified" or "bookend",
and the last subdirectory indicated the independent trial.
Be aware that you will need to pay special attention to get the sign of the free energies
correct.  You'll either need to switch the order of the bookend lambda values to negate
the sign of the bookend correction, or your edgembar XML input file will need to reverse
the list of bookend states.

.. code-block::

   ./LIG~null/aq/unified/t01/efep_*_*.dat
   ./LIG~null/aq/unified/t02/efep_*_*.dat
   ./LIG~null/aq/unified/t03/efep_*_*.dat
   ./LIG~null/aq/unified/t04/efep_*_*.dat
   ./LIG~null/aq/bookend/t01/efep_*_*.dat
   ./LIG~null/aq/bookend/t02/efep_*_*.dat
   ./LIG~null/aq/bookend/t03/efep_*_*.dat
   ./LIG~null/aq/bookend/t04/efep_*_*.dat
   ./LIG~null/gas/unified/t01/efep_*_*.dat
   ./LIG~null/gas/unified/t02/efep_*_*.dat
   ./LIG~null/gas/unified/t03/efep_*_*.dat
   ./LIG~null/gas/unified/t04/efep_*_*.dat
   ./LIG~null/gas/bookend/t01/efep_*_*.dat
   ./LIG~null/gas/bookend/t02/efep_*_*.dat
   ./LIG~null/gas/bookend/t03/efep_*_*.dat
   ./LIG~null/gas/bookend/t04/efep_*_*.dat

   
The edgembar XML file would be organized as follows:

.. code-block::

   <?xml version="1.0" ?>
   <edge name="LIG~null">
      <env name="target"> <!-- "aq" -->
         <stage name="unified">
            <trial name="t01">
               <dir>./LIG~null/aq/unified/t01</dir>
               <ene>0.00000000</ene>
               <ene>0.10000000</ene>
               <ene>0.20000000</ene>
               <ene>0.30000000</ene>
               <ene>0.40000000</ene>
               <ene>0.50000000</ene>
               <ene>0.60000000</ene>
               <ene>0.70000000</ene>
               <ene>0.80000000</ene>
               <ene>0.90000000</ene>
               <ene>1.00000000</ene>
            </trial>
	    <! -- Analogous for trials t02 t03 t04 -->
         </stage>
         <stage>
         <!-- reverse the order of the states to negate the energy -->
            <dir>./LIG~null/aq/bookend/t01</dir>
            <ene>1.00000000</ene>
            <ene>0.80000000</ene>
            <ene>0.60000000</ene>
            <ene>0.40000000</ene>
            <ene>0.20000000</ene>
            <ene>0.00000000</ene>
         </stage>
      </env>
      <env name="reference">
         <!-- Analogous for env name="reference" (aka "gas") -->
      </env>
   </edge>


Alternatively, you could organize the directories such that the QM/MM representation of
the ligand has a different ligand name.

.. code-block::

   ./LIG~null/aq/unified/t01/efep_*_*.dat
   ./LIG~null/aq/unified/t02/efep_*_*.dat
   ./LIG~null/aq/unified/t03/efep_*_*.dat
   ./LIG~null/aq/unified/t04/efep_*_*.dat
   ./LIG~LIGqm/aq/bookend/t01/efep_*_*.dat
   ./LIG~LIGqm/aq/bookend/t02/efep_*_*.dat
   ./LIG~LIGqm/aq/bookend/t03/efep_*_*.dat
   ./LIG~LIGqm/aq/bookend/t04/efep_*_*.dat
   ./LIG~null/gas/unified/t01/efep_*_*.dat
   ./LIG~null/gas/unified/t02/efep_*_*.dat
   ./LIG~null/gas/unified/t03/efep_*_*.dat
   ./LIG~null/gas/unified/t04/efep_*_*.dat
   ./LIG~LIGqm/gas/bookend/t01/efep_*_*.dat
   ./LIG~LIGqm/gas/bookend/t02/efep_*_*.dat
   ./LIG~LIGqm/gas/bookend/t03/efep_*_*.dat
   ./LIG~LIGqm/gas/bookend/t04/efep_*_*.dat

   
Edgembar would then have 2 input files:

.. code-block::

   <?xml version="1.0" ?>
   <edge name="LIG~null">
      <env name="target"> <!-- "aq" -->
         <stage name="unified">
            <trial name="t01">
               <dir>./LIG~null/aq/unified/t01</dir>
               <ene>0.00000000</ene>
               <ene>0.10000000</ene>
               <ene>0.20000000</ene>
               <ene>0.30000000</ene>
               <ene>0.40000000</ene>
               <ene>0.50000000</ene>
               <ene>0.60000000</ene>
               <ene>0.70000000</ene>
               <ene>0.80000000</ene>
               <ene>0.90000000</ene>
               <ene>1.00000000</ene>
            </trial>
	    <! -- Analogous for trials t02 t03 t04 -->
         </stage>
      </env>
      <env name="reference">
         <!-- Analogous for env name="reference" (aka "gas") -->
      </env>
   </edge>


.. code-block::

   <?xml version="1.0" ?>
   <edge name="LIG~LIGqm">
      <env name="target"> <!-- "aq" -->
         <stage name="bookend">
            <trial name="t01">
               <dir>./LIG~null/aq/bookend/t01</dir>
               <ene>0.00000000</ene>
               <ene>0.20000000</ene>
               <ene>0.40000000</ene>
               <ene>0.60000000</ene>
               <ene>0.80000000</ene>
               <ene>1.00000000</ene>
            </trial>
	    <! -- Analogous for trials t02 t03 t04 -->
         </stage>
      </env>
      <env name="reference">
         <!-- Analogous for env name="reference" (aka "gas") -->
      </env>
   </edge>

In this case, you don't have to reverse the sign of the bookend states because edgembar aligns the "arrows in the thermodynamic cycle" by the transformation name, e.g., "LIG~LIGqm".  You WOULD need to reverse the states if the transformation name was "LIGqm~LIG".