.. _eqbookend: Calculate a bookend correction from equilibrium sampling ======================================================== | Tutorial Timothy J. Giese\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute | for Quantitative Biomedicine and Department of Chemistry and Chemical | Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Use sander to perform MM-to-QM/MM equilibrium simulations - Analyze the simulations to calculate a free energy correction Relevant literature ------------------- - See section "26.1.1. Thermodynamic integration using Sander" in the Amber manual for details on performing equilibrium alchemical free energy simulations with sander. `Amber manual `__ - Giese, T.J.; York, D. M. "Development of a Robust Indirect Approach for MM → QM Free Energy Calculations That Combines Force-Matched Reference Potential and Bennett's Acceptance Ratio Methods" J. Chem. Theory Comput. 15(10), 5543-5562 (2019). Tutorial -------- .. contents:: :local: :depth: 4 Introduction ~~~~~~~~~~~~~~~~~~~~~~~ Suppose you want to calculate a change in free energy via an alchemical pathway using an ab initio or semiempirical method rather than a molecular mechanical force field. Unlike a molecular mechanical force field, one can't directly sample quantum mechanical methods in an alchemical pathway (which may require fractional nuclear charges and fractional numbers of electrons as atoms appear or disappear). To overcome this limitation, one uses an "indirect" reference potential approach, whereby the free energy is calculated with the low-level force field that is fast enough to sample the transformation for an extended period of time with multiple trials, and the high-level QM contribution to the free energy is calculated as a correction to the physical end-states. A depiction of the bookend corrections is shown in the following figure. .. figure:: Cycles.png :scale: 100 % :alt: Absolute solvation (hydration) free energies calculated with MM and QM/MM potentials The left-side of the figure shows a thermodynamic cycle for calculating an absolute solvation free energy using a MM force field. The equation below it is the mathematical representation of the cycle. The "cycle" performs real-to-dummy state transformations in two environments: aqueous phase and gas phase. The direction of the arrows are chosen so the lambda=0 and lambda=1 states are the real and dummy states, respectively. The right-side shows how to calculate the hydration free energy using a QM/MM method. The complicated alchemical sampling is performed with the MM force field and the MM-to-QM/MM transformations correct the real-state free energies. These "bookend" corrections can be run with far less sampling because there are no chemical or alchemical changes. The bookend transformations transform the potential energy from a MM force field to a QM/MM potential. A bookend transformation is required for each "real state". Mathematically, it is given by the following equation .. math:: \begin{equation}\label{E:bookend} \Delta G_{\text{QM/MM}} = \Delta G_{\text{MM}} + \Delta G_{\text{MM-to-QM/MM},\text{tgt}} - \Delta G_{\text{MM-to-QM/MM},\text{ref}} \end{equation} where "tgt" and "ref" denote the target and reference environments. Preparing the input files ~~~~~~~~~~~~~~~~~~~~~~~~~ The bookend free energy simulations are simple, linear transformations U(λ)=U(0)+λ[U(1)-U(0)] performed with sander. The free energy infrastructure in sander is different from pmemd. In general, sander uses a groupfile to read the λ=0 and λ=1 states from 2 cooresponding sets of mdin, rst7, and parm7 files. The groupfile should contain 2 lines. The first line is the MM (λ=0) state. The second is the QM/MM (λ=1) state. A separate groupfile is used for each value of clambda. It is often sufficient to use 5 lambda values: 0.00, 0.20, 0.40, 0.60, 0.80, 1.00. Notice that both states read the exact same input coordinates. .. code-block:: -O -p mm.parm7 -c init.rst7 -i mm_0.00.mdin -o mm_0.00.mdout -r mm_0.00.rst7 -x mm_0.00.nc -inf mm_0.00.mdinfo -O -p qmmm.parm7 -c init.rst7 -i qm_0.00.mdin -o qm_0.00.mdout -r qm_0.00.rst7 -x qm_0.00.nc -inf qm_0.00.mdinfo The mdin file should activate the free energy infrastructure (ifce=1). There are 2 mdin files for each value of clambda (the MM and QM/MM states). An example MM template mdin file is shown below. The QM/MM mdin file should be the same thing as the MM mdin file; however, one sets ifqnt=1 rather than ifqnt=0. Furthermore, the MM and QM/MM simulations both don't need to wite a trajectory file because they both encounter the same sets of coordinates, so one should ntwx=0 in the QM/MM mdin file while setting ntwx to the desired print frequency in the MM mdin file. .. code-block:: Equilibrium TI analysis &cntrl ! IO ======================================= irest = 1 ! 0 = start, 1 = restart ntx = 5 ! 1 = start, 5 = restart ntxo = 1 ! read/write rst as formatted file iwrap = 1 ! wrap crds to unit cell ioutfm = 1 ! write mdcrd as netcdf ntpr = 50 ! mdout print freq ntwx = 0 ! mdcrd print freq ntwr = 500 ! rst print freq ntwv = 0 ! DYNAMICS ================================= ! - - ATTN: a 2 fs timestep only works if the parm7 was modified with hydrogen mass repartitioning - - imin = 0 ! Run dynamics dt = 0.001 ! ps/step nstlim = 10000 ! number of time steps (per simulation) ntb = 1 ! 1=periodic box ! TEMPERATURE ============================== temp0 = 298 ! target temp ! gamma_ln = 5.0 ! Langevin collision freq ! ntt = 3 ! thermostat (3=Langevin) ! Use middle scheme rather than the default leapfrog integrator ! so the middle scheme people dont complain. Supposedly its more stable. ntt = 0 ischeme = 1 ! middle scheme ithermostat = 1 ! Langevin therm_par = 5.0 ! Langevin collision freq ! PRESSURE ================================ ! If we really wanted to run NPT, we'd need to use the Monte Carlo barostat because ! the QM/MM infrastructure doesn't contribute to the virial ntp = 0 !0=off 1=isotropic scaling ! taup = 2.0 ! pressure relaxation time ! pres0 = 1.013 ! pressure (bar), 1.013 bar/atm ! barostat = 1 ! barostat (1=Berendsen, 2=MC) ! SHAKE ==================================== ntc = 2 ! 1=no shake, 2=HX constrained, 3=all constrained ntf = 1 ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E noshakemask=':1' ! MISC ===================================== cut = 10 ig = -1 ifqnt = 0 ! TI ======================================= icfe = 1 ! interpret groupfile as being a FE simulation; first group is MM, second is QM/MM clambda = 0.00 ! Manually set lambda. 0=MM, 1=QM/MM ifmbar = 1 ! print the lam=0 and lam=1 energies bar_intervall = 50 bar_l_min = 0 bar_l_max = 1 bar_l_incr = 1 dynlmb = 0 ntave = 0 / &qmmm qm_theory = 'AM1D' qmmask = ':1' qmcharge = 0 spin = 1 qmshake = 0 qm_ewald = 1 qmmm_switch = 1 verbosity = 0 / Running the simulations ~~~~~~~~~~~~~~~~~~~~~~~ You should equilibrate each lambda state and perform multiple trials of production. The command should be something like: ``OMP_NUM_THREADS=1 mpirun -n 2 sander.MPI -ng 2 -groupfile bookend_0.00.groupfile``. Replace mpirun with srun, if appropriate. Use more processors (a multiple of 2), if necessary. Most semiempirical methods do not scale well beyond 4 cores, so the most number of cores you'd likely ever use is 8. Analyzing the results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The mdout files generated with the provided template mdin file will contain the values of U(0) and U(1) in the MBAR energy analysis output that you can use to analyze the free energy. The potential energy of an arbitrary λ state is directly calculated from these values: U = λ U(0) + (1-λ) U(1). The "DVDL" values are similarly calculated: dU(λ)/dλ=U(1)-U(0). As a convenience, the edgembar-bookend2dats.py script (a part of FE-ToolKit) reads your mdout files, calculates the appropriate MBAR energies and DVDL values and writes a series of `efep_*.dat and dvdl_*.dat data files. `__ These data files are the format used by `edgembar `__ to calculate the free energy, as discussed in :ref:`Tyk2Analysis` An example using edgembar-bookend2dats.py follows. The example reads the MM mdout files (it could be the QM/MM mdout files -- it doesn't matter), and writes the data files to the LIG~LIGqm/*environment*/t01 directory. .. code-block:: edgembar-bookend2dats.py -o LIG~LIGqm/aq/t01 $(ls t01/aq/mm_*.mdout) edgembar-bookend2dats.py -o LIG~LIGqm/gas/t01 $(ls t01/gas/mm_*.mdout) One could prepare edgembar XML input files to correct MM results with the bookend simulation data by introducing them as an additional `stage `__; however, in the above example, I call the MM ligand "LIG" and the QM/MM ligand "LIGqm". If it were included as an additional stage, then the reported `edge free energy `__ of each ligand is the QM/MM-corrected value. If it is included as a separately-named ligand, then one can see the MM and boookend-correction as separate free energy values in the `graph analysis `__. As specific examples, one way to setup your directory structure for use with edgembar is shown below, where the "LIG~null" is the transformation from a real ligand to a dummy state, the environment is either "aq" or "gas", the stage is either "unified" or "bookend", and the last subdirectory indicated the independent trial. Be aware that you will need to pay special attention to get the sign of the free energies correct. You'll either need to switch the order of the bookend lambda values to negate the sign of the bookend correction, or your edgembar XML input file will need to reverse the list of bookend states. .. code-block:: ./LIG~null/aq/unified/t01/efep_*_*.dat ./LIG~null/aq/unified/t02/efep_*_*.dat ./LIG~null/aq/unified/t03/efep_*_*.dat ./LIG~null/aq/unified/t04/efep_*_*.dat ./LIG~null/aq/bookend/t01/efep_*_*.dat ./LIG~null/aq/bookend/t02/efep_*_*.dat ./LIG~null/aq/bookend/t03/efep_*_*.dat ./LIG~null/aq/bookend/t04/efep_*_*.dat ./LIG~null/gas/unified/t01/efep_*_*.dat ./LIG~null/gas/unified/t02/efep_*_*.dat ./LIG~null/gas/unified/t03/efep_*_*.dat ./LIG~null/gas/unified/t04/efep_*_*.dat ./LIG~null/gas/bookend/t01/efep_*_*.dat ./LIG~null/gas/bookend/t02/efep_*_*.dat ./LIG~null/gas/bookend/t03/efep_*_*.dat ./LIG~null/gas/bookend/t04/efep_*_*.dat The edgembar XML file would be organized as follows: .. code-block:: ./LIG~null/aq/unified/t01 0.00000000 0.10000000 0.20000000 0.30000000 0.40000000 0.50000000 0.60000000 0.70000000 0.80000000 0.90000000 1.00000000 ./LIG~null/aq/bookend/t01 1.00000000 0.80000000 0.60000000 0.40000000 0.20000000 0.00000000 Alternatively, you could organize the directories such that the QM/MM representation of the ligand has a different ligand name. .. code-block:: ./LIG~null/aq/unified/t01/efep_*_*.dat ./LIG~null/aq/unified/t02/efep_*_*.dat ./LIG~null/aq/unified/t03/efep_*_*.dat ./LIG~null/aq/unified/t04/efep_*_*.dat ./LIG~LIGqm/aq/bookend/t01/efep_*_*.dat ./LIG~LIGqm/aq/bookend/t02/efep_*_*.dat ./LIG~LIGqm/aq/bookend/t03/efep_*_*.dat ./LIG~LIGqm/aq/bookend/t04/efep_*_*.dat ./LIG~null/gas/unified/t01/efep_*_*.dat ./LIG~null/gas/unified/t02/efep_*_*.dat ./LIG~null/gas/unified/t03/efep_*_*.dat ./LIG~null/gas/unified/t04/efep_*_*.dat ./LIG~LIGqm/gas/bookend/t01/efep_*_*.dat ./LIG~LIGqm/gas/bookend/t02/efep_*_*.dat ./LIG~LIGqm/gas/bookend/t03/efep_*_*.dat ./LIG~LIGqm/gas/bookend/t04/efep_*_*.dat Edgembar would then have 2 input files: .. code-block:: ./LIG~null/aq/unified/t01 0.00000000 0.10000000 0.20000000 0.30000000 0.40000000 0.50000000 0.60000000 0.70000000 0.80000000 0.90000000 1.00000000 .. code-block:: ./LIG~null/aq/bookend/t01 0.00000000 0.20000000 0.40000000 0.60000000 0.80000000 1.00000000 In this case, you don't have to reverse the sign of the bookend states because edgembar aligns the "arrows in the thermodynamic cycle" by the transformation name, e.g., "LIG~LIGqm". You WOULD need to reverse the states if the transformation name was "LIGqm~LIG".