Generating umbrella windows for a 1-dimensional methyl transfer reaction in MTR1

Erika McCarthy¹, Şölen Ekesan¹, and Darrin M. York¹

¹Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

Generate umbrella windows for a 1D methyl transfer reaction within MTR1

Relevant literature

Catalytic mechanism and pH dependence of a methyltransferase ribozyme (MTR1) from computational enzymology
Surface-Accelerated String Method for Locating Minimum Free Energy Paths
Extension of the Variational Free Energy Profile and Multistate Bennett Acceptance Ratio Methods for High-Dimensional Potential of Mean Force Profile Analysis

Introduction

The motivation for using umbrella sampling is that, without restraints, a system undergoing dynamics will very rarely cross a reaction barrier. Even if it does, it will be very sparsely sampled and take much longer than we can afford to sample. Therefore, we apply a series of harmonic biasing potentials, also referred to as an umbrella potential, to force the system over a free energy barrier in the space of the selected reaction coordinate. In this way, we can thoroughly sample the breaking and formation of bonds, or other processes like a conformational change. In order to construct a potential of mean force, we must track the value of the reaction coordinate throughout our simulations so that we can later remove the impact of the bias and compute what the distribution of sampling would look like had we not applied the restraint. This tutorial will show you how to perform and analyze QM/MM simulations of a methyl transfer reaction in the MTR1 ribozyme using 1 reaction coordinate.

QM region selection

The active site is shown in Figure 1, and includes the methyl donor ligand (O6 methylated guanine), the nucleophillic target A63, and C10 protonated at the N3 position.

**Figure 1.** Active site of the MTR1 ribozyme. Atoms included in the QM region are represented as CPK. Distances in are those involved in the methyl transfer reaction coordinate with the associated atoms labeled. The labeled atoms involved in the hydrogen bond between C10 and O6mG will be addressed in part 2.

First, we must decide which atoms will be treated quantum mechanically (aka the QM region). The rest of the atoms will be treated with the molecular mechanical force field. In this case, we are simulating methyl transfer from O6 methylated guanine (O6mG) to the N1 of A63, so we want to include both of these residues in the QM region. The positively charged C10 residue is directly hydrogen bonding to the leaving group, so we will include it as well. In part 2 this residue will also be involved in the reaction, so we will proactively include it for part 1. For A63 and C10, we will only include the nucleobases since the backbones are not directly involved in the reaction. Additionally, U45 is important for holding the ligand in the active site, but since it is not directly involved in the reaction it will remain in the MM region. Based on this selection, the total charge of the QM region is +1 because C10 is in a non-standard protonation state.

Based on this QM region selection, one would need to make some modifications to the parameter file such as removing connection atom charges and redistributing MM charges so that the QM region within each residue is an integer. One would also need to perform hydrogen mass repartitioning in order to run QM/MM simulations stably with a 2 fs time step. These modifications are outside the scope of this exercise and have already been performed to produce qmmm.parm7 that is provided to you in the template directory.

The inputs for this section are located in 1D/inputs. They should be copied to your working directory.

[user@cluster] mkdir 1D
[user@cluster] cd 1D
[user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/1D/inputs/* ./

Preparing the inputs

You have been provided a parameter file (template/qmmm.parm7), and a structure file (start/reactant.rst7) that was previously equilibrated and represent the reactant state of the ribozyme. Additionally, you have been provided a template mdin file (template/short.mdin). This mdin file is valid for QM/MM simulations in general, and is not specific to umbrella sampling.

Take a look at short.mdin:

title
&cntrl
! IO =======================================
      irest = 0       ! 0 = start, 1 = restart
        ntx = 1       ! 1 = start, 5 = restart
       ntxo = 1       ! read/write rst as formatted file
      iwrap = 1       ! wrap crds to unit cell
     ioutfm = 1       ! write mdcrd as netcdf
       imin = 0
      ntmin = 1
       ntpr = 20
       ntwr = 20
       ntwx = 0
       ntwf = 0
! DYNAMICS =================================
     nstlim = 250
         dt = 0.002   ! ps/step
        ntb = 1       ! 1=NVT periodic, 2=NPT periodic, 0=no box
! TEMPERATURE ==============================
      temp0 = 298     ! target temp
   gamma_ln = 5.0     ! Langevin collision freq
        ntt = 3       ! thermostat (3=Langevin)
! PRESSURE  ================================
        ntp = 0       ! 0=no scaling, 1=isotropic, 2=anisotropic
! SHAKE ====================================
        ntc = 2       ! 1=no shake, 2=HX constrained, 3=all constrained
noshakemask = ":69|@272-282,290-291,1975-1988"    ! do not shake these
        ntf = 1       ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E
! MISC =====================================
        cut = 10.0
      ifqnt = 1
         ig = -1
     nmropt = 1
/


&ewald
  dsum_tol = 1.e-6
/

&qmmm
    qm_theory   = 'DFTB3'
        qmmask  = ':69|@272-282,290-291,1975-1988'
      qmcharge  = 1
          spin  = 1
       qmshake  = 0
      qm_ewald  = 1
   qmmm_switch  = 1
       scfconv  = 1.e-10
     verbosity  = 0
  tight_p_conv  = 1
  diag_routine  = 0
   pseudo_diag  = 1
  dftb_maxiter  = 100
/


&wt type = 'DUMPFREQ', istep1 = 10, /
&wt type='END' /
DISANG=img.disang
DUMPAVE=img.dumpave
LISTOUT=POUT
LISTIN=POUT

As highlighted in cyan, we will be using a 2 fs timestep, but this can only be done safely when hydrogen mass repartitioning has been applied to the parameter file. The initial simulations used to generate the windows for umbrella sampling will only be performed for 0.5 ps.

As highlighted in yellow, we will be using the DFTB3 semi-empirical Hamiltonian, and the total charge of the QM region is +1 due to the protonation of cytosine. When determining the total charge of a nucleic acid QM region, be sure to consider any negatively charged phosphates that may be included. The QM region is defined by the mask ‘:69|@272-282,290-291,1975-1988’ which includes the entire ligand (:69), the nucleobase of C10 (@272-282,290-291) and the nucleobase of A63 (@1975-1988).

The noshakemask (pink) contains the same mask as the QM region, thus we will not shake QM atoms as these bonds involving hydrogen may fluctuate outside of equilibrium MM values when bonds are made/broken. SHAKE constraints are inherently dependent on the MM force field, thus MM constraints should not be combined with our QM description.

Since we want to apply nmr restraints, nmropt (green) is set to 1, and we will output restraint values to the dumpave file every 10 steps. The green highlighted img.disang and img.dumpave are place holders that will be replaced to indicate the appropriate umbrella window number in order to apply restraints.

Let’s review how nmr restraints work in AMBER. When nmropt=1 is set in the mdin file, restraints will be read from the specified DISANG file. These are different than coordinate restraints set when using ntr=1, which effectively freezes the atoms in place. Using nmr restraints allows the atom position to fluctuate subject to a specified biasing potential relating it to another atom or atoms. The values of the restrained properties for a given step are output to the specified DUMPAVE file with a frequency of istep1.

Determine reaction coordinates

Next we need to select the reaction coordinates. We choose the reaction coordinate to be the difference in distance between the atoms where the bond will break, and between the atoms where the bond will form. This ensures the reaction coordinate value to increase going from reactant to product making it more intuitive. In MTR1, describing methyl transfer between O6mG:O6 and A63:N1 will yield the reaction coordinate (RC): RC = |R_O6mG:O6 - R_Cm| - | R_A63:N1 - R_Cm |. We will evaluate the reaction going from the reactant with RC at -2.5 Å, to product with RC at 2.5 Å. You have been provided template/img.disang containing the restraint definitions.

Take a look at img.disang:

#methylation: |GN-C5|-|A62-C5|
&rst iat= 2200, 2189, 2200, 1984,
r1=-999, r2=RC1, r3=RC1, r4=999, rk2=100, rk3=100, rstwt=1.0,-1.0 /

#Max distances for H-bonds

## 1 ##
&rst iat= 1403, 2207,
r1=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /

## 2 ##
&rst iat= 1405, 2192,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /

## 3 ##
&rst iat= 1407, 2206,
r1=-999, r2=-999, r3=2.00, r4=999,rk2=0, rk3=100 /

## 4 ##
&rst iat= 2205, 282,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /

## 5 ##
&rst iat= 280, 2189,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /

## 6 ##
&rst iat= 2190, 1982,
r2=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /

## 7 ##
&rst iat= 291, 2193,
r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /

For now, we need to understand more about the restraints used in this example. The first restraint (highlighted yellow) defines the reaction coordinate. The line “&rst iat= 2200, 2189, 2200, 1984,” indicates that we are restraining the difference in distance between the methyl carbon (2200) to ligand oxygen (2189) and the methyl carbon (2200) to the N1 of adenine (1984). These distances are shown in Figure 1. The presence of “rstwt=1.0,-1.0” indicates this should be a difference of distances. In theory, the two distances could be seperate reaction coordinates, but combining them reduces the dimensionality while still describing the same event. The values of r1, r2, r3, and r4 define the shape of the biasing potential. The r2 and r3 values define the flat region of the potential. If r2 is less than r3, then there is no penalty for being within those bounds. When applying an umbrella potential, we set r2=r3 to sample directly on the desired value. The img.disang file is a template, where RC1 will ultimately be replaced for each umbrella window. If more reaction coordinates were present, the placeholder RC2 and so on would be used. The r1 and r4 values define the bounds of the linear response region, ie the penalty will progressively increase as the restrained property approaches r1 and r4. By setting r1=-999 and r4=999 we have effectively created a parabola with infinite bounds. The values rk2 and rk3 are the weights of the restraints. These effectively define the steepness of the parabola, thus controlling the penalty for going outside the r2 to r3 region. For umbrella sampling we want to set this to a high number to force the sampling of high energy regions. The higher the free energy at an umbrella center position, the more the observed value (reported in the dumpave file) will attempt to move away from this center and fight against the biasing potential. It is this concept that helps us construct a PMF.

The rest of the restraints do not define reaction coordinates, and they will be held constant in all of the umbrella windows. These are distance restraints on the active site hydrogen bonds to the ligand to ensure it stays in the pocket. As highlighted in green, these are different from the harmonic potential used for the reaction coordinate because r2=-999 and rk2=0. This defines a half-harmonic biasing potential where any distance less than 3.0 feels no penalty. If the values were reversed (ie. r3=999 and rk2=100), then the two atoms would be pushed away from each other. Now that we have a comprehensive understand of the nmr restraints, we will focus on setting up an umbrella sampling simulation.

Generate the equilibration directory

Now we will generate the equil directory based on the inputs provided. This will be where we sequentially generate the umbrella windows starting from the reactant structure. It is important to slowly increment the reaction coordinate starting from a neighboring structure because each time you change the restraint, you are effectively changing the force field and risk large spikes in energy that could cause unintended bonds to break. We will use the ndfes program to write the inputs for this step.

Navigate to the start directory.

You have been provided a reactant and product.disang file where the reaction coordinate is set to -2.5 and 2.5 respectively. These will define the end points, and a total of 32 umbrella windows will be generated to linearly interpolate through them.

Make sure the amber module is loaded.

Run the following command:

[user@cluster] ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2 --dry-run

The –min flags indicate the minima we want to use as control points (reactant and product). The option –disang indicates the template disang file with a placeholder, –mdin indicates the template mdin file with a placeholder, –nsim determines the number of umbrella windows, and –pad means the file names will be padded with a leading zero. The –dry-run flag will just print the output without creating the simulation directory called equil. The output should look like this:

  -2.500   ../equil/img01.disang
  -2.339   ../equil/img02.disang
  -2.177   ../equil/img03.disang
  -2.016   ../equil/img04.disang
  -1.855   ../equil/img05.disang
  -1.694   ../equil/img06.disang
  -1.532   ../equil/img07.disang
  -1.371   ../equil/img08.disang
  -1.210   ../equil/img09.disang
  -1.048   ../equil/img10.disang
  -0.887   ../equil/img11.disang
  -0.726   ../equil/img12.disang
  -0.565   ../equil/img13.disang
  -0.403   ../equil/img14.disang
  -0.242   ../equil/img15.disang
  -0.081   ../equil/img16.disang
   0.081   ../equil/img17.disang
   0.242   ../equil/img18.disang
   0.403   ../equil/img19.disang
   0.565   ../equil/img20.disang
   0.726   ../equil/img21.disang
   0.887   ../equil/img22.disang
   1.048   ../equil/img23.disang
   1.210   ../equil/img24.disang
   1.371   ../equil/img25.disang
   1.532   ../equil/img26.disang
   1.694   ../equil/img27.disang
   1.855   ../equil/img28.disang
   2.016   ../equil/img29.disang
   2.177   ../equil/img30.disang
   2.339   ../equil/img31.disang
   2.500   ../equil/img32.disang
dmax     0.161

dmax is the spacing between umbrella windows. In general, if the spacing is too larger, consider adding more windows. For our purposes this is sufficiently small.

[user@cluster] ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2

Generate the umbrella windows

List the contents of the equil directory:

[user@cluster] cd ../equil
[user@cluster] ls
eq01fwd.sh    img04.disang  img07.mdin    img11.disang  img14.mdin    img18.disang  img21.mdin    img25.disang  img28.mdin    img32.disang
img01.disang  img04.mdin    img08.disang  img11.mdin    img15.disang  img18.mdin    img22.disang  img25.mdin    img29.disang  img32.mdin
img01.mdin    img05.disang  img08.mdin    img12.disang  img15.mdin    img19.disang  img22.mdin    img26.disang  img29.mdin    init01.rst7
img02.disang  img05.mdin    img09.disang  img12.mdin    img16.disang  img19.mdin    img23.disang  img26.mdin    img30.disang
img02.mdin    img06.disang  img09.mdin    img13.disang  img16.mdin    img20.disang  img23.mdin    img27.disang  img30.mdin
img03.disang  img06.mdin    img10.disang  img13.mdin    img17.disang  img20.mdin    img24.disang  img27.mdin    img31.disang
img03.mdin    img07.disang  img10.mdin    img14.disang  img17.mdin    img21.disang  img24.mdin    img28.disang  img31.mdin

You will see that the eq01fwd.sh script was generated for you in the ouputs directory.

Take a look at eq01fwd.sh:

#!/bin/bash
set -e
set -u

#
# You can create a slurm script and run these
# commands in the following way:
#
# export LAUNCH="srun sander.MPI"
# export PARM="path/to/parm7"
# bash eq01fwd.sh
#

if [ "${LAUNCH}" == "" ]; then
    echo 'bash variable LAUNCH is undefined. Defaulting to: export LAUNCH="sander"'
    export LAUNCH="sander"
fi

if [ "${PARM}" == "" ]; then
    echo 'bash variable PARM is undefined. Please: export PARM="/path/to/parm7"'
    exit 1
else
   if [ ! -e "${PARM}" ]; then
       echo "File not found: ${PARM}"
       exit 1
   fi
fi

if [ ! -e "init01.rst7" ]; then
    echo "File not found: init01.rst7"
    exit 1
fi

${LAUNCH} -O -p ${PARM} -i img01.mdin -o img01.mdout -c init01.rst7 -r img01.rst7 -x img01.nc -inf img01.mdinfo
${LAUNCH} -O -p ${PARM} -i img02.mdin -o img02.mdout -c img01.rst7 -r img02.rst7 -x img02.nc -inf img02.mdinfo
${LAUNCH} -O -p ${PARM} -i img03.mdin -o img03.mdout -c img02.rst7 -r img03.rst7 -x img03.nc -inf img03.mdinfo
${LAUNCH} -O -p ${PARM} -i img04.mdin -o img04.mdout -c img03.rst7 -r img04.rst7 -x img04.nc -inf img04.mdinfo
${LAUNCH} -O -p ${PARM} -i img05.mdin -o img05.mdout -c img04.rst7 -r img05.rst7 -x img05.nc -inf img05.mdinfo
${LAUNCH} -O -p ${PARM} -i img06.mdin -o img06.mdout -c img05.rst7 -r img06.rst7 -x img06.nc -inf img06.mdinfo
${LAUNCH} -O -p ${PARM} -i img07.mdin -o img07.mdout -c img06.rst7 -r img07.rst7 -x img07.nc -inf img07.mdinfo
${LAUNCH} -O -p ${PARM} -i img08.mdin -o img08.mdout -c img07.rst7 -r img08.rst7 -x img08.nc -inf img08.mdinfo
${LAUNCH} -O -p ${PARM} -i img09.mdin -o img09.mdout -c img08.rst7 -r img09.rst7 -x img09.nc -inf img09.mdinfo
${LAUNCH} -O -p ${PARM} -i img10.mdin -o img10.mdout -c img09.rst7 -r img10.rst7 -x img10.nc -inf img10.mdinfo
${LAUNCH} -O -p ${PARM} -i img11.mdin -o img11.mdout -c img10.rst7 -r img11.rst7 -x img11.nc -inf img11.mdinfo
${LAUNCH} -O -p ${PARM} -i img12.mdin -o img12.mdout -c img11.rst7 -r img12.rst7 -x img12.nc -inf img12.mdinfo
${LAUNCH} -O -p ${PARM} -i img13.mdin -o img13.mdout -c img12.rst7 -r img13.rst7 -x img13.nc -inf img13.mdinfo
${LAUNCH} -O -p ${PARM} -i img14.mdin -o img14.mdout -c img13.rst7 -r img14.rst7 -x img14.nc -inf img14.mdinfo
${LAUNCH} -O -p ${PARM} -i img15.mdin -o img15.mdout -c img14.rst7 -r img15.rst7 -x img15.nc -inf img15.mdinfo
${LAUNCH} -O -p ${PARM} -i img16.mdin -o img16.mdout -c img15.rst7 -r img16.rst7 -x img16.nc -inf img16.mdinfo
${LAUNCH} -O -p ${PARM} -i img17.mdin -o img17.mdout -c img16.rst7 -r img17.rst7 -x img17.nc -inf img17.mdinfo
${LAUNCH} -O -p ${PARM} -i img18.mdin -o img18.mdout -c img17.rst7 -r img18.rst7 -x img18.nc -inf img18.mdinfo
${LAUNCH} -O -p ${PARM} -i img19.mdin -o img19.mdout -c img18.rst7 -r img19.rst7 -x img19.nc -inf img19.mdinfo
${LAUNCH} -O -p ${PARM} -i img20.mdin -o img20.mdout -c img19.rst7 -r img20.rst7 -x img20.nc -inf img20.mdinfo
${LAUNCH} -O -p ${PARM} -i img21.mdin -o img21.mdout -c img20.rst7 -r img21.rst7 -x img21.nc -inf img21.mdinfo
${LAUNCH} -O -p ${PARM} -i img22.mdin -o img22.mdout -c img21.rst7 -r img22.rst7 -x img22.nc -inf img22.mdinfo
${LAUNCH} -O -p ${PARM} -i img23.mdin -o img23.mdout -c img22.rst7 -r img23.rst7 -x img23.nc -inf img23.mdinfo
${LAUNCH} -O -p ${PARM} -i img24.mdin -o img24.mdout -c img23.rst7 -r img24.rst7 -x img24.nc -inf img24.mdinfo
${LAUNCH} -O -p ${PARM} -i img25.mdin -o img25.mdout -c img24.rst7 -r img25.rst7 -x img25.nc -inf img25.mdinfo
${LAUNCH} -O -p ${PARM} -i img26.mdin -o img26.mdout -c img25.rst7 -r img26.rst7 -x img26.nc -inf img26.mdinfo
${LAUNCH} -O -p ${PARM} -i img27.mdin -o img27.mdout -c img26.rst7 -r img27.rst7 -x img27.nc -inf img27.mdinfo
${LAUNCH} -O -p ${PARM} -i img28.mdin -o img28.mdout -c img27.rst7 -r img28.rst7 -x img28.nc -inf img28.mdinfo
${LAUNCH} -O -p ${PARM} -i img29.mdin -o img29.mdout -c img28.rst7 -r img29.rst7 -x img29.nc -inf img29.mdinfo
${LAUNCH} -O -p ${PARM} -i img30.mdin -o img30.mdout -c img29.rst7 -r img30.rst7 -x img30.nc -inf img30.mdinfo
${LAUNCH} -O -p ${PARM} -i img31.mdin -o img31.mdout -c img30.rst7 -r img31.rst7 -x img31.nc -inf img31.mdinfo
${LAUNCH} -O -p ${PARM} -i img32.mdin -o img32.mdout -c img31.rst7 -r img32.rst7 -x img32.nc -inf img32.mdinfo

This script would start from img01 and iteratively generate the umbrella windows using the output of the previous window as the starting structure. It is important to increment the starting structure slowly to avoid errors. One can uses this script to write a slurm script for the desired cluster if necessary.