Alchemical free energy calculations of pKa shifts in the MTR1 ribozyme
======================================================================

| Erika McCarthy\ :sup:`1`, Şölen Ekesan\ :sup:`1`, and Darrin M.
  York\ :sup:`1`
| :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute
  for Quantitative Biomedicine and Department of Chemistry and Chemical
  Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives
-------------------

-  Use alchemical free energy simulations to estimate a pKa shift
-  Utilize enhanced sampling method ACES to sample through a conformational barrier
-  Predict an activity-pH profile


Relevant literature
-------------------

-  `Catalytic mechanism and pH dependence of a methyltransferase ribozyme (MTR1) from computational enzymology <https://academic.oup.com/nar/article/51/9/4508/7127216>`__
-  `ACES: Optimized Alchemically Enhanced Sampling <https://pubs.acs.org/doi/10.1021/acs.jctc.2c00697>`__
-  `Improvements in Precision of Relative Binding Free Energy Calculations Afforded by the Alchemical Enhanced Sampling (ACES) Approach￼ <https://pubs.acs.org/doi/10.1021/acs.jcim.4c00464>`__

Introduction
------------

In this tutorial you will learn how to compute a pKa shift using alchemical free energy simulations. Given that a pka is associated with a free energy of deprotonation, one may observe the pKa shift induced by environmental changes similarly to the ΔΔG of a protein binding two different ligands in relative binding free energy (RBFE) calculations. Given an experimental reference pKa (expt), a pKa perturbation may be computed from the free energy difference of deprotonating the species A in both an aqueous (aq) and enzyme complex (com) environment. 

.. math::

   pK_{a,A} = pK^{expt}_{a,A} + \Delta pK_{a,A}

.. math::

   \Delta pK_{a,A} = \frac{\Delta G_{com}(AH^+ \rightarrow A) - \Delta G_{aq}(AH^+ \rightarrow A)}{RT \ln(10)}


As a case study, we will observe the pKa shift of a cytosine (C10) and an adenine (A63) residue in the active site of a methyl transferase ribozyme (MTR1) compared to in solution. These residues form a ligand binding pocket and will be referred to here as respectively. The active form of MTR1 requires C10 protonation and neutral adenine in order to bind the ligand O6mG (purple) as depicted in Figure 1.

.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/pKafig1.png
      :alt: Figure 1

      **Figure 1.** Active site of MTR1 with C10 (green) protonated at N3 and A63 (pink) neutral (not protonated at N1). Sites whose protonation is expected to influence the rate are circled in red.


Given that these 2 residues may protonate in any combination, we will consider a 4 state model that yields the thermodynamic cycles shown as in Figure 2. From each ΔΔG, ΔpKa may be computed and added to the reference pKa of 4.2 for C or 3.5 for A.


.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/ThermoCyclepKa.png
      :alt: Figure 2
      :scale: 50

      **Figure 2.** Thermodynamic cycles demonstrating the difference in free energy of deprotonation of C10 (C) and A63 (A) in an aqueous versus complexed environment (ΔΔG). Red arrows indicate free energies that will be explicitly computed in order to address the edges indicated with black arrows.


Tutorial
--------

This tutorial will begin with a structure of MTR1 that has already been subjected to an equilibration procedure. It will also employ the alchemical enhanced sampling method (ACES), additional tutorial material for which is given in section 7.2: Advanced Thermodynamic Integration (TI) Methods to Mutate a Ligand Bound to a Protein. 

.. contents::
   :local:
   :depth: 4

.. start-tutorial


1. System setup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To form the cycles shown in Figure 2, 6 alchemical free energy simulations will be conducted corresponding to the red edges (2 of the aqueous edges are redundant). The procedure here is adopted from the FE_Workflow for RBFE calculations, where the “binding” event is the protonation of A or C. The transformation ACH+🠂AC will be used as an example, and the subsequent legs may be computed similarly. 


1.1 Creating the complex state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First download the initial files and extract them into your desired working directory:

.. warning::

  Add a download link for the files once it is available.

.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR $ ls
  Files.zip
  user@local:~/YOUR/WORKING/DIR $ unzip Files.zip
  user@local:~/YOUR/WORKING/DIR $ cd Files
  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls
  input  common  merge.sh
  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls common
  A1.off  AC.pdb  ACH.pdb  CX.frcmod  CX.lib  GUN.frcmod  GUN.lib  UX5.off

The “common” directory contains two pdb files, ACH.pdb and AC.pdb, which correspond to the two endstates for this simulation. They differ by a single atom, C10:H3, the proton in question and have already been solvated with TIP4Pew water and physiological concentration of NaCl. Both pdb files contain CX for the purpose of the workflow, thus C9 was changed to CX9 in AC.pdb. The directory also contains the parameter files for the non-standard residues. When C is protonated we will need CX.lib and CX.frcmod and when A is protonated we will need A1.off. In addition, the non-covalent ligand is a non-standard residue with the parameter files GUN.lib and GUN.frcmod. The script merge.sh is provided here as a convenience and requires system specific modification. The following variables are adjusted for the different edges of the cycle:

.. code-block:: bash

    resprot='CX'      ######## Name of the protonated residue

    resunprot='C'     ######## Name of the unprotonated residue

    prefixprot=ACH    ######## Name of the pdb file corresponding to the protonated residue

    prefixunprot=AC   ######## Name of the pdb file corresponding to the unprotonated residue

    resnum=9          ######## Residue number of the base for pKa calculation

    H="H3"            ######## Name of atom that will disappear for pKa calculation

    lignum=69         ######## Ligand residue number (GUN in this case)

    sys=com           ######## com stands for complex (ie. the MTR complexed system)
          

The goal is to create two merged topologies (stateA.rst7 and stateB.rst7) that originate from the two supplied endstate pdb files (ACH.pdb and AC.pdb). An essential step is the Parmed timerge command. In this example using merge.sh, the input trajectory and parameters files for Parmed contain residues 1-69 (the copy of the RNA from ACH.pdb), residues 70-138 (the copy of the RNA from AC.pdb), and solvent. In this case, since there is a non-covalent ligand (residue 69) which is not being perturbed, the timerge command is:

.. code-block:: bash

    timerge ":1-69” “:70-138” “:9” “:78” “:69” “:138”

 
Under this numbering scheme, residue 9 corresponds to C10 in the MTR1 structure and naming scheme, and residue 78 is its copy. This will create a second copy of C10 in the merged topology.


After verifying that the variables are set correctly, run merge.sh. This will create the directory called “com”. The output of the script is also provided for your convienience.

.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls com
  TEMPLATE.sh  inputs  prod_1.slurm  run_1.slurm  stateA.rst7  stateB.rst7  unisc.parm7


TEMPLATE.sh has also been provided as a convenience for generating input files. It should be copied to the “com” directory you have just generated. The key inputs are the lambda schedule and the atoms that will be put in the TI (thermal integration) and sc (softcore) regions. We will use a lambda schedule with 13 lambda states that sample the second order smooth-step function. Notable inputs in TEMPLATE.sh are:


.. code-block:: bash

  lams=(0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000)

  twostate=true

  env=com

  timask1     = '@260-291'
  timask2     = '@2189-2219'
  scmask1     = '@291'
  scmask2      = ''

“lams” is the selected lambda schedule. “twostate=true” will employ a 2 state model, where equilibration occurs in parallel at each endstate and works inward to the centermost lambda state. “env” indicates the environment for the calculation, and here we use “com” as short for “complex”. “timask1” contains the atoms of the first copy of C10 in its protonated form (residue 9 in the merged topology), “timask2” contains the atoms in the second copy of C10 (residue 70 in the merged topology), “scmask1” contains C10:H3, and “scmask2” is empty because C10:H3 is disappearing and not present in the second copy.

Another important input in the mdin template is

.. code-block:: bash

    gti_add_sc    = 25


which invokes the use of ACES. This is a bit trivial in this case given the softcore region is simply a hydrogen, but we will see a more impactful use case later on. This is a replica exchange method that is accelerated by use of parallel computation on GPUs. Running TEMPLATE.sh in the “com” directory will generate a directory called inputs that will contain all of the necessary mdin files and groupfiles that will be called by the slurm script.

.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ cd com
  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC/com $ bash TEMPLATE.sh
  user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC/com $ ls inputs
  0.00000000_analyze.mdin          0.30269700_ti.mdin          0.54468200_minTI.mdin       1.00000000_eqNTP4.mdin.template
  0.00000000_eqA.mdin              0.35943600_analyze.mdin     0.54468200_preTI.mdin       1.00000000_eqP.mdin
  0.00000000_eqATI.mdin            0.35943600_eqATI.mdin       0.54468200_ti.mdin          1.00000000_eqP0.mdin
  0.00000000_eqNTP4.mdin.template  0.35943600_eqP0TI.mdin      0.59087000_analyze.mdin     1.00000000_eqP0TI.mdin
  0.00000000_eqP.mdin              0.35943600_eqpre1P0TI.mdin  0.59087000_eqATI.mdin       1.00000000_eqProt0.mdin
  0.00000000_eqP0.mdin             0.35943600_eqpre2P0TI.mdin  0.59087000_eqP0TI.mdin      1.00000000_eqProt01.mdin
  0.00000000_eqP0TI.mdin           0.35943600_minTI.mdin       0.59087000_eqpre1P0TI.mdin  1.00000000_eqProt025.mdin
  0.00000000_eqProt0.mdin          0.35943600_preTI.mdin       0.59087000_eqpre2P0TI.mdin  1.00000000_eqProt05.mdin
  0.00000000_eqProt01.mdin         0.35943600_ti.mdin          0.59087000_minTI.mdin       1.00000000_eqProt1.mdin
  0.00000000_eqProt025.mdin        0.40913000_analyze.mdin     0.59087000_preTI.mdin       1.00000000_eqProt2.mdin
  0.00000000_eqProt05.mdin         0.40913000_eqATI.mdin       0.59087000_ti.mdin          1.00000000_eqV.mdin
  0.00000000_eqProt1.mdin          0.40913000_eqP0TI.mdin      0.64056400_analyze.mdin     1.00000000_eqpre1P0.mdin
  0.00000000_eqProt2.mdin          0.40913000_eqpre1P0TI.mdin  0.64056400_eqATI.mdin       1.00000000_eqpre1P0TI.mdin
  0.00000000_eqV.mdin              0.40913000_eqpre2P0TI.mdin  0.64056400_eqP0TI.mdin      1.00000000_eqpre2P0.mdin
  0.00000000_eqpre1P0.mdin         0.40913000_minTI.mdin       0.64056400_eqpre1P0TI.mdin  1.00000000_eqpre2P0TI.mdin
  0.00000000_eqpre1P0TI.mdin       0.40913000_preTI.mdin       0.64056400_eqpre2P0TI.mdin  1.00000000_min1.mdin
  0.00000000_eqpre2P0.mdin         0.40913000_ti.mdin          0.64056400_minTI.mdin       1.00000000_min2.mdin
  0.00000000_eqpre2P0TI.mdin       0.45531800_analyze.mdin     0.64056400_preTI.mdin       1.00000000_minTI.mdin
  0.00000000_min1.mdin             0.45531800_eqATI.mdin       0.64056400_ti.mdin          1.00000000_preTI.mdin
  0.00000000_min2.mdin             0.45531800_eqP0TI.mdin      0.69730300_analyze.mdin     1.00000000_ti.mdin
  0.00000000_minTI.mdin            0.45531800_eqpre1P0TI.mdin  0.69730300_eqATI.mdin       eqA.groupfile
  0.00000000_preTI.mdin            0.45531800_eqpre2P0TI.mdin  0.69730300_eqP0TI.mdin      eqATI.groupfile
  0.00000000_ti.mdin               0.45531800_minTI.mdin       0.69730300_eqpre1P0TI.mdin  eqNTP4.groupfile
  0.22976400_analyze.mdin          0.45531800_preTI.mdin       0.69730300_eqpre2P0TI.mdin  eqP.groupfile
  0.22976400_eqATI.mdin            0.45531800_ti.mdin          0.69730300_minTI.mdin       eqP0.groupfile
  0.22976400_eqP0TI.mdin           0.50000000_analyze.mdin     0.69730300_preTI.mdin       eqProt0.groupfile
  0.22976400_eqpre1P0TI.mdin       0.50000000_eqATI.mdin       0.69730300_ti.mdin          eqProt01.groupfile
  0.22976400_eqpre2P0TI.mdin       0.50000000_eqP0TI.mdin      0.77023600_analyze.mdin     eqProt025.groupfile
  0.22976400_minTI.mdin            0.50000000_eqpre1P0TI.mdin  0.77023600_eqATI.mdin       eqProt05.groupfile
  0.22976400_preTI.mdin            0.50000000_eqpre2P0TI.mdin  0.77023600_eqP0TI.mdin      eqProt1.groupfile
  0.22976400_ti.mdin               0.50000000_minTI.mdin       0.77023600_eqpre1P0TI.mdin  eqProt2.groupfile
  0.30269700_analyze.mdin          0.50000000_preTI.mdin       0.77023600_eqpre2P0TI.mdin  eqV.groupfile
  0.30269700_eqATI.mdin            0.50000000_ti.mdin          0.77023600_minTI.mdin       eqpre1P0.groupfile
  0.30269700_eqP0TI.mdin           0.54468200_analyze.mdin     0.77023600_preTI.mdin       eqpre2P0.groupfile
  0.30269700_eqpre1P0TI.mdin       0.54468200_eqATI.mdin       0.77023600_ti.mdin          preTI.groupfile
  0.30269700_eqpre2P0TI.mdin       0.54468200_eqP0TI.mdin      1.00000000_analyze.mdin     ti.groupfile
  0.30269700_minTI.mdin            0.54468200_eqpre1P0TI.mdin  1.00000000_eqA.mdin
  0.30269700_preTI.mdin            0.54468200_eqpre2P0TI.mdin  1.00000000_eqATI.mdin


The run_1.slurm and prod_1.slurm scripts are provided as examples and will need to be tailored to your compute environment. Copy them to the “com” directory. In particular, the #SBATCH commands at the top of the file must be adjusted. Hamiltonian replica exchange will also be used (HREMD) which requires the -rem 3 flag when running production simulations as seen in the example prod_1.slurm script.


.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC $ cat prod.slurm
  #!/bin/bash
  #SBATCH --job-name="ACH~ACprod_1.slurm"
  #SBATCH --output="prod_1.slurmout"
  #SBATCH --error="prod_1.slurmerr"
  #SBATCH --partition=rtx
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=12
  #SBATCH --time=2-00:00:00

  source /work/08588/emccart3/frontera/bashrc_12-24-2022

  path=`pwd`

  lams=(0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000)

  # check if AMBERHOME is set
  if [ -z "${AMBERHOME}" ]; then echo "AMBERHOME is not set" && exit 0; fi

  trial=1

  EXE=${AMBERHOME}/bin/pmemd.cuda_SPFP.MPI

  echo "running replica ti"
  ibrun -np ${#lams[@]} ${EXE} -rem 3 -remlog remt${trial} -ng ${#lams[@]} -groupfile inputs/t${trial}_ti.groupfile
    

1.2 Creating the aqueous state
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We must now set up the aqueous reference leg for this simulation. In RBFE simulations, this would be one ligand transforming into another in solvent. Here we will use a 3-mer reference with C10 and its flanking residues in the 3' and 5' direction to make the reference more experimentally relevant. The same setup procedure will be used for the aqueous state, also referred to as the reference state. A new common directory must now contain CH+.pdb, C.pdb, CX.lib, and CX.frcmod, and the variables shown previously must be adjusted in merge.sh.

  .. code-block:: bash

    user@local:~/YOUR/WORKING/DIR/FILES/RUN_reference $ ls 
    TEMPLATE.sh  common   input   merge.sh
    user@local:~/YOUR/WORKING/DIR/FILES/RUN_reference $ head input
    resprot='CX'
    resunprot='C'
    prefixprot=CH+
    prefixunprot=C
    resnum=2
    H="H3"
    sys=aq


In accordance with FE_Workflow, run_1.slurm for the aqueous leg does not contain the eqProt2, eqProt1, eqProt05, eqProt025, eqProt01, and eqProt0 steps that gradually release positional restraints due to the simplicity of the system. These changes are reflected in TEMPLATE.sh. The selection of the TI and SC masks is analogous to the complex leg.


  .. code-block:: bash

    TI1='@33-64'
    TI2='@100-130'
    SC1='@64'
    SC2=' '

.. _RunAFE:

1. Running the AFE Simulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that you have set up your simulations, transfer them to your HPC. We will start with the ACH~AC transformation. You will require GPU and MPI capabilities. For more details, see section 14 of Advanced Thermodynamic Integration (TI) Methods to Mutate a Ligand Bound to a Protein. At this point, your directory structure should look like this:


.. code-block:: bash

  user@HPC:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC/com $ ls
  inputs  run_1.slurm   prod_1.slurm   stateA.rst7   stateB.rst7   unisc.parm7  t1
  user@HPC:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC/com $ ls t1
  0.00000000_init.rst7  1.00000000_init.rst7

Now you may submit your jobs with

.. code-block:: bash 

  sbatch run_1.slurm 
  sbatch --dependency=afterok:XXXX prod_1.slurm (where XXXX is the run_1.slurm job id)


where XXXX is the job ID assigned to your run_1.slurm job. Repeat this for your aqueous state (CH+~C). Wait for the simulations to finish. The production stage should generate a remt1.log file when the -rem 3 flag is used during production. This procedure would be repeated for the other 3 transformations to form a thermodynamic cycle.


.. _analysis:

3. Analyzing the results
~~~~~~~~~~~~~~~~~~~~~~~~~

Transfer the .mdout files from the TI stage (\*ti.mdout) to your local computer. You have been provided the results for the other legs of the cycle in the RUN_NoAces directory, as well as script for analysis. Your directory structure should now contain the subdirectories ACH~AC, AHCH~AHC, AHC~AC, AHCH~ACH.

.. code-block:: bash 

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ ls
  ACH~AC  AHCH~ACH  AHCH~AHC  AHC~AC  edgembar-example-writeinput.py  makedats.sh linkref.sh

You will need the edgembar program to perform network wide free energy analysis. This is available with `FE-Toolkit <https://gitlab.com/RutgersLBSR/fe-toolkit>`__

Each transformation requires an aqueous reference transformation, which are CH+~C, and AH+~A. Since there are 2 complex transformations for each reference, I will symbolically link them to my analysis directory intead of copying or moving (although you can do this if your file system does not support symbollic links) by running the following commands from the BASE directory:

.. code-block:: bash 

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat linkref.sh
  #!/bin/bash

  path2A=~/YOUR/WORKING/DIR/FILES/RUN_reference/AH+~A/aq
  path2C=~/YOUR/WORKING/DIR/FILES/RUN_reference/CH+~C/aq

  top=`pwd`

  for i in AHCH~ACH AHC~AC; do
      cd ${i}
      ln -s ${path2A} aq
      cd ${top}
  done

  for i in AHCH~AHC ACH~AC; do
      cd ${i}
      ln -s ${path2C} aq
      cd ${top}
  done

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ bash linkref.sh

Be sure to edit the working directory to reflect your path.  You will now have this structure:

.. code-block:: bash 

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ ls ACH~AC
  aq  com

Now you must extract thermodynamic data from your mdout files. The script edgembar-amber2dats.py is available in the FE_Toolkit. The following makedats.sh script may be used to run edgembar-amber2dats.py for each of your transformations from the BASE directory:

.. code-block:: bash 

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat makedats.sh
  #!/bin/bash

  top=`pwd`
  for i in ACH~AC AHC~AC AHCH~AHC AHCH~ACH; do
      cd ${i}/com/t1
      edgembar-amber2dats.py -o ./dats $(ls ./*ti.mdout)
      cd ../../
      cd aq/t1
      edgembar-amber2dats.py -o ./dats $(ls ./*ti.mdout)
      cd ${top}
  done

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ bash makedats.sh

This will create a directory called dats in each of the t1 directories containing efep*.dat and dvdl*.dat.

Now you must generate .xml files for each of your transformations that specify the directory structure. This may be aided by using the edgembar-example-writeinput.py script. This can be copied from edgembar/examples/DiscoverEdges and is provided. To tailor the script to your directory structure, modify the “s” and “edges” variables to reflect your path. In this example I have called my target environment “com” and my reference environment “aq” to represent complex and aqueous:

.. code-block:: python

  #!/usr/bin/env python3

  from edgembar import DiscoverEdges
  import os
  from pathlib import Path

  #
  # The output directory (where the edge xml input files are to be written
  #
  odir = Path("xml")

  #
  # The format string describing the directory structure.
  # The {edge} {env} {stage} {trial} {traj} {ene} placeholders are used
  # to extract substrings from the path; only the {edge} {traj} and {ene}
  # are absolutely required.  The {env} placeholder must be either 
  # 'target' or 'reference', or you must supply the directory string
  # with the target and reference optional arguments.
  # If the {env} placeholder is missing, then
  # 'complex' is assumed.
  #
  # Full example:
  #    s = r"dats/{trial}/free_energy/{edge}_ambest/{env}/{stage}/efep_{traj}_{ene}.dat"
  # Minimal example:
  #    s = r"dats/{edge}/efep_{traj}_{ene}.dat"

  s = r"~/YOUR/WORKING/DIR/FILES/RUN_NoAces/{edge}/{env}/t{trial}/dats/efep_{traj}_{ene}.dat"
  exclusions=None
  # exclusions=["trial1"]
  edges = DiscoverEdges(s,
                        target="com",
                        reference="aq",
                        exclude_trials=exclusions)

  # In some instances, one may have computed a stage with lambda values
  # going in reverse order relative to the thermodynamic path that leads
  # from the reactants to the products. We can reverse the order of the
  # files to effectively negate the free energy of each state (essentially
  # treating the lambda 0 state as the lambda 1 state).
  #
  #for edge in edges:
  #    for trial in edge.GetAllTrials():
  #        if trial.stage.name == "STAGE":
  #            trial.reverse()

  if not odir.is_dir():
      os.makedirs(odir)

  for edge in edges:
      fname = odir / (edge.name + ".xml")
      edge.WriteXml( fname )


Here I will not choose to exclude any trials; however, if you performed multiple trials and one failed, you could specify to exclude it from analysis. Now run the script:

.. code-block:: bash
  
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ python edgembar-example-writeinput.py
 
A new directory called xml will be generated containing an xml file for each transformation. For example, ACH~AC.xml looks like this:

.. code-block:: bash
  
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat xml/ACH~AC.xml
  <edge name="ACH~AC">
    <env name="target">
      <stage name="STAGE">
        <trial name="1">
          <dir>/home/erika/dat2/MTR_pKa_tutorial/Simplified_workflow/RUN_NoAces/ACH~AC/com/t1/dats</dir>
          <ene>0.00000000</ene>
          <ene>0.22976400</ene>
          <ene>0.30269700</ene>
          <ene>0.35943600</ene>
          <ene>0.40913000</ene>
          <ene>0.45531800</ene>
          <ene>0.50000000</ene>
          <ene>0.54468200</ene>
          <ene>0.59087000</ene>
          <ene>0.64056400</ene>
          <ene>0.69730300</ene>
          <ene>0.77023600</ene>
          <ene>1.00000000</ene>
        </trial>
      </stage>
    </env>
    <env name="reference">
      <stage name="STAGE">
        <trial name="1">
          <dir>/home/erika/dat2/MTR_pKa_tutorial/Simplified_workflow/RUN_NoAces/ACH~AC/aq/t1/dats</dir>
          <ene>0.00000000</ene>
          <ene>0.22976400</ene>
          <ene>0.30269700</ene>
          <ene>0.35943600</ene>
          <ene>0.40913000</ene>
          <ene>0.45531800</ene>
          <ene>0.50000000</ene>
          <ene>0.54468200</ene>
          <ene>0.59087000</ene>
          <ene>0.64056400</ene>
          <ene>0.69730300</ene>
          <ene>0.77023600</ene>
          <ene>1.00000000</ene>
        </trial>
      </stage>
    </env>
  </edge>       

Check that the xml files satisfy your directory structure and have the correct lambda schedule. Now, in the xml directory, run edgembar_omp for each xml file. This will generate a python file, that we will run to generate the final html output. After all python files have been generated, we will use them to create a combined graph analysis:

.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cd xml
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/xml $ for i in *.xml; do edgembar_omp ${i}; python ${i%.xml}.py; done
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/xml $ edgembar-WriteGraphHtml.py *py -o graph.html

Let's start with ACH~AC.html. Open the html file in a browser to see results and analysis pertaining to this edge transformation. The following sections provide valuable information:


.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/ACH~AC_NoAces.png
      :alt: Figure 3


ΔΔG gives the free energy difference of protonating the residue in question in the MTR1 complex versus aqueous environment. ΔΔG may be converted to pKa units by dividing by RTln(10) ≈ 1.363 kcal/mol. Finally, it may be added to the reference pKa to get an estimated pKa value in the MTR1 complex. Here, the pKa we calculate for C10 assuming A63 is unprotonated is

.. math::

  pKa(C10_{MTR1}) \approx 4.2 + \frac{3.169 kcal/mol}{1.363 kcal/mol} = 6.5


We also see that 3 of the lambda windows have not reached convergence. This may indicate that simulations should be run longer for more accurate results. Now let's look at the full graph.

.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/Graph_noAces.png
      :alt: Figure 4


The ΔΔG of each transformation as computed using the MBAR method is given in the UFE column. The AvgCC column shows that the cycle is not perfectly closed, indicating some error that caused the endstates to not be entirely identical after being simulated as part of two different edges. One is strongly encouraged to visualize each of the simulated end state structures when interpreting the free energy results.    


4. Applying the ACES method
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If we look at representative structures of the ACH+ and AC endstates in the ACH+~AC transformation shown in Figure 5, we see that the ligand (O6mG) OCH3 group is rotable in response to changes in protonation. To enhance the sampling through the barrier of this rotation, we will apply the ACES method to the OCH3 group. 


.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/ACH~AC_structure.png
      :alt: Figure 5

      **Figure 5.** Representative structures of the ACH+ and AC endstates in the ACH+🠂AC transformation. 

We will repeat the simulation set-up and running procedures from the previous section with some key modifications. Navigate to the RUN_Aces directory. We will again use the ACH+~AC leg as an example.  

.. code-block:: bash

  user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces $ ls
  ACH~AC AHCH~ACH AHCH~AHC AHC~AC linkref.sh makedats.sh edgembar-example-writeinput.py
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces $ cd ACH~AC
  user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ ls
  TEMPLATE.sh  aq  common  input  merge.sh


First you will need to modify the TI merge command to include the ligand in both TI regions. The timerge command in merge.sh is changed to:

.. code-block:: bash

  timerge ":${res1}-${res2}" ":${res3}-${res4}" ":${resnum},${lignum}" ":${copynum},${copylig}" ":${lignum}"

which will run the Parmed command 

.. code-block:: bash

  timerge ":1-69” “:70-138” “:9,69” “:78,138” “:69” .

This will create two copies of the ligand, but we will not transform any of its atoms. Run the merge.sh script.

.. code-block:: bash

    user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ bash merge.sh

The TEMPLATE.sh script must be modified to put the ligand in the TI regions and the OCH3 group in the softcore regions:

.. code-block:: bash

  TI1='@260-291,2220-2238'
  TI2='@2189-2219,2239-2257'
  SC1='@291,2220,2231,2233-2235'
  SC2='@2239,2250,2252-2254'

  gtiaddsc=25

Setting gti_add_sc=25 in the mdin files enables ACES. Run the TEMPLATE.sh script. 

.. code-block:: bash

    user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ bash TEMPLATE.sh

Transfer the files to your HPC, run the jobs, and download the results as you did in section 2. Obtain the results of the other legs and repeat the graph analysis as shown in section 3. The final graph.html should look like this:


.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/Graph_ACES.png
      :alt: Figure 6


5. Comparing results with and without ACES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let's look at the free energies and associated pKa values with (Aces:H,OCH3) and without (ACES:H) the ACES method applied to the ligand OCH3 group. The data from the graph.html reports are compiled in Table 1.


.. .. table:: Comparison of pKa values obtain with and without applying ACES to OCH3 
..   +--------------+----------------+--------+-----------+------------+
..   | Transformation | UFE (kcal/mol) | $\Delta\text{pKa}$ | pKa (ref) | pKa (MTR1) |
..   +==============+================+========+===========+============+
..   | \multicolumn{5}{|c|}{ACES:H} |
..   +--------------+----------------+--------+-----------+------------+
..   | ACH~AC       | 3.169          | 2.325  | 4.2       | 6.5        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHCH~ACH     | 2.515          | 1.845  | 3.5       | 5.3        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHCH~AHC     | -0.109         | -0.080 | 4.2       | 4.1        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHC~AC       | 5.178          | 3.799  | 3.5       | 7.3        |
..   +--------------+----------------+--------+-----------+------------+
..   | \multicolumn{5}{|c|}{Aces:H,$\text{OCH}_3$} |
..   +--------------+----------------+--------+-----------+------------+
..   | ACH~AC       | 3.054          | 2.241  | 4.2       | 6.4        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHCH~ACH     | -0.793         | -0.582 | 3.5       | 2.9        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHCH~AHC     | -2.276         | -1.670 | 4.2       | 2.5        |
..   +--------------+----------------+--------+-----------+------------+
..   | AHC~AC       | 5.668          | 4.158  | 3.5       | 7.7        |
..   +--------------+----------------+--------+-----------+------------+


.. table:: Table 1. Comparison of pKa values obtained with and without applying ACES to OCH3

   +----------------+----------------+----------------+-----------+------------+
   | Transformation | UFE (kcal/mol) | ΔpKa           | pKa (ref) | pKa (MTR1) |
   +================+================+================+===========+============+
   | **ACES:H**                                                                |
   +----------------+----------------+----------------+-----------+------------+
   | ACH~AC         | 3.169          | 2.325          | 4.2       | 6.5        |
   +----------------+----------------+----------------+-----------+------------+
   | AHCH~ACH       | 2.515          | 1.845          | 3.5       | 5.3        |
   +----------------+----------------+----------------+-----------+------------+
   | AHCH~AHC       | -0.109         | -0.080         | 4.2       | 4.1        |
   +----------------+----------------+----------------+-----------+------------+
   | AHC~AC         | 5.178          | 3.799          | 3.5       | 7.3        |
   +----------------+----------------+----------------+-----------+------------+
   | **ACES:H, OCH3**                                                          | 
   +----------------+----------------+----------------+-----------+------------+
   | ACH~AC         | 3.054          | 2.241          | 4.2       | 6.4        |
   +----------------+----------------+----------------+-----------+------------+
   | AHCH~ACH       | -0.793         | -0.582         | 3.5       | 2.9        |
   +----------------+----------------+----------------+-----------+------------+
   | AHCH~AHC       | -2.276         | -1.670         | 4.2       | 2.5        |
   +----------------+----------------+----------------+-----------+------------+
   | AHC~AC         | 5.668          | 4.158          | 3.5       | 7.7        |
   +----------------+----------------+----------------+-----------+------------+


We see that the AHCH~ACH and AHCH~AHC transformations are more favorable when applying ACES to the ligand compared to our first trial. Looking at the trajectories of the AHCH state with and without ACES applied to the ligand OCH3 group shows that ACES allowed us to sample a wider distribution of OCH3 angles, whereas our first trial saw A63 pushed away from the ligand without the ability to recover and explore the same conformational degrees of freedom. 

.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/OCH3_rot_ACES.png
      :alt: Figure 7

      **Figure 7.** Representative structure of the AHCH endstate (a) without applying ACES to the ligand OCH3 group and (b) after applying ACES to the ligand OCH3 group.

There are two ways we could interpret our results. First, we could assume that protonation events occur independently and have no free energy contribution from coupling to another protonation event. This would suggest the following reaction scheme:

.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/Deprot_uncoupled.png
      :alt: Figure 8

Using the ACES treatment of the ligand OCH3 results, this would yield pKa values of 2.9 and 6.4 for A63 and C10 respectively by converting the free energies of transformation to pKa units and adding this perturbation to the reference pKa. 

Second, we could consider coupled protonation events, suggesting that any of the four protonation states in our cycle may occur with an associated probability. This may be described as a four state partition function (Q) based on the grand canonical ensemble in which the number of hydrogens (Ni) is allowed to fluctuate between 0, 1, or 2. The observed rate (kobs) depends on Q, the intrinsic rate (kint), and the fraction of active species (factive). In this case, the active state that catalyzes a reaction is ACH, and the intrinsic rate will be taken from experiment.   

.. math::

  k_{obs} = \frac{k_{int}\cdot f_{active}(pH)}{Q(pH)}

.. math::

  Q(pH) = \sum_{i=1}^{4}f_{i}(pH) = f_{AHCH} + f_{AHC} + f_{ACH} + f_{AC}

The fraction of each species depends on the free energy associated with each node of the thermodynamic cycle (Gi) and the chemical potential (μ(pH)). 

.. math::
  f_{i} = e^{{-\frac{\beta}{N_{a}}(G_{i}-\mu(pH)\cdot N_{i})}}

.. math::
  \mu(pH) = \frac{-pH\cdot N_{a}\cdot \ln(10)}{\beta}

Experimentally, the formulation from which pKa values and intrinsic rate are fit is

.. math::
  k_{obs} = k_{int} / (1+ 10^{pKa1-pH} + 10^{pH-pKa2} + 10^{pKa2-pKa1})

where pKa1 is the lower pKa and pKa2 is the higher pKa. The four state model may be fit to the experimental model by optimizing kint, pKa1, and pKa2 are parameters. The following is what one would obtain for a fit using our calculated free energies in a 4 state model (black) compared to experiment (red). 

.. container::

   .. figure:: /_static/files/TutorialsInProgress/pKa/MTR_pH_tutorial.png
      :alt: Figure 9

      **Figure 9.** Experimental activity-pH profile of MTR1 (red) and calculated activity-pH model using the 4 model (black). pH values measured experimentally are indicated with points.  

The pKa values that we fit are not in agreeance with the experimental data, suggesting that MTR1 does not follow the model we have proposed here with coupled protonation events between C10 and A63. 

Going back to our first model, the C10 pKa of 6.4 that we predict is close to the experimental value of 6.2, and experimental evidence suggests that this is a reasonable assignment. However, the A63 pka of 2.9 that we predict is quite low compared to 5.0. Therefore, we may hypothesize that the experimental pKa of 6.2 originates from C10, and the pKa of 5.0 originates from an event that is independent of C10 protonation, or perhaps a single pKa model would also fit the data.  


.. end-tutorial