Alchemical free energy calculations of pKa shifts in the MTR1 ribozyme ====================================================================== | Erika McCarthy\ :sup:`1`, Şölen Ekesan\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- - Use alchemical free energy simulations to estimate a pKa shift - Utilize enhanced sampling method ACES to sample through a conformational barrier - Predict an activity-pH profile Relevant literature ------------------- - `Catalytic mechanism and pH dependence of a methyltransferase ribozyme (MTR1) from computational enzymology `__ - `ACES: Optimized Alchemically Enhanced Sampling `__ - `Improvements in Precision of Relative Binding Free Energy Calculations Afforded by the Alchemical Enhanced Sampling (ACES) Approach `__ Introduction ------------ In this tutorial you will learn how to compute a pKa shift using alchemical free energy simulations. Given that a pka is associated with a free energy of deprotonation, one may observe the pKa shift induced by environmental changes similarly to the ΔΔG of a protein binding two different ligands in relative binding free energy (RBFE) calculations. Given an experimental reference pKa (expt), a pKa perturbation may be computed from the free energy difference of deprotonating the species A in both an aqueous (aq) and enzyme complex (com) environment. .. math:: pK_{a,A} = pK^{expt}_{a,A} + \Delta pK_{a,A} .. math:: \Delta pK_{a,A} = \frac{\Delta G_{com}(AH^+ \rightarrow A) - \Delta G_{aq}(AH^+ \rightarrow A)}{RT \ln(10)} As a case study, we will observe the pKa shift of a cytosine (C10) and an adenine (A63) residue in the active site of a methyl transferase ribozyme (MTR1) compared to in solution. These residues form a ligand binding pocket and will be referred to here as respectively. The active form of MTR1 requires C10 protonation and neutral adenine in order to bind the ligand O6mG (purple) as depicted in Figure 1. .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/pKafig1.png :alt: Figure 1 **Figure 1.** Active site of MTR1 with C10 (green) protonated at N3 and A63 (pink) neutral (not protonated at N1). Sites whose protonation is expected to influence the rate are circled in red. Given that these 2 residues may protonate in any combination, we will consider a 4 state model that yields the thermodynamic cycles shown as in Figure 2. From each ΔΔG, ΔpKa may be computed and added to the reference pKa of 4.2 for C or 3.5 for A. .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/ThermoCyclepKa.png :alt: Figure 2 :scale: 50 **Figure 2.** Thermodynamic cycles demonstrating the difference in free energy of deprotonation of C10 (C) and A63 (A) in an aqueous versus complexed environment (ΔΔG). Red arrows indicate free energies that will be explicitly computed in order to address the edges indicated with black arrows. Tutorial -------- This tutorial will begin with a structure of MTR1 that has already been subjected to an equilibration procedure. It will also employ the alchemical enhanced sampling method (ACES), additional tutorial material for which is given in section 7.2: Advanced Thermodynamic Integration (TI) Methods to Mutate a Ligand Bound to a Protein. .. contents:: :local: :depth: 4 .. start-tutorial 1. System setup ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To form the cycles shown in Figure 2, 6 alchemical free energy simulations will be conducted corresponding to the red edges (2 of the aqueous edges are redundant). The procedure here is adopted from the FE_Workflow for RBFE calculations, where the “binding” event is the protonation of A or C. The transformation ACH+🠂AC will be used as an example, and the subsequent legs may be computed similarly. 1.1 Creating the complex state ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ First download the initial files and extract them into your desired working directory: .. warning:: Add a download link for the files once it is available. .. code-block:: bash user@local:~/YOUR/WORKING/DIR $ ls Files.zip user@local:~/YOUR/WORKING/DIR $ unzip Files.zip user@local:~/YOUR/WORKING/DIR $ cd Files user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls input common merge.sh user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls common A1.off AC.pdb ACH.pdb CX.frcmod CX.lib GUN.frcmod GUN.lib UX5.off The “common” directory contains two pdb files, ACH.pdb and AC.pdb, which correspond to the two endstates for this simulation. They differ by a single atom, C10:H3, the proton in question and have already been solvated with TIP4Pew water and physiological concentration of NaCl. Both pdb files contain CX for the purpose of the workflow, thus C9 was changed to CX9 in AC.pdb. The directory also contains the parameter files for the non-standard residues. When C is protonated we will need CX.lib and CX.frcmod and when A is protonated we will need A1.off. In addition, the non-covalent ligand is a non-standard residue with the parameter files GUN.lib and GUN.frcmod. The script merge.sh is provided here as a convenience and requires system specific modification. The following variables are adjusted for the different edges of the cycle: .. code-block:: bash resprot='CX' ######## Name of the protonated residue resunprot='C' ######## Name of the unprotonated residue prefixprot=ACH ######## Name of the pdb file corresponding to the protonated residue prefixunprot=AC ######## Name of the pdb file corresponding to the unprotonated residue resnum=9 ######## Residue number of the base for pKa calculation H="H3" ######## Name of atom that will disappear for pKa calculation lignum=69 ######## Ligand residue number (GUN in this case) sys=com ######## com stands for complex (ie. the MTR complexed system) The goal is to create two merged topologies (stateA.rst7 and stateB.rst7) that originate from the two supplied endstate pdb files (ACH.pdb and AC.pdb). An essential step is the Parmed timerge command. In this example using merge.sh, the input trajectory and parameters files for Parmed contain residues 1-69 (the copy of the RNA from ACH.pdb), residues 70-138 (the copy of the RNA from AC.pdb), and solvent. In this case, since there is a non-covalent ligand (residue 69) which is not being perturbed, the timerge command is: .. code-block:: bash timerge ":1-69” “:70-138” “:9” “:78” “:69” “:138” Under this numbering scheme, residue 9 corresponds to C10 in the MTR1 structure and naming scheme, and residue 78 is its copy. This will create a second copy of C10 in the merged topology. After verifying that the variables are set correctly, run merge.sh. This will create the directory called “com”. The output of the script is also provided for your convienience. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ ls com TEMPLATE.sh inputs prod_1.slurm run_1.slurm stateA.rst7 stateB.rst7 unisc.parm7 TEMPLATE.sh has also been provided as a convenience for generating input files. It should be copied to the “com” directory you have just generated. The key inputs are the lambda schedule and the atoms that will be put in the TI (thermal integration) and sc (softcore) regions. We will use a lambda schedule with 13 lambda states that sample the second order smooth-step function. Notable inputs in TEMPLATE.sh are: .. code-block:: bash lams=(0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000) twostate=true env=com timask1 = '@260-291' timask2 = '@2189-2219' scmask1 = '@291' scmask2 = '' “lams” is the selected lambda schedule. “twostate=true” will employ a 2 state model, where equilibration occurs in parallel at each endstate and works inward to the centermost lambda state. “env” indicates the environment for the calculation, and here we use “com” as short for “complex”. “timask1” contains the atoms of the first copy of C10 in its protonated form (residue 9 in the merged topology), “timask2” contains the atoms in the second copy of C10 (residue 70 in the merged topology), “scmask1” contains C10:H3, and “scmask2” is empty because C10:H3 is disappearing and not present in the second copy. Another important input in the mdin template is .. code-block:: bash gti_add_sc = 25 which invokes the use of ACES. This is a bit trivial in this case given the softcore region is simply a hydrogen, but we will see a more impactful use case later on. This is a replica exchange method that is accelerated by use of parallel computation on GPUs. Running TEMPLATE.sh in the “com” directory will generate a directory called inputs that will contain all of the necessary mdin files and groupfiles that will be called by the slurm script. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC $ cd com user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC/com $ bash TEMPLATE.sh user@local:~/YOUR/WORKING/DIR/Files/RUN_NoAces/ACH~AC/com $ ls inputs 0.00000000_analyze.mdin 0.30269700_ti.mdin 0.54468200_minTI.mdin 1.00000000_eqNTP4.mdin.template 0.00000000_eqA.mdin 0.35943600_analyze.mdin 0.54468200_preTI.mdin 1.00000000_eqP.mdin 0.00000000_eqATI.mdin 0.35943600_eqATI.mdin 0.54468200_ti.mdin 1.00000000_eqP0.mdin 0.00000000_eqNTP4.mdin.template 0.35943600_eqP0TI.mdin 0.59087000_analyze.mdin 1.00000000_eqP0TI.mdin 0.00000000_eqP.mdin 0.35943600_eqpre1P0TI.mdin 0.59087000_eqATI.mdin 1.00000000_eqProt0.mdin 0.00000000_eqP0.mdin 0.35943600_eqpre2P0TI.mdin 0.59087000_eqP0TI.mdin 1.00000000_eqProt01.mdin 0.00000000_eqP0TI.mdin 0.35943600_minTI.mdin 0.59087000_eqpre1P0TI.mdin 1.00000000_eqProt025.mdin 0.00000000_eqProt0.mdin 0.35943600_preTI.mdin 0.59087000_eqpre2P0TI.mdin 1.00000000_eqProt05.mdin 0.00000000_eqProt01.mdin 0.35943600_ti.mdin 0.59087000_minTI.mdin 1.00000000_eqProt1.mdin 0.00000000_eqProt025.mdin 0.40913000_analyze.mdin 0.59087000_preTI.mdin 1.00000000_eqProt2.mdin 0.00000000_eqProt05.mdin 0.40913000_eqATI.mdin 0.59087000_ti.mdin 1.00000000_eqV.mdin 0.00000000_eqProt1.mdin 0.40913000_eqP0TI.mdin 0.64056400_analyze.mdin 1.00000000_eqpre1P0.mdin 0.00000000_eqProt2.mdin 0.40913000_eqpre1P0TI.mdin 0.64056400_eqATI.mdin 1.00000000_eqpre1P0TI.mdin 0.00000000_eqV.mdin 0.40913000_eqpre2P0TI.mdin 0.64056400_eqP0TI.mdin 1.00000000_eqpre2P0.mdin 0.00000000_eqpre1P0.mdin 0.40913000_minTI.mdin 0.64056400_eqpre1P0TI.mdin 1.00000000_eqpre2P0TI.mdin 0.00000000_eqpre1P0TI.mdin 0.40913000_preTI.mdin 0.64056400_eqpre2P0TI.mdin 1.00000000_min1.mdin 0.00000000_eqpre2P0.mdin 0.40913000_ti.mdin 0.64056400_minTI.mdin 1.00000000_min2.mdin 0.00000000_eqpre2P0TI.mdin 0.45531800_analyze.mdin 0.64056400_preTI.mdin 1.00000000_minTI.mdin 0.00000000_min1.mdin 0.45531800_eqATI.mdin 0.64056400_ti.mdin 1.00000000_preTI.mdin 0.00000000_min2.mdin 0.45531800_eqP0TI.mdin 0.69730300_analyze.mdin 1.00000000_ti.mdin 0.00000000_minTI.mdin 0.45531800_eqpre1P0TI.mdin 0.69730300_eqATI.mdin eqA.groupfile 0.00000000_preTI.mdin 0.45531800_eqpre2P0TI.mdin 0.69730300_eqP0TI.mdin eqATI.groupfile 0.00000000_ti.mdin 0.45531800_minTI.mdin 0.69730300_eqpre1P0TI.mdin eqNTP4.groupfile 0.22976400_analyze.mdin 0.45531800_preTI.mdin 0.69730300_eqpre2P0TI.mdin eqP.groupfile 0.22976400_eqATI.mdin 0.45531800_ti.mdin 0.69730300_minTI.mdin eqP0.groupfile 0.22976400_eqP0TI.mdin 0.50000000_analyze.mdin 0.69730300_preTI.mdin eqProt0.groupfile 0.22976400_eqpre1P0TI.mdin 0.50000000_eqATI.mdin 0.69730300_ti.mdin eqProt01.groupfile 0.22976400_eqpre2P0TI.mdin 0.50000000_eqP0TI.mdin 0.77023600_analyze.mdin eqProt025.groupfile 0.22976400_minTI.mdin 0.50000000_eqpre1P0TI.mdin 0.77023600_eqATI.mdin eqProt05.groupfile 0.22976400_preTI.mdin 0.50000000_eqpre2P0TI.mdin 0.77023600_eqP0TI.mdin eqProt1.groupfile 0.22976400_ti.mdin 0.50000000_minTI.mdin 0.77023600_eqpre1P0TI.mdin eqProt2.groupfile 0.30269700_analyze.mdin 0.50000000_preTI.mdin 0.77023600_eqpre2P0TI.mdin eqV.groupfile 0.30269700_eqATI.mdin 0.50000000_ti.mdin 0.77023600_minTI.mdin eqpre1P0.groupfile 0.30269700_eqP0TI.mdin 0.54468200_analyze.mdin 0.77023600_preTI.mdin eqpre2P0.groupfile 0.30269700_eqpre1P0TI.mdin 0.54468200_eqATI.mdin 0.77023600_ti.mdin preTI.groupfile 0.30269700_eqpre2P0TI.mdin 0.54468200_eqP0TI.mdin 1.00000000_analyze.mdin ti.groupfile 0.30269700_minTI.mdin 0.54468200_eqpre1P0TI.mdin 1.00000000_eqA.mdin 0.30269700_preTI.mdin 0.54468200_eqpre2P0TI.mdin 1.00000000_eqATI.mdin The run_1.slurm and prod_1.slurm scripts are provided as examples and will need to be tailored to your compute environment. Copy them to the “com” directory. In particular, the #SBATCH commands at the top of the file must be adjusted. Hamiltonian replica exchange will also be used (HREMD) which requires the -rem 3 flag when running production simulations as seen in the example prod_1.slurm script. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC $ cat prod.slurm #!/bin/bash #SBATCH --job-name="ACH~ACprod_1.slurm" #SBATCH --output="prod_1.slurmout" #SBATCH --error="prod_1.slurmerr" #SBATCH --partition=rtx #SBATCH --nodes=1 #SBATCH --ntasks-per-node=12 #SBATCH --time=2-00:00:00 source /work/08588/emccart3/frontera/bashrc_12-24-2022 path=`pwd` lams=(0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000) # check if AMBERHOME is set if [ -z "${AMBERHOME}" ]; then echo "AMBERHOME is not set" && exit 0; fi trial=1 EXE=${AMBERHOME}/bin/pmemd.cuda_SPFP.MPI echo "running replica ti" ibrun -np ${#lams[@]} ${EXE} -rem 3 -remlog remt${trial} -ng ${#lams[@]} -groupfile inputs/t${trial}_ti.groupfile 1.2 Creating the aqueous state ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ We must now set up the aqueous reference leg for this simulation. In RBFE simulations, this would be one ligand transforming into another in solvent. Here we will use a 3-mer reference with C10 and its flanking residues in the 3' and 5' direction to make the reference more experimentally relevant. The same setup procedure will be used for the aqueous state, also referred to as the reference state. A new common directory must now contain CH+.pdb, C.pdb, CX.lib, and CX.frcmod, and the variables shown previously must be adjusted in merge.sh. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_reference $ ls TEMPLATE.sh common input merge.sh user@local:~/YOUR/WORKING/DIR/FILES/RUN_reference $ head input resprot='CX' resunprot='C' prefixprot=CH+ prefixunprot=C resnum=2 H="H3" sys=aq In accordance with FE_Workflow, run_1.slurm for the aqueous leg does not contain the eqProt2, eqProt1, eqProt05, eqProt025, eqProt01, and eqProt0 steps that gradually release positional restraints due to the simplicity of the system. These changes are reflected in TEMPLATE.sh. The selection of the TI and SC masks is analogous to the complex leg. .. code-block:: bash TI1='@33-64' TI2='@100-130' SC1='@64' SC2=' ' .. _RunAFE: 1. Running the AFE Simulations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Now that you have set up your simulations, transfer them to your HPC. We will start with the ACH~AC transformation. You will require GPU and MPI capabilities. For more details, see section 14 of Advanced Thermodynamic Integration (TI) Methods to Mutate a Ligand Bound to a Protein. At this point, your directory structure should look like this: .. code-block:: bash user@HPC:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC/com $ ls inputs run_1.slurm prod_1.slurm stateA.rst7 stateB.rst7 unisc.parm7 t1 user@HPC:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/ACH~AC/com $ ls t1 0.00000000_init.rst7 1.00000000_init.rst7 Now you may submit your jobs with .. code-block:: bash sbatch run_1.slurm sbatch --dependency=afterok:XXXX prod_1.slurm (where XXXX is the run_1.slurm job id) where XXXX is the job ID assigned to your run_1.slurm job. Repeat this for your aqueous state (CH+~C). Wait for the simulations to finish. The production stage should generate a remt1.log file when the -rem 3 flag is used during production. This procedure would be repeated for the other 3 transformations to form a thermodynamic cycle. .. _analysis: 3. Analyzing the results ~~~~~~~~~~~~~~~~~~~~~~~~~ Transfer the .mdout files from the TI stage (\*ti.mdout) to your local computer. You have been provided the results for the other legs of the cycle in the RUN_NoAces directory, as well as script for analysis. Your directory structure should now contain the subdirectories ACH~AC, AHCH~AHC, AHC~AC, AHCH~ACH. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ ls ACH~AC AHCH~ACH AHCH~AHC AHC~AC edgembar-example-writeinput.py makedats.sh linkref.sh You will need the edgembar program to perform network wide free energy analysis. This is available with `FE-Toolkit `__ Each transformation requires an aqueous reference transformation, which are CH+~C, and AH+~A. Since there are 2 complex transformations for each reference, I will symbolically link them to my analysis directory intead of copying or moving (although you can do this if your file system does not support symbollic links) by running the following commands from the BASE directory: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat linkref.sh #!/bin/bash path2A=~/YOUR/WORKING/DIR/FILES/RUN_reference/AH+~A/aq path2C=~/YOUR/WORKING/DIR/FILES/RUN_reference/CH+~C/aq top=`pwd` for i in AHCH~ACH AHC~AC; do cd ${i} ln -s ${path2A} aq cd ${top} done for i in AHCH~AHC ACH~AC; do cd ${i} ln -s ${path2C} aq cd ${top} done user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ bash linkref.sh Be sure to edit the working directory to reflect your path. You will now have this structure: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ ls ACH~AC aq com Now you must extract thermodynamic data from your mdout files. The script edgembar-amber2dats.py is available in the FE_Toolkit. The following makedats.sh script may be used to run edgembar-amber2dats.py for each of your transformations from the BASE directory: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat makedats.sh #!/bin/bash top=`pwd` for i in ACH~AC AHC~AC AHCH~AHC AHCH~ACH; do cd ${i}/com/t1 edgembar-amber2dats.py -o ./dats $(ls ./*ti.mdout) cd ../../ cd aq/t1 edgembar-amber2dats.py -o ./dats $(ls ./*ti.mdout) cd ${top} done user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ bash makedats.sh This will create a directory called dats in each of the t1 directories containing efep*.dat and dvdl*.dat. Now you must generate .xml files for each of your transformations that specify the directory structure. This may be aided by using the edgembar-example-writeinput.py script. This can be copied from edgembar/examples/DiscoverEdges and is provided. To tailor the script to your directory structure, modify the “s” and “edges” variables to reflect your path. In this example I have called my target environment “com” and my reference environment “aq” to represent complex and aqueous: .. code-block:: python #!/usr/bin/env python3 from edgembar import DiscoverEdges import os from pathlib import Path # # The output directory (where the edge xml input files are to be written # odir = Path("xml") # # The format string describing the directory structure. # The {edge} {env} {stage} {trial} {traj} {ene} placeholders are used # to extract substrings from the path; only the {edge} {traj} and {ene} # are absolutely required. The {env} placeholder must be either # 'target' or 'reference', or you must supply the directory string # with the target and reference optional arguments. # If the {env} placeholder is missing, then # 'complex' is assumed. # # Full example: # s = r"dats/{trial}/free_energy/{edge}_ambest/{env}/{stage}/efep_{traj}_{ene}.dat" # Minimal example: # s = r"dats/{edge}/efep_{traj}_{ene}.dat" s = r"~/YOUR/WORKING/DIR/FILES/RUN_NoAces/{edge}/{env}/t{trial}/dats/efep_{traj}_{ene}.dat" exclusions=None # exclusions=["trial1"] edges = DiscoverEdges(s, target="com", reference="aq", exclude_trials=exclusions) # In some instances, one may have computed a stage with lambda values # going in reverse order relative to the thermodynamic path that leads # from the reactants to the products. We can reverse the order of the # files to effectively negate the free energy of each state (essentially # treating the lambda 0 state as the lambda 1 state). # #for edge in edges: # for trial in edge.GetAllTrials(): # if trial.stage.name == "STAGE": # trial.reverse() if not odir.is_dir(): os.makedirs(odir) for edge in edges: fname = odir / (edge.name + ".xml") edge.WriteXml( fname ) Here I will not choose to exclude any trials; however, if you performed multiple trials and one failed, you could specify to exclude it from analysis. Now run the script: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ python edgembar-example-writeinput.py A new directory called xml will be generated containing an xml file for each transformation. For example, ACH~AC.xml looks like this: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cat xml/ACH~AC.xml /home/erika/dat2/MTR_pKa_tutorial/Simplified_workflow/RUN_NoAces/ACH~AC/com/t1/dats 0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000 /home/erika/dat2/MTR_pKa_tutorial/Simplified_workflow/RUN_NoAces/ACH~AC/aq/t1/dats 0.00000000 0.22976400 0.30269700 0.35943600 0.40913000 0.45531800 0.50000000 0.54468200 0.59087000 0.64056400 0.69730300 0.77023600 1.00000000 Check that the xml files satisfy your directory structure and have the correct lambda schedule. Now, in the xml directory, run edgembar_omp for each xml file. This will generate a python file, that we will run to generate the final html output. After all python files have been generated, we will use them to create a combined graph analysis: .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces $ cd xml user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/xml $ for i in *.xml; do edgembar_omp ${i}; python ${i%.xml}.py; done user@local:~/YOUR/WORKING/DIR/FILES/RUN_NoAces/xml $ edgembar-WriteGraphHtml.py *py -o graph.html Let's start with ACH~AC.html. Open the html file in a browser to see results and analysis pertaining to this edge transformation. The following sections provide valuable information: .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/ACH~AC_NoAces.png :alt: Figure 3 ΔΔG gives the free energy difference of protonating the residue in question in the MTR1 complex versus aqueous environment. ΔΔG may be converted to pKa units by dividing by RTln(10) ≈ 1.363 kcal/mol. Finally, it may be added to the reference pKa to get an estimated pKa value in the MTR1 complex. Here, the pKa we calculate for C10 assuming A63 is unprotonated is .. math:: pKa(C10_{MTR1}) \approx 4.2 + \frac{3.169 kcal/mol}{1.363 kcal/mol} = 6.5 We also see that 3 of the lambda windows have not reached convergence. This may indicate that simulations should be run longer for more accurate results. Now let's look at the full graph. .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/Graph_noAces.png :alt: Figure 4 The ΔΔG of each transformation as computed using the MBAR method is given in the UFE column. The AvgCC column shows that the cycle is not perfectly closed, indicating some error that caused the endstates to not be entirely identical after being simulated as part of two different edges. One is strongly encouraged to visualize each of the simulated end state structures when interpreting the free energy results. 4. Applying the ACES method ~~~~~~~~~~~~~~~~~~~~~~~~~~~ If we look at representative structures of the ACH+ and AC endstates in the ACH+~AC transformation shown in Figure 5, we see that the ligand (O6mG) OCH3 group is rotable in response to changes in protonation. To enhance the sampling through the barrier of this rotation, we will apply the ACES method to the OCH3 group. .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/ACH~AC_structure.png :alt: Figure 5 **Figure 5.** Representative structures of the ACH+ and AC endstates in the ACH+🠂AC transformation. We will repeat the simulation set-up and running procedures from the previous section with some key modifications. Navigate to the RUN_Aces directory. We will again use the ACH+~AC leg as an example. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces $ ls ACH~AC AHCH~ACH AHCH~AHC AHC~AC linkref.sh makedats.sh edgembar-example-writeinput.py user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces $ cd ACH~AC user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ ls TEMPLATE.sh aq common input merge.sh First you will need to modify the TI merge command to include the ligand in both TI regions. The timerge command in merge.sh is changed to: .. code-block:: bash timerge ":${res1}-${res2}" ":${res3}-${res4}" ":${resnum},${lignum}" ":${copynum},${copylig}" ":${lignum}" which will run the Parmed command .. code-block:: bash timerge ":1-69” “:70-138” “:9,69” “:78,138” “:69” . This will create two copies of the ligand, but we will not transform any of its atoms. Run the merge.sh script. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ bash merge.sh The TEMPLATE.sh script must be modified to put the ligand in the TI regions and the OCH3 group in the softcore regions: .. code-block:: bash TI1='@260-291,2220-2238' TI2='@2189-2219,2239-2257' SC1='@291,2220,2231,2233-2235' SC2='@2239,2250,2252-2254' gtiaddsc=25 Setting gti_add_sc=25 in the mdin files enables ACES. Run the TEMPLATE.sh script. .. code-block:: bash user@local:~/YOUR/WORKING/DIR/FILES/RUN_Aces/ACH~AC $ bash TEMPLATE.sh Transfer the files to your HPC, run the jobs, and download the results as you did in section 2. Obtain the results of the other legs and repeat the graph analysis as shown in section 3. The final graph.html should look like this: .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/Graph_ACES.png :alt: Figure 6 5. Comparing results with and without ACES ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's look at the free energies and associated pKa values with (Aces:H,OCH3) and without (ACES:H) the ACES method applied to the ligand OCH3 group. The data from the graph.html reports are compiled in Table 1. .. .. table:: Comparison of pKa values obtain with and without applying ACES to OCH3 .. +--------------+----------------+--------+-----------+------------+ .. | Transformation | UFE (kcal/mol) | $\Delta\text{pKa}$ | pKa (ref) | pKa (MTR1) | .. +==============+================+========+===========+============+ .. | \multicolumn{5}{|c|}{ACES:H} | .. +--------------+----------------+--------+-----------+------------+ .. | ACH~AC | 3.169 | 2.325 | 4.2 | 6.5 | .. +--------------+----------------+--------+-----------+------------+ .. | AHCH~ACH | 2.515 | 1.845 | 3.5 | 5.3 | .. +--------------+----------------+--------+-----------+------------+ .. | AHCH~AHC | -0.109 | -0.080 | 4.2 | 4.1 | .. +--------------+----------------+--------+-----------+------------+ .. | AHC~AC | 5.178 | 3.799 | 3.5 | 7.3 | .. +--------------+----------------+--------+-----------+------------+ .. | \multicolumn{5}{|c|}{Aces:H,$\text{OCH}_3$} | .. +--------------+----------------+--------+-----------+------------+ .. | ACH~AC | 3.054 | 2.241 | 4.2 | 6.4 | .. +--------------+----------------+--------+-----------+------------+ .. | AHCH~ACH | -0.793 | -0.582 | 3.5 | 2.9 | .. +--------------+----------------+--------+-----------+------------+ .. | AHCH~AHC | -2.276 | -1.670 | 4.2 | 2.5 | .. +--------------+----------------+--------+-----------+------------+ .. | AHC~AC | 5.668 | 4.158 | 3.5 | 7.7 | .. +--------------+----------------+--------+-----------+------------+ .. table:: Table 1. Comparison of pKa values obtained with and without applying ACES to OCH3 +----------------+----------------+----------------+-----------+------------+ | Transformation | UFE (kcal/mol) | ΔpKa | pKa (ref) | pKa (MTR1) | +================+================+================+===========+============+ | **ACES:H** | +----------------+----------------+----------------+-----------+------------+ | ACH~AC | 3.169 | 2.325 | 4.2 | 6.5 | +----------------+----------------+----------------+-----------+------------+ | AHCH~ACH | 2.515 | 1.845 | 3.5 | 5.3 | +----------------+----------------+----------------+-----------+------------+ | AHCH~AHC | -0.109 | -0.080 | 4.2 | 4.1 | +----------------+----------------+----------------+-----------+------------+ | AHC~AC | 5.178 | 3.799 | 3.5 | 7.3 | +----------------+----------------+----------------+-----------+------------+ | **ACES:H, OCH3** | +----------------+----------------+----------------+-----------+------------+ | ACH~AC | 3.054 | 2.241 | 4.2 | 6.4 | +----------------+----------------+----------------+-----------+------------+ | AHCH~ACH | -0.793 | -0.582 | 3.5 | 2.9 | +----------------+----------------+----------------+-----------+------------+ | AHCH~AHC | -2.276 | -1.670 | 4.2 | 2.5 | +----------------+----------------+----------------+-----------+------------+ | AHC~AC | 5.668 | 4.158 | 3.5 | 7.7 | +----------------+----------------+----------------+-----------+------------+ We see that the AHCH~ACH and AHCH~AHC transformations are more favorable when applying ACES to the ligand compared to our first trial. Looking at the trajectories of the AHCH state with and without ACES applied to the ligand OCH3 group shows that ACES allowed us to sample a wider distribution of OCH3 angles, whereas our first trial saw A63 pushed away from the ligand without the ability to recover and explore the same conformational degrees of freedom. .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/OCH3_rot_ACES.png :alt: Figure 7 **Figure 7.** Representative structure of the AHCH endstate (a) without applying ACES to the ligand OCH3 group and (b) after applying ACES to the ligand OCH3 group. There are two ways we could interpret our results. First, we could assume that protonation events occur independently and have no free energy contribution from coupling to another protonation event. This would suggest the following reaction scheme: .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/Deprot_uncoupled.png :alt: Figure 8 Using the ACES treatment of the ligand OCH3 results, this would yield pKa values of 2.9 and 6.4 for A63 and C10 respectively by converting the free energies of transformation to pKa units and adding this perturbation to the reference pKa. Second, we could consider coupled protonation events, suggesting that any of the four protonation states in our cycle may occur with an associated probability. This may be described as a four state partition function (Q) based on the grand canonical ensemble in which the number of hydrogens (Ni) is allowed to fluctuate between 0, 1, or 2. The observed rate (kobs) depends on Q, the intrinsic rate (kint), and the fraction of active species (factive). In this case, the active state that catalyzes a reaction is ACH, and the intrinsic rate will be taken from experiment. .. math:: k_{obs} = \frac{k_{int}\cdot f_{active}(pH)}{Q(pH)} .. math:: Q(pH) = \sum_{i=1}^{4}f_{i}(pH) = f_{AHCH} + f_{AHC} + f_{ACH} + f_{AC} The fraction of each species depends on the free energy associated with each node of the thermodynamic cycle (Gi) and the chemical potential (μ(pH)). .. math:: f_{i} = e^{{-\frac{\beta}{N_{a}}(G_{i}-\mu(pH)\cdot N_{i})}} .. math:: \mu(pH) = \frac{-pH\cdot N_{a}\cdot \ln(10)}{\beta} Experimentally, the formulation from which pKa values and intrinsic rate are fit is .. math:: k_{obs} = k_{int} / (1+ 10^{pKa1-pH} + 10^{pH-pKa2} + 10^{pKa2-pKa1}) where pKa1 is the lower pKa and pKa2 is the higher pKa. The four state model may be fit to the experimental model by optimizing kint, pKa1, and pKa2 are parameters. The following is what one would obtain for a fit using our calculated free energies in a 4 state model (black) compared to experiment (red). .. container:: .. figure:: /_static/files/TutorialsInProgress/pKa/MTR_pH_tutorial.png :alt: Figure 9 **Figure 9.** Experimental activity-pH profile of MTR1 (red) and calculated activity-pH model using the 4 model (black). pH values measured experimentally are indicated with points. The pKa values that we fit are not in agreeance with the experimental data, suggesting that MTR1 does not follow the model we have proposed here with coupled protonation events between C10 and A63. Going back to our first model, the C10 pKa of 6.4 that we predict is close to the experimental value of 6.2, and experimental evidence suggests that this is a reasonable assignment. However, the A63 pka of 2.9 that we predict is quite low compared to 5.0. Therefore, we may hypothesize that the experimental pKa of 6.2 originates from C10, and the pKa of 5.0 originates from an event that is independent of C10 protonation, or perhaps a single pKa model would also fit the data. .. end-tutorial