Calculating Relative Binding Free Energies of Tyk2 Inhibitors
=============================================================

| Zeke A. Piskulich\ :sup:`1`,  Patricio Barletta\ :sup:`1`, Ryan Snyder\ :sup:`1`, and Darrin M. York\ :sup:`1`
| :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute
|   for Quantitative Biomedicine and Department of Chemistry and Chemical
|   Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives
-------------------

.. start-learning-objectives

- Learn to use FE-Workflow to set up and run relative binding free energy calculations.

.. end-learning-objectives

Relevant literature
-------------------

- Coming Soon!


Tutorial
--------

.. warning::

    It is planned that FE-WORKFLOW will be replaced by AmberFlow in the near future. This tutorial is being written in the meantime to help users get started with RBFE calculations with Amber; however, will be replaced once RBFE functionality is added to AmberFlow.

In this Tutorial, you will learn how to set up and run absolute binding free energy calculations using Amberflow, a workflow engine developed in the York Lab for automating Alchemical Free Energy Calculations with Amber.

.. contents::
   :local:
   :depth: 4

.. start-tutorial


.. mermaid::

  flowchart LR
  
      Mol2In[Ligand Mol2 Charges]
      LibIn[Ligand Lib File]
      FRCIn[Ligand FRCmod File]
      PDBIn[Docked PDB]
      inputIn["input_file"]
  
      setup_fe[bash setup_fe.sh]
  
      built_out_complex[Built Complex System]
      built_out_aquous[Built Aqueous System]
  
      run_trials[Run Trial Slurm Jobs]
  
      mdout_complex[MD Output Complex]
      mdout_aquous[MD Output Aqueous]
      remlog_complex[REMLog Complex]
      remlog_aquous[REMLog Aqueous]
  
      Amber2Dats[Amber2Dats]
  
      XMLOuts[XML Outputs]
  
      DiscoverEdges[DiscoverEdges]
      PythonOuts[Python Outputs]
      HTMLOuts[HTML]
      WriteGraph[WriteGraph]
      GraphOuts[Graph Output HTML]
  
  
      Mol2In --> setup_fe
      LibIn --> setup_fe
      FRCIn --> setup_fe
      PDBIn --> setup_fe
      inputIn --> setup_fe
      setup_fe --> built_out_complex
      setup_fe --> built_out_aquous
      built_out_complex --> run_trials
      built_out_aquous --> run_trials
  
      run_trials --> mdout_complex
      run_trials --> mdout_aquous
      run_trials --> remlog_complex
      run_trials --> remlog_aquous
  
      mdout_complex --> Amber2Dats
      mdout_aquous --> Amber2Dats
      remlog_complex --> Amber2Dats
      remlog_aquous --> Amber2Dats
  
      Amber2Dats --> XMLOuts
      XMLOuts --> DiscoverEdges
      DiscoverEdges --> PythonOuts
      DiscoverEdges --> WriteGraph
      PythonOuts --> HTMLOuts
      HTMLOuts --> GraphOuts
      WriteGraph --> GraphOuts

      click Mol2In "#getting-structures-and-parameters"
      click LibIn "#getting-structures-and-parameters"
      click FRCIn "#getting-structures-and-parameters"
      click PDBIn "#getting-structures-and-parameters"

      click inputIn "#preparing-the-simulation-directory"
      click setup_fe "#running-fe-workflow"

      click built_out_complex "#examining-the-outputs-of-the-build"
      click built_out_aquous "#examining-the-outputs-of-the-build"
      click run_trials "#submitting-jobs-to-the-cluster"
      click mdout_complex "#analyzing-results"
      click mdout_aquous "#analyzing-results"
      click remlog_complex "#analyzing-results"
      click remlog_aquous "#analyzing-results"
      click GraphOuts "#analyzing-results"
      click Amber2Dats "#analyzing-results"
      click XMLOuts "#analyzing-results"
      click DiscoverEdges "#analyzing-results"
      click PythonOuts "#analyzing-results"
      click HTMLOuts "#analyzing-results"

  
Getting Started
~~~~~~~~~~~~~~~

In this tutorial, we will be using the FE-WORKFLOW package from the York Group to set up
and run relative binding free energy calculations (RBFE) of Tyk2 inhibitors.

To install FE-WORKLOW, please download it from our gitlab:

.. toggle:: Installation Instructions

    .. code-block:: bash

        git clone git@gitlab.com:RutgersLBSR/FE-Workflow.git

    To install, you need to install differently based on whether you installed
    fe-toolkit using pip.

    If you installed fe-toolkit using pip, you can install FE-WORKFLOW 

    .. code-block:: bash

        source <amber installation path>/amber.sh
        export PATH=<fe-toolkit installation path>/local/bin:$PATH
        export PYTHONPATH=<fe-toolkit installation path>/local/lib/python3.11/site-packages:${PYTHONPATH}
        export PATH=$PATH:<FE-Workflow path>/bin

    If you installed fe-toolkit from source, you can install FE-WORKFLOW using the following commands:

    .. code-block:: bash

        export BACKUP_PATH=$PATH
        export BACKUP_PYTHONPATH=${PYTHONPATH}
        source <amber installation path>/amber.sh
        export PATH=$BACKUP_PATH:$PATH
        export PYTHONPATH=${BACKUP_PYTHONPATH}:$PYTHONPATH
        export PATH=$PATH:<FE-Workflow path>/bin

    Then, once you have done either of these, you run the following command to install FE-WORKFLOW:

    .. code-block:: bash

        makesetup_fe.sh

    This should create the executable `setup_fe` in your PATH. This is what will be used to set up and run the RBFE calculations.

System Information
~~~~~~~~~~~~~~~~~~

In this tutorial, we will be calculating the relative binding free energies of a series of Tyk2 inhibitors. 

These inhibitors are the same as will be covered in the Absolute Binding Free Energy tutorial for Tyk2, which can be found here: :ref:`/ModularTutorials/Alchemical/ABFE_of_Tyk2/tutorial.rst`.

These are:

  - ejm31
  - ejm42
  - jmc27
  - jmc28
  

Getting Structures and Parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To get started, download the following archive which contains RESP-parameterized ligands and prepared protein-ligand complexes for the Tyk2 inhibitors:

To download the files :download:`Download Tar <https://drive.google.com/file/d/1RiZ1XA7mveU2Mxki5TnvHPmahSGSlTyL/view?usp=share_link>`

Once you have downloaded the archive, extract it to a working directory where you will be running the tutorial.

.. code-block:: bash

    tar -xvf tyk2_feworkflow_tutorial.tar.gz
    cd tyk2_feworkflow_tutorial

In this folder, you should see the following files and directories:

parameters/: Directory containing RESP-parameterized ligands.
docked_complexes: Directory containing prepared protein-ligand complexes.

.. note:: 

    Parameters were generated using the protocol outlined in our Force Field Parameterization tutorial, which can be found here: :ref:`/ModularTutorials/ForceField/ligand_parameterization/ligandparam.rst`.

Preparing the Simulation Directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To first set up the simulation directory for FE-WORKFLOW, we will make a working directory and copy over the necessary files.

.. code-block:: bash

    mkdir working_dir
    cd working_dir
    mkdir -p initial/tyk2
    cp ../parameters/* initial/tyk2/
    cp ../docked_complexes/* initial/tyk2/

Next, we will create an input file for FE-WORKFLOW. This file contains all the necessary information for setting up and running the RBFE calculations.

Below, we've provided a sample input file which you can copy and paste into a file named `input` in your working directory.

.. toggle:: Sample Input File
    
    .. literalinclude:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/files/input


Below we will go through some of the key sections of this input file.

- **[NAME OF SYSTEM, INPUT STRUCTURES, and TYPE OF CALCULATION]**: This section specifies details about the folder structure, how you want to run the calculations, and the type of calculation.
    - path_to_input: The path to the input files for the system being studied. This is relative to the working directory where you are running FE-WORKFLOW.
    - system: The name of the system being studied. This controls the naming of the build directory, and the location within path_to_input where it expects structures and parameters to be located.
    - setupmode: Only 0 is currently supported.
    - ticalc: Specifies the type of calculation. For RBFE calculations, this should be set to rbfe.
    - stage: setup. This specifies that we are in the setup stage of the calculation. You can later change this to analysis to run the analysis stage.
    - translist: A list of ligand transformations to be performed. Each transformation is specified as a pair of ligand names separated by a ~. These names should match the names of the ligands in the input files.
  
- **[ATOM MAPPING]**: This section specifies how atoms in the ligands are mapped during the transformations.

    - mapmethod: The method used for atom mapping. All three methods use maximum common substructure methods to identify the common atoms between the two ligands. It is recommended to use MCSS-E2 for best results.
        - 0: MCSS
        - 1: MCSS-E
        - 2: MCSS-E2
    - mapinspect: Whether to proceed with the whole build, or to stop after the atom mapping. Set to 0 to map and build, 1 to only map, and 2 to resume generation after inspecting the mapping.
    - mapnetwork: Ensures that for a particular ligand if it is involved in multiple transformations, the same mapping is used across all transformations.
  
- **[MD BOX BUILDING]**: This section specifies parameters for building the simulation boxes for the MD simulations.
    - boxbuild: Which boxes to build for the simulations. skip means no boxes are built, 0 means only the aqueous box is built, 1 is for ASFE/RSFE simulations, and 2 means both boxes are built. Use 2 for RBFE calculations.
    - boxbufcom: The buffer distance (in Angstroms) between the solute and the edge of the box for the complex simulations.
    - boxbufaq: The buffer distance (in Angstroms) between the solute and the edge of the box for the aqueous simulations.
    - ionconc: The concentration of ions (in mM) to add to the simulation boxes. Suggested value is 0.15 (150 mM).
    - pff: The protein forcefield to use. Recommended is ff19sb (though ff14sb is also supported).
    - lff: The ligand forcefield to use. Recommended is gaff2.
    - watermodel: The water model to use. Recommended is tip4pew, though tip3p, spce, and opc are also supported.
    - mdboxshape: The shape of the MD simulation box. Supported shapes are cubic and oct (truncated octahedron).
- **[GENERAL SETTINGS of TI SIMULATIONS]**: This section specifies general settings for the TI simulations.
    - nlambda: The number of lambda windows to use for TI simulations. Recommended is 12 or higher. 
    - lamschedule: Whether to use a lambda schedule. Should be set to yes.
    - lams: A list of lambda values to use for a simulation. 
    - protocol: unified or split. unified means that both electrostatics and van der Waals interactions are changed simultaneously, while split means that they are changed in separate steps. Recommended is unified.
    - ntrials: The number of trial simulations to run for each lambda window. Recommended is 3 or higher.
    - cutoff: The non-bonded cutoff distance (in Angstroms) to use for the simulations. Recommended is 10.0.
    - repex: Whether to use replica exchange during the simulations. Recommended is true.
    - nstlimti: The number of MD steps to run between attempted replica exchanges. Recommended is 5 to 125. 
    - numexchgti: The number of replica exchange attempts to perform during each simulation. Recommended is such that the total simulation time is at least 5 ns per lambda window.
    - hmr: Whether to use hydrogen mass repartitioning. Recommended is true. This sets the timestep to 0.04 fs. Alternatively, you can set hmr to false and use a timestep of 0.02 fs.
    - notractory: When true, no output trajectories are written to save disk space.
    - scalpha: The softcore alpha parameter to use for the TI simulations. Recommended is 0.5.
    - scbeta: Teh softcore beta parameter to use for the TI simulations. Recommended is 1,.0.
    - gti parameters (multiple): These control the specific behavior of the softcore potential during the TI simulations. Recommended values are provided in the sample input file. For more details, please refer to the Amber manual.
    - twostate: Whether the second state is generated from both directions or not. Recommended is true.
    - bidirection_com: Recommended is false
    - bidirection_aq: Recommended is false
    - equil_type: Whether to do separate equilibration for each trial(=2), or combined equilibration(=1). Recommended is 2.
    - ntwx: The frequency (in MD steps) to write output coordinates. Recommended is 0 to save disk space.
    - ntwx_ep: The frequency (in MD steps) to write output coordinates for endpoint windows. Recommended is 1250.
    - ntwr: The frequency (in MD steps) to write restart files. Recommended is 1250.
    - ntpr: The frequency (in MD steps) to write output energies. Recommended is 125.
- **[SETTINGS RELATED TO JOB SUBMISSION]**: This section specifies settings related to job submission on a cluster.
    - partition: The partition/queue to submit jobs to.
    - nnodes: The number of nodes to request for each job.
    - ngpus: The number of GPUs to request for each job.
    - wallclock: The wallclock time to request for each job.
    - source_header: This is an absolute path to a bash script that will be sourced at the beginning of each job. This is useful for loading modules or setting up the environment.
- **[ANALYSIS SETTINGS]**: This section specifies settings related to the analysis of the TI simulations. These primarily control edgembar and amber2dats behavior, and only apply when stage=analysis.
    - path_to_data: The path to the directory containing the simulation data. This is relative to the working directory.
    - expdatafile: The name of the file containing experimental binding free energy data for comparison. If skip, no comparison will be made.
    - bar: Whether to use the BAR method for analysis. Recommended is true.
    - ccc: Whether to use the cycle closure correction during analysis. Recommended is true.
    - ccc_ddG: Recommended true.
    - start: The starting percentage of data to use for analysis. Recommended is 0.
    - stop: The ending percentage of data to use for analysis. Recommended is 100.
    - check_convergence: Whether to check for convergence during analysis. Recommended is true.
    - showallcycles: Whether to show all cycles during analysis. Recommended is true.


Running FE-WORKFLOW
~~~~~~~~~~~~~~~~~~~

To run FE-WORKFLOW, having set up the simulation directories as described above. Simply run the following command from within your working directory:

.. code-block:: bash

    setup_fe

This will setup the simulation directories for all the transformations specified in the input file.

The folder structure created will be as follows (assuming the system name is tyk2):

.. mermaid::

  flowchart TD
  
          working["working_directory/"]
          input["input"]
          initial["initial/tyk2/"]
          tyk2["tyk2/"]
          setup["setup"]
          unified["unified/run/"]
          runall["runalltrials.sh"]
          T1["ejm44~ejm55/"]
          T2["jmc23~ejm44/"]
          T3["jmc23~ejm55/"]
          results["results_tyk2/"]
          img_dir[imgdir]

          C1["com/"]
          A1["aq/"]
          U1["unisc.parm7"]
          SA["StateA.rst7"]
          SB["StateB.rst7"]
          equil1["equilX/"]
          t1["tX/"]
          inputs["inputs/"]
  
  
          working --> input
          working --> initial
          working --> tyk2
          working --> runall
          working --> results
          results --> img_dir
          tyk2 --> setup
          tyk2 --> unified
          unified --> T1
          unified --> T2
          unified --> T3
          T1 --> C1
          T1 --> A1
          T2 --> C1
          T2 --> A1
          T3 --> C1
          T3 --> A1
          C1 --> U1
          A1 --> U1
          C1 --> SA
          C1 --> SB
          A1 --> SA
          A1 --> SB
          C1 --> equil1
          A1 --> equil1
          C1 --> t1
          C1 --> inputs
          A1 --> inputs

          click input "#preparing-the-simulation-directory" 
          click imgdir "#examining-the-outputs-of-the-build"
          

.. note::

    The above diagram is a simplified representation of the folder structure created by FE-WORKFLOW. Each transformation folder (e.g., ejm44~ejm55) will contain separate directories for the complex (com/) and aqueous (aq/) simulations, each with their own input files, equilibration directories, and trial simulation directories. 

    Run mdin files for each lambda window can be found in the inputs/ directory within each simulation folder.


Examining the Outputs of the Build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After running `setup_fe`, you can examine the outputs of the build process.

The first step you will want to do is examine the atom mappings for each transformation. While you can do this by hand, we will use the results from the imgdir/ folder created in the results_tyk2/ directory.

.. code-block:: bash

    cd results_tyk2/imgdir
    xdg-open index.html  

This will open an HTML file in your web-browser which contains images of the atom mappings for each transformation.

.. warning::

    It is important to carefully examine the atom mappings to ensure that they make chemical sense. Soft core regions are highlighted in red.

    Rules of Thumb:
    - There should only be one softcore region per ligand.
    - Softcore regions should contain the entirety of any ring systems being transformed.
    - Any bonds/angles/dihedrals not involved in softcore regions will not benefit from ACES enhanced sampling, so if you expect significant conformational changes during the transformation, consider including those regions in the softcore region.

First open `index.html <../../../_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/index.html>`_, which will give you an overview of all the transformations. 

.. tab-set::
    
    .. tab-item:: ejm44

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/ejm_44~ejm_55_ejm_44.png
            :alt: ejm44 mapping
            :width: 300px

    .. tab-item:: ejm55

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/ejm_44~ejm_55_ejm_55.png
            :alt: ejm55 mapping
            :width: 300px
    
    .. tab-item:: jmc23

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/jmc_23~ejm_44_jmc_23.png
            :alt: jmc23 mapping
            :width: 300px

Modifying Atom Mappings
~~~~~~~~~~~~~~~~~~~~~~~

Now - this is a case where you may want to modify the atom mapping, for instance if you want to enhance sample the dihedral leading into the ring system. To do that - you need to know the atom indices of the atoms you want to modify.

You can find these by opening the `index_labels.html <../../../_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/index_labels.html>`_ file in the same imgdir/ folder. This file contains the same images as index.html, but with atom indices labeled.

.. tab-set::
    
    .. tab-item:: ejm44

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/ejm_44~ejm_55_ejm_44_labels.png
            :alt: ejm44 mapping with labels
            :width: 300px

    .. tab-item:: ejm55

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/ejm_44~ejm_55_ejm_55_labels.png
            :alt: ejm55 mapping with labels
            :width: 300px
    
    .. tab-item:: jmc23

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/jmc_23~ejm_44_jmc_23_labels.png
            :alt: jmc23 mapping with labels
            :width: 300px

To do this make a directory called `atom_mappings` in our working directory, and we will copy over the mapping files for each transformation.

.. code-block:: bash

    mkdir atom_mappings
    cp tyk2/setup/*map.txt atom_mappings/

This gives you three files - ejm44~ejm55_map.txt, jmc23~ejm44_map.txt, and jmc23~ejm55_map.txt.

Open the mapping file for the transformations you want to modify in a text editor. The mapping file maps atoms in the `common_core`
region between the two ligands. Thus, to increase the softcore region - we need to remove the atoms we want to include in the softcore region from the mapping file. We can do this using the labeled images from above.

For example, from ejm_44~ejm_55.map.txt, we can remove the following lines to include the dihedral leading into the ring system in the softcore region:

.. code-block:: text

    O2 =>   O2
    H8 =>   H8
    N3 =>   N3
    C13 =>  C13

Note - while in this case the atom names were the same, this is not always true. 

Remove these lines from the remaining mapping files, and save the modified mapping files.

.. warning:: 

    Be very careful when modifying the mapping files. Ensure that the modified mappings still make chemical sense, and that no bonds are broken in the common core region. Also - ensure that the same atoms are not mapped to different atoms in different transformations involving the same ligand.


Now - to use the new mappings, we need to point the input file to where it can find the new mappings, and then re-run the setup process. 

To do this, add the following line to the [ATOM MAPPING] section of your input file:

.. code-block:: text

    read_map=coremap

Then, re-run the setup process:

.. code-block:: bash

    rm -rf tyk2
    rm -rf results_tyk2
    setup_fe

This will re-generate the simulation directories using the modified atom mappings. You can re-examine the mappings in the imgdir/ folder to ensure that the modifications were applied correctly.


Submitting Jobs to the Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To submit the simulations to your cluster, run the following command:

.. code-block:: bash

    ./runalltrials.sh

This is a smart submission script that will try to submit all jobs, and if it fails (e.g.) hits a job limit, it will keep trying to submit until all jobs are submitted. Trials are submitted as separate jobs to maximize parallelism.

Once all jobs are submitted, you can monitor their progress using your cluster's job monitoring tools (e.g., `squeue`, `qstat`, etc.).


Analyzing Results
~~~~~~~~~~~~~~~~~

Once all simulations are complete, you can analyze the results by changing the `stage` parameter in your input file from `setup` to `analysis`, and re-running `setup_fe`:

.. code-block:: bash

    # Edit input file to change stage from setup to analysis
    setup_fe

This will perform the analysis of the TI simualtions and generate output files in the results_tyk2/ directory.

You can view the analysis results by opening the `Graph.html` file in the `results_tyk2/analysis` directory.

.. code-block:: bash

    xdg-open results_tyk2/analysis/Graph.html # or open on MacOS

We have hosted the file here for your convenience: `Graph.html <../../../_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/files/analysis/Graph.html>`_


When you first load the html file, you will see at the top of the file a map of the RBFE network (in this case three ligands) with a variety of controls.

You can use these controls to customize the display of the results, including the coloring of nodes, and the coloring and thickness of edges based on various metrics.

.. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/GraphEntrance.png
    :alt: Graph Output
    :width: 600px

Scrolling down, you will see a control panel that allows you to display a variety of views. The default is a help pane which describes various important abbreviations.

.. tab-set::

    .. tab-item:: Help

        This section describes various important abbreviations used in the graph display.

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/help.png
            :alt: Graph Help
            :width: 600px

    .. tab-item:: Node

        This section describes the free energy of individual nodes (ligands) in the network (relative to an arbitrary reference ligand).

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/nodes.png
            :alt: Graph Node Data
            :width: 600px

    .. tab-item:: Edge

        This section describes the free energy differences between pairs of ligands (edges) in the network.

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/edges.png
            :alt: Graph Edge Data
            :width: 600px

    .. tab-item:: Cycle

        This section describes the cycle closure errors for cycles in the network.

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/cycles.png
            :alt: Graph Cycle Data
            :width: 600px

    .. tab-item:: TI

        This compares different TI analysis approaches and how well they agree with BAR.

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/ti.png
            :alt: Graph TI Data
            :width: 600px
        
    .. tab-item:: RepEx

        This section describes the replica exchange statistics for each edge in the network.

        .. image:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/imgs/repex.png
            :alt: Graph RepEx Data
            :width: 600px
    
.. end-tutorial

Relevant Literature
-------------------