Analyzing Free Energy Calculations with FE-Toolkit
==================================================

| Zeke A. Piskulich\ :sup:`1`, Timothy Giese\ :sup:`1`,  Patricio Barletta\ :sup:`1`, Ryan Snyder\ :sup:`1`, and Darrin M. York\ :sup:`1`
| :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute
|   for Quantitative Biomedicine and Department of Chemistry and Chemical
|   Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives
-------------------

.. start-learning-objectives

- Learn to use FE-Toolkit to analyze the results of relative and absolute binding free energy calculations.

.. end-learning-objectives


Activities
----------
In this Activity, you will learn how to analyze the results of free energy calculations using FE-Toolkit. :footcite:t:`Giese_JChemInfModel_2025_v65_p5273` This tutorial will guide you through the process of extracting data from simulation output files, calculating relative and absolute binding free energy from replica exchange simulations, and interpreting the results. 


.. start-tutorial

Getting Started with FE-Toolkit
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For this tutorial, we have provided the simulation output data from Absolute Binding Free Energy Calculations on Tyk2 for four ligands. In this tutorial, we will primarily use the files for the ligand ejm31; however, we have provided the other ligands as a reference (and as an opportunity for further practice).

To obtain the files, download this :download:`ABFE Simulation Output <https://drive.google.com/file/d/1KSIXdkMpc382LnZr2EPG3RNzaFZwr9yJ/view?usp=share_link>`. This file is quite large - so it may take a few minutes to download.

.. hint::

    The FE-Toolkit paper published by Giese and co-workers includes a significant Supporting Information that outlines much of what is included in this tutorial, as well as the underlying math driving these calculations. See the paper :footcite:t:`Giese_JChemInfModel_2025_v65_p5273` for more information.

Simulation Outputs from Alchemical Free Energy Simulations with Amber
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The fundamental outputs from an Absolute Binding Free Energy (ABFE) calculation that we will be using are the following:

- **mdouts**: These are Amber simulation output files that include energy information from each lambda window.
- **rem.log**: These are Amber simulation output files that include information about replica exchange acceptances.
- **rest.in**  (ABFE Only): This file contains information about the Boresch restraints used in the Amber simulation.

Briefly - we will explore each of these files:

Amber AFE calculations write :term:`MDouts <.mdout>` with entries that look something like this. They include information about the energies, the dV/dL value, which exchange you are on, as well as information about the soft-core potentials.

.. code-block:: text

    | TI region  1


    NSTEP =   840250   TIME(PS) =    3361.000  TEMP(K) =   294.22  PRESS =     0.0
    Etot   =    -33055.3946  EKtot   =      6151.6533  EPtot      =    -39207.0479
    BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         3.6590
    1-4 NB =        -0.0696  1-4 EEL =       -25.0360  VDWAALS    =      6648.6562
    EELEC  =    -45834.2575  EHBOND  =         0.0000  RESTRAINT  =         0.0000
    DV/DL  =        79.6779
    TEMP0  =       298.0000  REPNUM  =              9  EXCHANGE#  =           8403
    ------------------------------------------------------------------------------

    Softcore part of the system:      32 atoms,       TEMP(K)    =         269.57
    SC_Etot=        41.8564  SC_EKtot=        22.7663  SC_EPtot   =        19.0901
    SC_BOND=         6.4851  SC_ANGLE=        11.9679  SC_DIHED   =         6.5738
    SC_14NB=         0.0000  SC_14EEL=         0.0000  SC_VDW     =        -5.9367
    SC_EEL =         0.0000
    SC_RES_DIST=     0.0000  SC_RES_ANG=       0.0000  SC_RES_TORS=         0.0000
    SC_EEL_DER=      1.1039  SC_VDW_DER=     -15.3796  SC_DERIV   =       -14.2757
    ------------------------------------------------------------------------------


    | TI region  2


    NSTEP =   840250   TIME(PS) =    3361.000  TEMP(K) =   294.32  PRESS =     0.0
    Etot   =    -33078.1609  EKtot   =      6128.8870  EPtot      =    -39207.0479
    BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         3.6590
    1-4 NB =        -0.0696  1-4 EEL =       -25.0360  VDWAALS    =      6648.6562
    EELEC  =    -45834.2575  EHBOND  =         0.0000  RESTRAINT  =         0.0000
    DV/DL  =        79.6779
    TEMP0  =       298.0000  REPNUM  =              9  EXCHANGE#  =           8403
    ------------------------------------------------------------------------------

Amber also writes rem.log files which list the acceptances in exchanges between windows.

.. code-block:: text

    # Replica Exchange log file
    # numexchg is      12500
    # REMD filenames:
    #   remlog= remd_complex_ejm31.log
    #   remtype= rem.type
    # Rep#, Neibr#, Temp0, PotE(x_1), PotE(x_2), left_fe, right_fe, Success, Success rate (i,i+1)
    ...
    # exchange     9032
        1     2    298.00 -39400.10 -39409.58 -Infinity     -1.79    T        0.87
        2     1    298.00 -39408.05 -39398.45      1.79    -10.40    T        0.35
        3     4    298.00 -39416.77 -39352.95     10.40    -22.46    F        0.06
        4     3    298.00 -39332.22 -39390.84     22.45    -33.45    F        0.01
        5     6    298.00 -39316.56 -39355.34     32.76    -36.65    F        0.00
        6     5    298.00 -39321.48 -39275.80     36.74    -34.91    F        0.00
        7     8    298.00 -39129.67 -39121.37     32.64    -25.37    F        0.00
        8     7    298.00 -39108.33 -39103.97     23.32    -13.02    F        0.00
        9    10    298.00 -39087.24 -39262.19     10.73     -3.20    F        0.07
        10     9    298.00 -39262.99 -39081.81      3.20     -0.44    F        0.74
        11    -1    298.00 -39197.79      0.00      0.03      0.00    F        0.00
    # exchange     9033
        1    -1    298.00 -39438.52      0.00 -Infinity     -1.79    F        0.87
        2     3    298.00 -39467.30 -39324.55      1.79    -10.40    T        0.35
        3     2    298.00 -39314.20 -39456.10     10.40    -22.46    T        0.06
        4     5    298.00 -39259.84 -39308.14     22.45    -33.45    F        0.01
        5     4    298.00 -39282.02 -39223.52     32.76    -36.65    F        0.00
        6     7    298.00 -39248.23 -39241.31     36.74    -34.91    F        0.00
        7     6    298.00 -39220.57 -39210.90     32.64    -25.37    F        0.00
        8     9    298.00 -39378.26 -39145.48     23.32    -13.02    F        0.00
        9     8    298.00 -39137.02 -39358.97     10.73     -3.20    F        0.07
        10    11    298.00 -39218.70 -39185.63      3.20     -0.44    T        0.74
        11    10    298.00 -39185.17 -39218.08      0.03      0.00    T        0.00


Lastly, in ABFE we use rest.in files to define Boresch restraints :doc:`(see the Boresch Restraint Tutorial) </ModularTutorials/Alchemical/ABFE_of_Tyk2/boresch_restraints>` which hold the ligand in the protein while its interactions are decoupled from the rest of the system at :math:`\lambda=1`. These restraint files contain the 0-indexed atoms for a pairwise restraint, two angle restraints, and three dihedral restraints between the protein and the ligand.

.. literalinclude:: /_static/files/ModularTutorials/Alchemical/ABFE_of_Tyk2/abfe_tyk2_tutorial/boresch_files/rest.in
   :language: text

.. note::

    Boresch restraints do work on the system during an ABFE calculation, and thus have to be included when analyzing them. These will be handled automatically by edgembar during this tutorial, but you can learn more on their impact on the free energy from :doc:`(see the Boresch Contributions to the Free Energy Tutorial) </ModularTutorials/Alchemical/ABFE_of_Tyk2/boresch_contribution_free_energy>`.

Take a look at the folders you downloaded above, and take a minute to look through these files. For each ligand, you have a rem.log file, many :term:`mdouts <.mdout>` (1/:math:`\lambda`-window), and one rest.in file. You would also normally have a :term:`trajectory <.nc>` output for at least the end-states; however, that is out of the scope for the present tutorial.

Generating Edgembar HTML Reports
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For this tutorial, you need to have edgembar installed. This can either be installed from source (https://gitlab.com/RutgersLBSR/fe-toolkit); however, there is also a Python distribution of edgembar (the component of FE-Toolkit we will be using) available on PyPI (`pip install edgembar`). More edgembar documentation is located here: https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/index.html


By default, the Edgembar program reads files that are program agnostic (dat files of the format column1: simulation time, column2: potential energy in kcal/mol). It generally expects these files to be split into the following substructure:


.. code-block::

    edge/
        env/
            stage/
                trial/

.. note::

    - **edge**: Represents the overall ABFE/RBFE calculation. The name comes from thinking of the calculation as an edge in an alchemical network.
    - **env**: This is the environment (e.g. aqueous, complex)
    - **stage**: This is what stage for that environment (for instance, if you have separate charge decoupling and vdw decoupling stages). With SmoothStep potentials, often there is just a single vdw stage.
    - **trial**: A subdirectory for each independent trial, usually named t1 t2 etc.

In a way, these are just labels. The important thing is that there is a single **edge**, two **env** values, and there can be arbitrarily many **stage** and **trial** values.
Each of them represents a calculation, and a very raw sketch of their relationship would look like this:

.. math::

   \textbf{edge} = \sum_{\textbf{stages in target env}} \langle \textbf{trials in each stage} \rangle - \sum_{\textbf{stages in reference env}} \langle \textbf{trials in each stage} \rangle


In the case of using Amber as the simulation package, FE-Toolkit has built-in tools for generating this hierarchical structure, specifically `edgembar-amber2dats`.

For this - we will use a short bash script.

.. code-block:: bash

    edge=$1
    mkdir -p dats

    for env in {complex,binder};
    do
        for stage in vdw;
        do
            for trial in {t1,t2};
            do
                echo "Extracting files for $edge/$env/$stage/$trial"
                mkdir -p dats/$edge/$env/$stage/$trial
                if [ "$env" == "complex" ];
                then
                    edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ --vba $edge/$trial\_$env/rest.in $(ls $edge/$trial\_$env/*mdout) &
                else
                    edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ $(ls $edge/$trial\_$env/*mdout) &
                fi
            done
        done
    done

Let's explore the edgembar-amber2dats.py script quickly before running the script.

.. code-block:: bash

    edgembar-amber2dats.py -h

There are a few important options that you should consider:

- nan: What should MBAR do if an MBAR energy is '******'. The default is to revert to BAR analysis. Important for ABFE.
- extra: Can add extra data to the edgembar output for decomposing DV/DL profiles. Its main use is plotting the contributions from electrostatics, 1-4 vdw interactions, etc.

For now, we will run without these.

Run the script for ejm31.

.. code-block:: bash

    bash run_amber2dats.sh ejm31

Now - you'll see lots of messages saying something along the lines of "Invalid energies within ejm31/t2_complex/complex_ejm31_1.00000.mdout; writing output for BAR. See the --nan option". This is expected for ABFE, because for MBAR you evaluate the energy at every lambda point with respect to every other lambda point. For ABFE, this means that you often can be evaluating energies of clashes between the protein and the ligand (for instance, in a state where your ligand is fully decoupled.)

Now that these files are generated, take a moment to explore the dats folder. Note that it follows the rough scheme we described above of edge/env/stage/trial. Take a look at the files that you have there.

Now, we will use a small python script to create an xml for edgembar to work on.

.. code-block:: python

    #!/usr/bin/env python3
    import edgembar
    import os
    from pathlib import Path

    odir = Path("analysis")
    s=r"dats/{edge}/{env}/{stage}/{trial}/efep_{traj}_{ene}.dat"

    exclusions=None
    # Load the dat files into an edges object
    edges = edgembar.DiscoverEdges(s,
                                    exclude_trials=exclusions,
                                    target="complex",
                                    reference="binder")

    odir.mkdir(exist_ok=True)
    for edge in edges:
        # Tell the edges about the shift from the boresch restraints.
        edge.SetVBAShift()
        # Reverse the order of lambdas (1->0) so that the sign is negative, which is what you would expect for an ABFE.
        for trial in edge.GetAllTrials():
            trial.reverse()


        fname = odir / (edge.name + ".xml")
        edge.WriteXml(fname)

A few things are happening in this file. The first is we are setting a template string to say how the dats folder is structured. Then we generate an edges object with DiscoverEdges. If you have outliers, you could potentially want to exclude those trials using the exclusions list (e.g. exclusions=["t1"])


Now - run this script.

.. code-block:: bash

    python DiscoverEdges.py

You'll see a new analysis directory with xml files present for each of the transformations you ran amber2dats for previously (up to four!).
These **.xml** files are the input to **edgembar**. Open one of them and you'll see they have the same hierarchical structure of edges,
environments, stages, and trials, along with the lambda values per trial (yes, you can use different schedules in every trial of every
stage and environment), the path to the **.dat** files, and an optional constant **shift** in the free energy,
if you used boresch restraints for that trial.

Next, run the actual BAR/MBAR calculation:

.. code-block:: bash

    OMP_NUM_THREADS=8 edgembar_omp analysis/edge_ejm31.xml --fwdrev --halves

.. warning::

    edgembar and edgembar_omp are dependent on relative paths defined in the xml file. Thus, you must run it from outside of analysis.

The two options enable additional analysis for edgembar that we will discuss in a later part of this tutorial when we are looking at the HTML reports.

This script writes a Python script, which we can run as:

.. code-block:: bash

    python analysis/edge_ejm31.py
    python analysis/edge_ejm31.py --netcdf

Note that we are running it twice. If you only want the HTML report, the first command is enough; however, if you want a portable data file you can pass to other programs or parse yourself, the NetCDF option provides a self-contained file for you to do so.

Now you should see **edge_ejm31.html** and **edge_ejm31.nc** in your analysis directory!


.. end-tutorial

Relevant Literature
-------------------

.. footbibliography::