Analyzing Free Energy Calculations with FE-Toolkit

Zeke A. Piskulich1, Timothy Giese1, Patricio Barletta1, Ryan Snyder1, and Darrin M. York1
1Laboratory for Biomolecular Simulation Research, Institute
for Quantitative Biomedicine and Department of Chemistry and Chemical
Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

  • Learn to use FE-Toolkit to analyze the results of relative and absolute binding free energy calculations.

Activities

In this Activity, you will learn how to analyze the results of free energy calculations using FE-Toolkit. Giese et al.[1] This tutorial will guide you through the process of extracting data from simulation output files, calculating relative and absolute binding free energy from replica exchange simulations, and interpreting the results.

Getting Started with FE-Toolkit

For this tutorial, we have provided the simulation output data from Absolute Binding Free Energy Calculations on Tyk2 for four ligands. In this tutorial, we will primarily use the files for the ligand ejm31; however, we have provided the other ligands as a reference (and as an opportunity for further practice).

To obtain the files, download this ABFE Simulation Output. This file is quite large - so it may take a few minutes to download.

Hint

The FE-Toolkit paper published by Giese and co-workers includes a significant Supporting Information that outlines much of what is included in this tutorial, as well as the underlying math driving these calculations. See the paper Giese et al.[1] for more information.

Simulation Outputs from Alchemical Free Energy Simulations with Amber

The fundamental outputs from an Absolute Binding Free Energy (ABFE) calculation that we will be using are the following:

  • mdouts: These are Amber simulation output files that include energy information from each lambda window.

  • rem.log: These are Amber simulation output files that include information about replica exchange acceptances.

  • rest.in (ABFE Only): This file contains information about the Boresch restraints used in the Amber simulation.

Briefly - we will explore each of these files:

Amber AFE calculations write MDouts with entries that look something like this. They include information about the energies, the dV/dL value, which exchange you are on, as well as information about the soft-core potentials.

| TI region  1


NSTEP =   840250   TIME(PS) =    3361.000  TEMP(K) =   294.22  PRESS =     0.0
Etot   =    -33055.3946  EKtot   =      6151.6533  EPtot      =    -39207.0479
BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         3.6590
1-4 NB =        -0.0696  1-4 EEL =       -25.0360  VDWAALS    =      6648.6562
EELEC  =    -45834.2575  EHBOND  =         0.0000  RESTRAINT  =         0.0000
DV/DL  =        79.6779
TEMP0  =       298.0000  REPNUM  =              9  EXCHANGE#  =           8403
------------------------------------------------------------------------------

Softcore part of the system:      32 atoms,       TEMP(K)    =         269.57
SC_Etot=        41.8564  SC_EKtot=        22.7663  SC_EPtot   =        19.0901
SC_BOND=         6.4851  SC_ANGLE=        11.9679  SC_DIHED   =         6.5738
SC_14NB=         0.0000  SC_14EEL=         0.0000  SC_VDW     =        -5.9367
SC_EEL =         0.0000
SC_RES_DIST=     0.0000  SC_RES_ANG=       0.0000  SC_RES_TORS=         0.0000
SC_EEL_DER=      1.1039  SC_VDW_DER=     -15.3796  SC_DERIV   =       -14.2757
------------------------------------------------------------------------------


| TI region  2


NSTEP =   840250   TIME(PS) =    3361.000  TEMP(K) =   294.32  PRESS =     0.0
Etot   =    -33078.1609  EKtot   =      6128.8870  EPtot      =    -39207.0479
BOND   =         0.0000  ANGLE   =         0.0000  DIHED      =         3.6590
1-4 NB =        -0.0696  1-4 EEL =       -25.0360  VDWAALS    =      6648.6562
EELEC  =    -45834.2575  EHBOND  =         0.0000  RESTRAINT  =         0.0000
DV/DL  =        79.6779
TEMP0  =       298.0000  REPNUM  =              9  EXCHANGE#  =           8403
------------------------------------------------------------------------------

Amber also writes rem.log files which list the acceptances in exchanges between windows.

# Replica Exchange log file
# numexchg is      12500
# REMD filenames:
#   remlog= remd_complex_ejm31.log
#   remtype= rem.type
# Rep#, Neibr#, Temp0, PotE(x_1), PotE(x_2), left_fe, right_fe, Success, Success rate (i,i+1)
...
# exchange     9032
    1     2    298.00 -39400.10 -39409.58 -Infinity     -1.79    T        0.87
    2     1    298.00 -39408.05 -39398.45      1.79    -10.40    T        0.35
    3     4    298.00 -39416.77 -39352.95     10.40    -22.46    F        0.06
    4     3    298.00 -39332.22 -39390.84     22.45    -33.45    F        0.01
    5     6    298.00 -39316.56 -39355.34     32.76    -36.65    F        0.00
    6     5    298.00 -39321.48 -39275.80     36.74    -34.91    F        0.00
    7     8    298.00 -39129.67 -39121.37     32.64    -25.37    F        0.00
    8     7    298.00 -39108.33 -39103.97     23.32    -13.02    F        0.00
    9    10    298.00 -39087.24 -39262.19     10.73     -3.20    F        0.07
    10     9    298.00 -39262.99 -39081.81      3.20     -0.44    F        0.74
    11    -1    298.00 -39197.79      0.00      0.03      0.00    F        0.00
# exchange     9033
    1    -1    298.00 -39438.52      0.00 -Infinity     -1.79    F        0.87
    2     3    298.00 -39467.30 -39324.55      1.79    -10.40    T        0.35
    3     2    298.00 -39314.20 -39456.10     10.40    -22.46    T        0.06
    4     5    298.00 -39259.84 -39308.14     22.45    -33.45    F        0.01
    5     4    298.00 -39282.02 -39223.52     32.76    -36.65    F        0.00
    6     7    298.00 -39248.23 -39241.31     36.74    -34.91    F        0.00
    7     6    298.00 -39220.57 -39210.90     32.64    -25.37    F        0.00
    8     9    298.00 -39378.26 -39145.48     23.32    -13.02    F        0.00
    9     8    298.00 -39137.02 -39358.97     10.73     -3.20    F        0.07
    10    11    298.00 -39218.70 -39185.63      3.20     -0.44    T        0.74
    11    10    298.00 -39185.17 -39218.08      0.03      0.00    T        0.00

Lastly, in ABFE we use rest.in files to define Boresch restraints (see the Boresch Restraint Tutorial) which hold the ligand in the protein while its interactions are decoupled from the rest of the system at \(\lambda=1\). These restraint files contain the 0-indexed atoms for a pairwise restraint, two angle restraints, and three dihedral restraints between the protein and the ligand.

&rst iat=1489,15
  r1=0.00000,r2=4.44575,r3=4.44575,r4=999.000,rk2=15.06, rk3=15.06/
&rst iat=1477,1489,15
  r1=-180.00000,r2=83.99734,r3=83.99734,r4=180.000,rk2=53.21, rk3=53.21/
&rst iat=1489,15,14
  r1=-180.00000,r2=66.38281,r3=66.38281,r4=180.000,rk2=40.55, rk3=40.55/
&rst iat=1490,1477,1489,15
  r1=-180.00000,r2=11.03062,r3=11.03062,r4=180.000,rk2=29.83, rk3=29.83/
&rst iat=1477,1489,15,14
  r1=-180.00000,r2=30.50819,r3=30.50819,r4=180.000,rk2=62.83, rk3=62.83/
&rst iat=1489,15,14,16
  r1=-180.00000,r2=-137.04995,r3=-137.04995,r4=180.000,rk2=49.20, rk3=49.20/

Note

Boresch restraints do work on the system during an ABFE calculation, and thus have to be included when analyzing them. These will be handled automatically by edgembar during this tutorial, but you can learn more on their impact on the free energy from (see the Boresch Contributions to the Free Energy Tutorial).

Take a look at the folders you downloaded above, and take a minute to look through these files. For each ligand, you have a rem.log file, many mdouts (1/\(\lambda\)-window), and one rest.in file. You would also normally have a trajectory output for at least the end-states; however, that is out of the scope for the present tutorial.

Generating Edgembar HTML Reports

For this tutorial, you need to have edgembar installed. This can either be installed from source (https://gitlab.com/RutgersLBSR/fe-toolkit); however, there is also a Python distribution of edgembar (the component of FE-Toolkit we will be using) available on PyPI (pip install edgembar). More edgembar documentation is located here: https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/index.html

By default, the Edgembar program reads files that are program agnostic (dat files of the format column1: simulation time, column2: potential energy in kcal/mol). It generally expects these files to be split into the following substructure:

edge/
    env/
        stage/
            trial/

Note

  • edge: Represents the overall ABFE/RBFE calculation. The name comes from thinking of the calculation as an edge in an alchemical network.

  • env: This is the environment (e.g. aqueous, complex)

  • stage: This is what stage for that environment (for instance, if you have separate charge decoupling and vdw decoupling stages). With SmoothStep potentials, often there is just a single vdw stage.

  • trial: A subdirectory for each independent trial, usually named t1 t2 etc.

In a way, these are just labels. The important thing is that there is a single edge, two env values, and there can be arbitrarily many stage and trial values. Each of them represents a calculation, and a very raw sketch of their relationship would look like this:

\[\textbf{edge} = \sum_{\textbf{stages in target env}} \langle \textbf{trials in each stage} \rangle - \sum_{\textbf{stages in reference env}} \langle \textbf{trials in each stage} \rangle\]

In the case of using Amber as the simulation package, FE-Toolkit has built-in tools for generating this hierarchical structure, specifically edgembar-amber2dats.

For this - we will use a short bash script.

edge=$1
mkdir -p dats

for env in {complex,binder};
do
    for stage in vdw;
    do
        for trial in {t1,t2};
        do
            echo "Extracting files for $edge/$env/$stage/$trial"
            mkdir -p dats/$edge/$env/$stage/$trial
            if [ "$env" == "complex" ];
            then
                edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ --vba $edge/$trial\_$env/rest.in $(ls $edge/$trial\_$env/*mdout) &
            else
                edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ $(ls $edge/$trial\_$env/*mdout) &
            fi
        done
    done
done

Let’s explore the edgembar-amber2dats.py script quickly before running the script.

edgembar-amber2dats.py -h

There are a few important options that you should consider:

  • nan: What should MBAR do if an MBAR energy is ‘**’. The default is to revert to BAR analysis. Important for ABFE.

  • extra: Can add extra data to the edgembar output for decomposing DV/DL profiles. Its main use is plotting the contributions from electrostatics, 1-4 vdw interactions, etc.

For now, we will run without these.

Run the script for ejm31.

bash run_amber2dats.sh ejm31

Now - you’ll see lots of messages saying something along the lines of “Invalid energies within ejm31/t2_complex/complex_ejm31_1.00000.mdout; writing output for BAR. See the –nan option”. This is expected for ABFE, because for MBAR you evaluate the energy at every lambda point with respect to every other lambda point. For ABFE, this means that you often can be evaluating energies of clashes between the protein and the ligand (for instance, in a state where your ligand is fully decoupled.)

Now that these files are generated, take a moment to explore the dats folder. Note that it follows the rough scheme we described above of edge/env/stage/trial. Take a look at the files that you have there.

Now, we will use a small python script to create an xml for edgembar to work on.

#!/usr/bin/env python3
import edgembar
import os
from pathlib import Path

odir = Path("analysis")
s=r"dats/{edge}/{env}/{stage}/{trial}/efep_{traj}_{ene}.dat"

exclusions=None
# Load the dat files into an edges object
edges = edgembar.DiscoverEdges(s,
                                exclude_trials=exclusions,
                                target="complex",
                                reference="binder")

odir.mkdir(exist_ok=True)
for edge in edges:
    # Tell the edges about the shift from the boresch restraints.
    edge.SetVBAShift()
    # Reverse the order of lambdas (1->0) so that the sign is negative, which is what you would expect for an ABFE.
    for trial in edge.GetAllTrials():
        trial.reverse()


    fname = odir / (edge.name + ".xml")
    edge.WriteXml(fname)

A few things are happening in this file. The first is we are setting a template string to say how the dats folder is structured. Then we generate an edges object with DiscoverEdges. If you have outliers, you could potentially want to exclude those trials using the exclusions list (e.g. exclusions=[“t1”])

Now - run this script.

python DiscoverEdges.py

You’ll see a new analysis directory with xml files present for each of the transformations you ran amber2dats for previously (up to four!). These .xml files are the input to edgembar. Open one of them and you’ll see they have the same hierarchical structure of edges, environments, stages, and trials, along with the lambda values per trial (yes, you can use different schedules in every trial of every stage and environment), the path to the .dat files, and an optional constant shift in the free energy, if you used boresch restraints for that trial.

Next, run the actual BAR/MBAR calculation:

OMP_NUM_THREADS=8 edgembar_omp analysis/edge_ejm31.xml --fwdrev --halves

Warning

edgembar and edgembar_omp are dependent on relative paths defined in the xml file. Thus, you must run it from outside of analysis.

The two options enable additional analysis for edgembar that we will discuss in a later part of this tutorial when we are looking at the HTML reports.

This script writes a Python script, which we can run as:

python analysis/edge_ejm31.py
python analysis/edge_ejm31.py --netcdf

Note that we are running it twice. If you only want the HTML report, the first command is enough; however, if you want a portable data file you can pass to other programs or parse yourself, the NetCDF option provides a self-contained file for you to do so.

Now you should see edge_ejm31.html and edge_ejm31.nc in your analysis directory!

Relevant Literature