Analyzing Free Energy Calculations with FE-Toolkit ================================================== | Zeke A. Piskulich\ :sup:`1`, Timothy Giese\ :sup:`1`, Patricio Barletta\ :sup:`1`, Ryan Snyder\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute | for Quantitative Biomedicine and Department of Chemistry and Chemical | Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Learn to use FE-Toolkit to analyze the results of relative and absolute binding free energy calculations. .. end-learning-objectives Activities ---------- In this Activity, you will learn how to analyze the results of free energy calculations using FE-Toolkit. :footcite:t:`Giese_JChemInfModel_2025_v65_p5273` This tutorial will guide you through the process of extracting data from simulation output files, calculating relative and absolute binding free energy from replica exchange simulations, and interpreting the results. .. start-tutorial Getting Started with FE-Toolkit ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For this tutorial, we have provided the simulation output data from Absolute Binding Free Energy Calculations on Tyk2 for four ligands. In this tutorial, we will primarily use the files for the ligand ejm31; however, we have provided the other ligands as a reference (and as an opportunity for further practice). To obtain the files, download this :download:`ABFE Simulation Output `. This file is quite large - so it may take a few minutes to download. .. hint:: The FE-Toolkit paper published by Giese and co-workers includes a significant Supporting Information that outlines much of what is included in this tutorial, as well as the underlying math driving these calculations. See the paper :footcite:t:`Giese_JChemInfModel_2025_v65_p5273` for more information. Simulation Outputs from Alchemical Free Energy Simulations with Amber ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The fundamental outputs from an Absolute Binding Free Energy (ABFE) calculation that we will be using are the following: - **mdouts**: These are Amber simulation output files that include energy information from each lambda window. - **rem.log**: These are Amber simulation output files that include information about replica exchange acceptances. - **rest.in** (ABFE Only): This file contains information about the Boresch restraints used in the Amber simulation. Briefly - we will explore each of these files: Amber AFE calculations write :term:`MDouts <.mdout>` with entries that look something like this. They include information about the energies, the dV/dL value, which exchange you are on, as well as information about the soft-core potentials. .. code-block:: text | TI region 1 NSTEP = 840250 TIME(PS) = 3361.000 TEMP(K) = 294.22 PRESS = 0.0 Etot = -33055.3946 EKtot = 6151.6533 EPtot = -39207.0479 BOND = 0.0000 ANGLE = 0.0000 DIHED = 3.6590 1-4 NB = -0.0696 1-4 EEL = -25.0360 VDWAALS = 6648.6562 EELEC = -45834.2575 EHBOND = 0.0000 RESTRAINT = 0.0000 DV/DL = 79.6779 TEMP0 = 298.0000 REPNUM = 9 EXCHANGE# = 8403 ------------------------------------------------------------------------------ Softcore part of the system: 32 atoms, TEMP(K) = 269.57 SC_Etot= 41.8564 SC_EKtot= 22.7663 SC_EPtot = 19.0901 SC_BOND= 6.4851 SC_ANGLE= 11.9679 SC_DIHED = 6.5738 SC_14NB= 0.0000 SC_14EEL= 0.0000 SC_VDW = -5.9367 SC_EEL = 0.0000 SC_RES_DIST= 0.0000 SC_RES_ANG= 0.0000 SC_RES_TORS= 0.0000 SC_EEL_DER= 1.1039 SC_VDW_DER= -15.3796 SC_DERIV = -14.2757 ------------------------------------------------------------------------------ | TI region 2 NSTEP = 840250 TIME(PS) = 3361.000 TEMP(K) = 294.32 PRESS = 0.0 Etot = -33078.1609 EKtot = 6128.8870 EPtot = -39207.0479 BOND = 0.0000 ANGLE = 0.0000 DIHED = 3.6590 1-4 NB = -0.0696 1-4 EEL = -25.0360 VDWAALS = 6648.6562 EELEC = -45834.2575 EHBOND = 0.0000 RESTRAINT = 0.0000 DV/DL = 79.6779 TEMP0 = 298.0000 REPNUM = 9 EXCHANGE# = 8403 ------------------------------------------------------------------------------ Amber also writes rem.log files which list the acceptances in exchanges between windows. .. code-block:: text # Replica Exchange log file # numexchg is 12500 # REMD filenames: # remlog= remd_complex_ejm31.log # remtype= rem.type # Rep#, Neibr#, Temp0, PotE(x_1), PotE(x_2), left_fe, right_fe, Success, Success rate (i,i+1) ... # exchange 9032 1 2 298.00 -39400.10 -39409.58 -Infinity -1.79 T 0.87 2 1 298.00 -39408.05 -39398.45 1.79 -10.40 T 0.35 3 4 298.00 -39416.77 -39352.95 10.40 -22.46 F 0.06 4 3 298.00 -39332.22 -39390.84 22.45 -33.45 F 0.01 5 6 298.00 -39316.56 -39355.34 32.76 -36.65 F 0.00 6 5 298.00 -39321.48 -39275.80 36.74 -34.91 F 0.00 7 8 298.00 -39129.67 -39121.37 32.64 -25.37 F 0.00 8 7 298.00 -39108.33 -39103.97 23.32 -13.02 F 0.00 9 10 298.00 -39087.24 -39262.19 10.73 -3.20 F 0.07 10 9 298.00 -39262.99 -39081.81 3.20 -0.44 F 0.74 11 -1 298.00 -39197.79 0.00 0.03 0.00 F 0.00 # exchange 9033 1 -1 298.00 -39438.52 0.00 -Infinity -1.79 F 0.87 2 3 298.00 -39467.30 -39324.55 1.79 -10.40 T 0.35 3 2 298.00 -39314.20 -39456.10 10.40 -22.46 T 0.06 4 5 298.00 -39259.84 -39308.14 22.45 -33.45 F 0.01 5 4 298.00 -39282.02 -39223.52 32.76 -36.65 F 0.00 6 7 298.00 -39248.23 -39241.31 36.74 -34.91 F 0.00 7 6 298.00 -39220.57 -39210.90 32.64 -25.37 F 0.00 8 9 298.00 -39378.26 -39145.48 23.32 -13.02 F 0.00 9 8 298.00 -39137.02 -39358.97 10.73 -3.20 F 0.07 10 11 298.00 -39218.70 -39185.63 3.20 -0.44 T 0.74 11 10 298.00 -39185.17 -39218.08 0.03 0.00 T 0.00 Lastly, in ABFE we use rest.in files to define Boresch restraints :doc:`(see the Boresch Restraint Tutorial) ` which hold the ligand in the protein while its interactions are decoupled from the rest of the system at :math:`\lambda=1`. These restraint files contain the 0-indexed atoms for a pairwise restraint, two angle restraints, and three dihedral restraints between the protein and the ligand. .. literalinclude:: /_static/files/ModularTutorials/Alchemical/ABFE_of_Tyk2/abfe_tyk2_tutorial/boresch_files/rest.in :language: text .. note:: Boresch restraints do work on the system during an ABFE calculation, and thus have to be included when analyzing them. These will be handled automatically by edgembar during this tutorial, but you can learn more on their impact on the free energy from :doc:`(see the Boresch Contributions to the Free Energy Tutorial) `. Take a look at the folders you downloaded above, and take a minute to look through these files. For each ligand, you have a rem.log file, many :term:`mdouts <.mdout>` (1/:math:`\lambda`-window), and one rest.in file. You would also normally have a :term:`trajectory <.nc>` output for at least the end-states; however, that is out of the scope for the present tutorial. Generating Edgembar HTML Reports ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For this tutorial, you need to have edgembar installed. This can either be installed from source (https://gitlab.com/RutgersLBSR/fe-toolkit); however, there is also a Python distribution of edgembar (the component of FE-Toolkit we will be using) available on PyPI (`pip install edgembar`). More edgembar documentation is located here: https://rutgerslbsr.gitlab.io/fe-toolkit/edgembar/index.html By default, the Edgembar program reads files that are program agnostic (dat files of the format column1: simulation time, column2: potential energy in kcal/mol). It generally expects these files to be split into the following substructure: .. code-block:: edge/ env/ stage/ trial/ .. note:: - **edge**: Represents the overall ABFE/RBFE calculation. The name comes from thinking of the calculation as an edge in an alchemical network. - **env**: This is the environment (e.g. aqueous, complex) - **stage**: This is what stage for that environment (for instance, if you have separate charge decoupling and vdw decoupling stages). With SmoothStep potentials, often there is just a single vdw stage. - **trial**: A subdirectory for each independent trial, usually named t1 t2 etc. In a way, these are just labels. The important thing is that there is a single **edge**, two **env** values, and there can be arbitrarily many **stage** and **trial** values. Each of them represents a calculation, and a very raw sketch of their relationship would look like this: .. math:: \textbf{edge} = \sum_{\textbf{stages in target env}} \langle \textbf{trials in each stage} \rangle - \sum_{\textbf{stages in reference env}} \langle \textbf{trials in each stage} \rangle In the case of using Amber as the simulation package, FE-Toolkit has built-in tools for generating this hierarchical structure, specifically `edgembar-amber2dats`. For this - we will use a short bash script. .. code-block:: bash edge=$1 mkdir -p dats for env in {complex,binder}; do for stage in vdw; do for trial in {t1,t2}; do echo "Extracting files for $edge/$env/$stage/$trial" mkdir -p dats/$edge/$env/$stage/$trial if [ "$env" == "complex" ]; then edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ --vba $edge/$trial\_$env/rest.in $(ls $edge/$trial\_$env/*mdout) & else edgembar-amber2dats.py -r $edge/$trial\_$env/remd_$edge.log --odir dats/$edge/$env/$stage/$trial/ $(ls $edge/$trial\_$env/*mdout) & fi done done done Let's explore the edgembar-amber2dats.py script quickly before running the script. .. code-block:: bash edgembar-amber2dats.py -h There are a few important options that you should consider: - nan: What should MBAR do if an MBAR energy is '******'. The default is to revert to BAR analysis. Important for ABFE. - extra: Can add extra data to the edgembar output for decomposing DV/DL profiles. Its main use is plotting the contributions from electrostatics, 1-4 vdw interactions, etc. For now, we will run without these. Run the script for ejm31. .. code-block:: bash bash run_amber2dats.sh ejm31 Now - you'll see lots of messages saying something along the lines of "Invalid energies within ejm31/t2_complex/complex_ejm31_1.00000.mdout; writing output for BAR. See the --nan option". This is expected for ABFE, because for MBAR you evaluate the energy at every lambda point with respect to every other lambda point. For ABFE, this means that you often can be evaluating energies of clashes between the protein and the ligand (for instance, in a state where your ligand is fully decoupled.) Now that these files are generated, take a moment to explore the dats folder. Note that it follows the rough scheme we described above of edge/env/stage/trial. Take a look at the files that you have there. Now, we will use a small python script to create an xml for edgembar to work on. .. code-block:: python #!/usr/bin/env python3 import edgembar import os from pathlib import Path odir = Path("analysis") s=r"dats/{edge}/{env}/{stage}/{trial}/efep_{traj}_{ene}.dat" exclusions=None # Load the dat files into an edges object edges = edgembar.DiscoverEdges(s, exclude_trials=exclusions, target="complex", reference="binder") odir.mkdir(exist_ok=True) for edge in edges: # Tell the edges about the shift from the boresch restraints. edge.SetVBAShift() # Reverse the order of lambdas (1->0) so that the sign is negative, which is what you would expect for an ABFE. for trial in edge.GetAllTrials(): trial.reverse() fname = odir / (edge.name + ".xml") edge.WriteXml(fname) A few things are happening in this file. The first is we are setting a template string to say how the dats folder is structured. Then we generate an edges object with DiscoverEdges. If you have outliers, you could potentially want to exclude those trials using the exclusions list (e.g. exclusions=["t1"]) Now - run this script. .. code-block:: bash python DiscoverEdges.py You'll see a new analysis directory with xml files present for each of the transformations you ran amber2dats for previously (up to four!). These **.xml** files are the input to **edgembar**. Open one of them and you'll see they have the same hierarchical structure of edges, environments, stages, and trials, along with the lambda values per trial (yes, you can use different schedules in every trial of every stage and environment), the path to the **.dat** files, and an optional constant **shift** in the free energy, if you used boresch restraints for that trial. Next, run the actual BAR/MBAR calculation: .. code-block:: bash OMP_NUM_THREADS=8 edgembar_omp analysis/edge_ejm31.xml --fwdrev --halves .. warning:: edgembar and edgembar_omp are dependent on relative paths defined in the xml file. Thus, you must run it from outside of analysis. The two options enable additional analysis for edgembar that we will discuss in a later part of this tutorial when we are looking at the HTML reports. This script writes a Python script, which we can run as: .. code-block:: bash python analysis/edge_ejm31.py python analysis/edge_ejm31.py --netcdf Note that we are running it twice. If you only want the HTML report, the first command is enough; however, if you want a portable data file you can pass to other programs or parse yourself, the NetCDF option provides a self-contained file for you to do so. Now you should see **edge_ejm31.html** and **edge_ejm31.nc** in your analysis directory! .. end-tutorial Relevant Literature ------------------- .. footbibliography::