Mapping Common-Core and Soft-Core Atoms in Relative Binding Free Energy Calculations ==================================================================================== | Zeke A. Piskulich\ :sup:`1`, Patricio Barletta\ :sup:`1`, Ryan Snyder\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute | for Quantitative Biomedicine and Department of Chemistry and Chemical | Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Learn how to map common-core and soft-core atoms in relative binding free energy calculations. - Understand how different choices of common-core and soft-core atoms can affect the results of relative binding free energy calculations. .. end-learning-objectives .. start-tutorial Introduction to Atom Mapping ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Soft-Core Regions for RBFE Calculations ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In Relative Binding Free Energy calculations, we are interested in calculating the free energy difference between two ligands binding to the same protein (as opposed to an Absolute Binding Free Energy calculation where we disappear a ligand completely). However, to calculate this free energy difference, we need to alchemically transform one ligand into the other. This transformation is done by identifying a common-core region between the two ligands that is not alchemically transformed, and a soft-core region that is alchemically transformed. .. note:: The common-core region is a shared set of atoms present in both ligands that are not alchemically transformed during the simulation. The soft-core region consists of atoms that are alchemically transformed from one ligand to the other during the simulation. These atoms are typically involved in the chemical differences between the two ligands. The soft-core region is called "soft-core" because its interactions are lambda dependent and use second-order smoothstep functions to avoid singularities during the alchemical transformation. The choice of which atoms to include in the common-core and soft-core regions can significantly impact the convergence and accuracy of the relative binding free energy calculations. Mapping Soft-Core and Common-Core Atoms ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Before we show any code, let's first consider some example transformations and how we might choose to map the common-core and soft-core atoms. In this tutorial, we will use an example atom mapping script. Download the following script to your local machine. .. toggle:: .. literalinclude:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/files/find_common_core.py :language: python This script uses the rdk-amber package (a small compatibility library to enable :term:`mol2 <.mol2>` files with amber atom types to be turned into rdkit molecules.) To use this package, run this (ideally in a conda/other virtual python environment). .. code-block:: python pip install rdk-amber Once your environment is set up, you can download the following :term:`mol2 <.mol2>` files in a tar.gz format here: :download:`atom_mapping.tar.gz ` Try running the script on these :term:`mol2 <.mol2>` files. .. code-block:: bash tar -xvf atom_mapping.tar.gz cd atom_mapping python find_common_core.py binder_ejm31.mol2 binder_jmc27.mol2 This will write a directory with a png image, open that and you should see something like this. .. figure:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/files/atom_mapping/ejm31~jmc27.png :alt: Atom mapping between ejm31 and jmc27 :width: 1000px Here, the red highlighted groups are in the soft-core region and the green highlighted groups are in the common-core region. Try this for the remaining :term:`mol2 <.mol2>` files - do you notice anything about these soft-core regions? Considerations for Picking Soft-Core Regions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In general, there are a few best practices to use when generating soft-core regions: 1) There should only be one soft-core region per transformation (e.g. all red regions with the tool above should be contiguous) 2) Dihedrals only obtain the benefit of enhanced sampling (dihedral parameters turned off) when soft-core regions fully cover the bond. 3) If the soft-core region gets too large, then this can negatively impact the replica exchange acceptance ratios. Now let's look at a few examples that are not from Tyk2. These come from the paper by Ali Rasouli and co-workers :footcite:t:`Rasouli_JChemInfModel_2025_v65_p223`, which looked at a theophylline-binding RNA aptamer. We will use the same script as before, but with smiles strings. In the paper, they looked at six molecules that were theophylline-like. .. jupyter-execute:: import matplotlib.pyplot as plt from rdkit import Chem from rdkit.Chem import Draw smiles_dict = { "theophylline": "CN1C2=C(C(=O)N(C1=O)C)NC=N2", "1-methylxanthine": "CN1C(=O)NC2=C(NC=N2)C1=O", "3-methylxanthine": "CN1C2=C(NC=N2)C(=O)NC1=O", "Hypoxanthine": "O=C1NC=Nc2nc[nH]c12", "Xanthine": "C1=NC2=C(N1)C(=O)NC(=O)N2", "Caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C", } fig, axes = plt.subplots(2, 3, figsize=(12, 7)) for ax, (name, smiles) in zip(axes.flat, smiles_dict.items()): mol = Chem.MolFromSmiles(smiles) img = Draw.MolToImage(mol, size=(300, 220)) ax.imshow(img) ax.set_title(name, fontsize=12) ax.axis("off") fig.tight_layout() plt.show() Now let's call the `find_common_core.py` script with the SMILES strings for caffeine and xanthine. .. code-block:: python find_common_core.py "C1=NC2=C(N1)C(=O)NC(=O)N2" "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" --name-a xanthine --name-b caffeine Here, note the name-a and name-b flags make a more convenient output png filename. .. figure:: /_static/files/ModularTutorials/Alchemical/RBFE_of_Tyk2/files/atom_mapping/xanthine~caffeine.png :alt: Atom mapping between xanthine and caffeine. :width: 1000px How do these soft-core regions look? Let's check how they compare to our guidelines above. 1. Only One Soft-Core Region (FAIL!) Now - this is a particularly tricky case. You could expand the soft-core region to try to correct this; however, in reality this would essentially engulf the entirety of both molecules. You could try to look for intermediate transformations (however, in this case there is only one valid transformation linking caffeine to a potential network). Can you identify which that is? Try to use the atom-mapping tool to find out. (Answer toggle below) .. toggle:: Caffeine~Theophylline is a valid transformation. So what would you do? Three options: 1) Generate a network, but only connect Caffeine via one edge. Downside: No cycle closure information about caffeine. 2) Generate a network, but define a pseudo-ligand that bridges the soft-core potential gap. Downside: Can be difficult to generate the molecule, more simulations. 3) Generate a network, but connect caffeine to other ligands with Counterpoised Binding Free Energy (CBFE) calculations instead. Downside: this is more challenging to resolve and uses a different method. .. note:: **Counterpoised Binding Free Energies (CBFE)** For cases where it is hard to draw a transformation, or because you want to connect two RBFE networks with differing scaffolds, you can no longer use the RBFE-like infrastructure that we have discussed above. Instead, a more challenging technique is needed: CBFE. Like RBFE, CBFE involves two ligands transforming into one another; however, unlike RBFE, it is a complete transformation without a common core. One ligand is completely disappearing while another ligand is completely appearing. Thus, it is closer to two ABFE calculations run in opposite directions simultaneously. Since both ligands decouple simultaneously, Boresch restraints are often used between the ligands to keep them aligned with one another (as opposed to between the protein and the ligand, as is the case in ABFE). More information on CBFE can be found in the work by Tsai and co-workers. :footcite:t:`Tsai_JChemInfModel_2026_v66_p1626` .. end-tutorial Relevant Literature ------------------- .. footbibliography::