Building Simulation Boxes for ABFE Calculations (Tyk2) ====================================================== | Zeke A. Piskulich\ :sup:`1`, Patricio Barletta\ :sup:`1`, Ryan Snyder\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute | for Quantitative Biomedicine and Department of Chemistry and Chemical | Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Learn how to build simulation boxes for absolute binding free energy calculations (ABFE) using AmberTools - Understand the options for solvating simulation boxes and adding ions - Learn how to make consistent solvent boxes for both the bound and unbound states of the ligand. .. end-learning-objectives Relevant literature ------------------- - Coming Soon! Tutorial -------- In this Tutorial, you will learn how to set up and run absolute binding free energy calculations using Amberflow, a workflow engine developed in the York Lab for automating Alchemical Free Energy Calculations with Amber. .. contents:: :local: :depth: 4 .. start-tutorial Building Simulation Boxes for ABFE Calculations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To get started, you will need to download the tutorial files. The files include pdbs for the protein, and pdbs with docked coordinates for the ligand. The files in this download also include pre-generated forcefield parameters for each of the four ligands, which you will use in this tutorial. - :download:`Download the tutorial files ` Once you have downloaded the files, extract them into a directory where you will be working on this tutorial. .. code-block:: bash tar -xvzf build_files.tar.gz cd build_files Building the Ligand in the Aqueous Phase ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In this section, you will build the ligand in a solvent box. You will be using tleap to build the system. For this tutorial, we will be starting with the ligand ejm_31 and doing both builds for the bound and unbound states with this ligand. We will then be using the same procudure for the other three ligands. For each code block below, we have provided tabs for the other ligands so that you can easily switch between them. .. warning:: In this tutorial, we will not be making a significant effort to enforce best practices for preparing the solvent box. In a production settign, you likely will desire to enforce that the solvent box has the same number of water molecules for all ligands, and depending on the identity of the ligand you may want to use a different box size or shape. To build the ligand in a solvent box, you will need to create a tleap input file. .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_31.lib loadamberparams ../binder_ejm_31.frcmod mol = loadpdb ../binder_ejm_31.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 saveamberparm mol prebox_binder_ejm_31.parm7 prebox_binder_ejm_31.rst7 quit .. tab-item:: ejm_42 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_42.lib loadamberparams ../binder_ejm_42.frcmod mol = loadpdb ../binder_ejm_42.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 saveamberparm mol prebox_binder_ejm_42.parm7 prebox_binder_ejm_42.rst7 quit .. tab-item:: jmc_27 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew.. tab-set:: loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_27.lib loadamberparams ../binder_jmc_27.frcmod mol = loadpdb ../binder_jmc_27.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 saveamberparm mol prebox_binder_jmc_27.parm7 prebox_binder_jmc_27.rst7 quit .. tab-item:: jmc_28 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_28.lib loadamberparams ../binder_jmc_28.frcmod mol = loadpdb ../binder_jmc_28.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 saveamberparm mol prebox_binder_jmc_28.parm7 prebox_binder_jmc_28.rst7 quit To run any of these scripts, save the desired script (e.g., tleap_ejm_31.in) and then run the following command: .. code-block:: bash tleap -f tleap_ejm_31.in Calculating the Number of Ions to Add ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If everything runs correctly, you should see two new files in your directory. We've called these prebox_binder_XXX.parm7 and prebox_binder_XXX.rst7, where XXX is the ligand name. These files contain the topology and coordinates for the ligand in a solvent box, with any neutralizing ions added. Now, if we were happy without ions, we could move on to minimization and equilibration of this system. However, we want to add ions to the system to mimic a 150 mM NaCl concentration. To do this, we will first need to calculate the number of ions to add. First, we must find the number of water molecules in the box. We can use a quick MDAnalysis script to do this. Save the following script as count_waters.py: .. code-block:: python import MDAnalysis as mda import sys u = mda.Universe(sys.argv[1], sys.argv[2], format="INPCRD") waters = u.select_atoms("resname WAT") print(f"Number of water molecules: {len(waters.residues)}") Then run the script as follows: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash python count_waters.py prebox_binder_ejm_31.parm7 prebox_binder_ejm_31.rst7 .. tab-item:: ejm_42 .. code-block:: bash python count_waters.py prebox_binder_ejm_42.parm7 prebox_binder_ejm_42.rst7 .. tab-item:: jmc_27 .. code-block:: bash python count_waters.py prebox_binder_jmc_27.parm7 prebox_binder_jmc_27.rst7 .. tab-item:: jmc_28 .. code-block:: bash python count_waters.py prebox_binder_jmc_28.parm7 prebox_binder_jmc_28.rst7 Each of these will output a number of water molecules in the box. We will then use these to calculate the number of ions to add. The way we do this calculation is to use the fact that water is 55 M, so we can use the following formula to calculate the number of ions to add: .. math:: N_{ions} = \frac{N_{water} * 0.150 M}{55 M} When running for this tutorial, we found that we had 2,470 water molecules in the box for ejm_31, which gives us 7 ions of each type to add. For simplicity, we will be adding 7 ions of each type to each of the systems in this tutorial. Now we can write a new tleap script to add the ions and save the final topology and coordinates. .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_31.lib loadamberparams ../binder_ejm_31.frcmod mol = loadpdb ../binder_ejm_31.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 addionsrand mol Na+ 7 Cl- 7 6.0 saveamberparm mol binder_ejm_31.parm7 binder_ejm_31.rst7 quit .. tab-item:: ejm_42 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_42.lib loadamberparams ../binder_ejm_42.frcmod mol = loadpdb ../binder_ejm_42.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 addionsrand mol Na+ 7 Cl- 7 6.0 saveamberparm mol binder_ejm_42.parm7 binder_ejm_42.rst7 quit .. tab-item:: jmc_27 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_27.lib loadamberparams ../binder_jmc_27.frcmod mol = loadpdb ../binder_jmc_27.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 addionsrand mol Na+ 7 Cl- 7 6.0 saveamberparm mol binder_jmc_27.parm7 binder_jmc_27.rst7 quit .. tab-item:: jmc_28 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_28.lib loadamberparams ../binder_jmc_28.frcmod mol = loadpdb ../binder_jmc_28.pdb setbox mol centers addions2 mol Na+ 0 addions2 mol Cl- 0 solvateoct mol TIP4PBOX 16.0 addionsrand mol Na+ 7 Cl- 7 6.0 saveamberparm mol binder_jmc_28.parm7 binder_jmc_28.rst7 quit To run any of these scripts, save the desired script (e.g., tleap_ejm_31.in) and then run the following command: .. code-block:: bash tleap -f tleap_ejm_31.in Again, these scripts can be run as follows: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash tleap -f tleap_ejm_31_addions.in .. tab-item:: ejm_42 .. code-block:: bash tleap -f tleap_ejm_42_addions.in .. tab-item:: jmc_27 .. code-block:: bash tleap -f tleap_jmc_27_addions.in .. tab-item:: jmc_28 .. code-block:: bash tleap -f tleap_jmc_28_addions.in There we go! We now have the ligand built in a solvent box with ions added. You should see two new files in your directory called binder_XXX.parm7 and binder_XXX.rst7, where XXX is the ligand name. These files contain the topology and coordinates for the ligand in a solvent box, with 150 mM NaCl added. Building the Bound Protein-Ligand Complex ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Now, we will build the bound protein-ligand complex. The procedure is very similar to what we did for the ligand in a solvent box, but now we will be loading in the protein as well. We will again be using tleap to build the system; however, this time we will be loading in the protein pdb as well as the ligand pdb. .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_31.lib loadamberparams ../binder_ejm_31.frcmod mol = loadpdb ../binder_ejm_31.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 saveamberparm complex prebox_target_ejm_31.parm7 prebox_target_ejm_31.rst7 quit .. tab-item:: ejm_42 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_42.lib loadamberparams ../binder_ejm_42.frcmod mol = loadpdb ../binder_ejm_42.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 saveamberparm complex prebox_target_ejm_42.parm7 prebox_target_ejm_42.rst7 quit .. tab-item:: jmc_27 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_27.lib loadamberparams ../binder_jmc_27.frcmod mol = loadpdb ../binder_jmc_27.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 saveamberparm complex prebox_target_jmc_27.parm7 prebox_target_jmc_27.rst7 quit .. tab-item:: jmc_28 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_28.lib loadamberparams ../binder_jmc_28.frcmod mol = loadpdb ../binder_jmc_28.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 saveamberparm complex prebox_target_jmc_28.parm7 prebox_target_jmc_28.rst7 quit To run any of these scripts, save the desired script (e.g., tleap_target_ejm_31.in) and then run the following command: .. code-block:: bash tleap -f tleap_target_ejm_31.in Again, if everything runs correctly, you should see two new files in your directory. We've called these prebox_target_XXX.parm7 and prebox_target_XXX.rst7, where XXX is the ligand name. These files contain the topology and coordinates for the protein-ligand complex in a solvent box, with any neutralizing ions added. Now, if we were happy without ions, we could move on to minimization and equilibration of this system. However, we want to add ions to the system to mimic a 150 mM NaCl concentration. Just like before, we will use the count_waters.py script to find the number of water molecules in the box. Run the script as follows: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash python count_waters.py prebox_target_ejm_31.parm7 prebox_target_ejm_31.rst7 .. tab-item:: ejm_42 .. code-block:: bash python count_waters.py prebox_target_ejm_42.parm7 prebox_target_ejm_42.rst7 .. tab-item:: jmc_27 .. code-block:: bash python count_waters.py prebox_target_jmc_27.parm7 prebox_target_jmc_27.rst7 .. tab-item:: jmc_28 .. code-block:: bash python count_waters.py prebox_target_jmc_28.parm7 prebox_target_jmc_28.rst7 In this case, we found that we had 11,023 water molecules in the box for ejm_31, which gives us 30 ions of each type to add. For simplicity, we will be adding 30 ions of each type to each of the systems in this tutorial. .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_31.lib loadamberparams ../binder_ejm_31.frcmod mol = loadpdb ../binder_ejm_31.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 addionsrand complex Na+ 30 Cl- 30 6.0 saveamberparm complex target_ejm_31.parm7 target_ejm_31.rst7 quit .. tab-item:: ejm_42 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_ejm_42.lib loadamberparams ../binder_ejm_42.frcmod mol = loadpdb ../binder_ejm_42.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 addionsrand complex Na+ 30 Cl- 30 6.0 saveamberparm complex target_ejm_42.parm7 target_ejm_42.rst7 quit .. tab-item:: jmc_27 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_27.lib loadamberparams ../binder_jmc_27.frcmod mol = loadpdb ../binder_jmc_27.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 addionsrand complex Na+ 30 Cl- 30 6.0 saveamberparm complex target_jmc_27.parm7 target_jmc_27.rst7 quit .. tab-item:: jmc_28 .. code-block:: bash source leaprc.protein.ff19SB source leaprc.phosaa19SB loadamberparams frcmod.ff19SB source leaprc.gaff2 source leaprc.water.tip4pew loadamberparams frcmod.tip4pew loadamberparams frcmod.ionsjc_tip4pew loadoff ../binder_jmc_28.lib loadamberparams ../binder_jmc_28.frcmod mol = loadpdb ../binder_jmc_28.pdb prot = loadpdb ../target.pdb complex = combine {mol prot} setbox complex centers addions2 complex Na+ 0 addions2 complex Cl- 0 solvateoct complex TIP4PBOX 14.0 addionsrand complex Na+ 30 Cl- 30 6.0 saveamberparm complex target_jmc_28.parm7 target_jmc_28.rst7 quit As with above, to run any of these scripts, save the desired script (e.g., tleap_target_ejm_31_addions.in) and then run the following command: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash tleap -f tleap_target_ejm_31_addions.in .. tab-item:: ejm_42 .. code-block:: bash tleap -f tleap_target_ejm_42_addions.in .. tab-item:: jmc_27 .. code-block:: bash tleap -f tleap_target_jmc_27_addions.in .. tab-item:: jmc_28 .. code-block:: bash tleap -f tleap_target_jmc_28_addions.in Thus, now we have topology files for the ligand in complex for all four ligands! These ligands can now be used for further simulations and analysis. Optional: Applying Hydrogen Mass Reparationing (HMR) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Often, in order to speed up simulations, we will apply hydrogen mass repartitioning (HMR) to the system. This allows us to use a larger timestep (4 fs vs 2 fs) for production simulations. To do this, we will be using the ParmEd package that comes with AmberTools. The following command will apply HMR to a given topology and coordinate file, and output new files with HMR applied. We will need to make a parmed input file for each of the systems. Here is an example for ejm_31 in complex: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash parm target_ejm_31.parm7 hmassrepartition outparm target_ejm_31_hmr.parm7 .. tab-item:: ejm_42 .. code-block:: bash parm target_ejm_42.parm7 hmassrepartition outparm target_ejm_42_hmr.parm7 .. tab-item:: jmc_27 .. code-block:: bash parm target_jmc_27.parm7 hmassrepartition outparm target_jmc_27_hmr.parm7 .. tab-item:: jmc_28 .. code-block:: bash parm target_jmc_28.parm7 hmassrepartition outparm target_jmc_28_hmr.parm7 The same can be done for the binder: .. tab-set:: .. tab-item:: ejm_31 .. code-block:: bash parm binder_ejm_31.parm7 hmassrepartition outparm binder_ejm_31_hmr.parm7 .. tab-item:: ejm_42 .. code-block:: bash parm binder_ejm_42.parm7 hmassrepartition outparm binder_ejm_42_hmr.parm7 .. tab-item:: jmc_27 .. code-block:: bash parm binder_jmc_27.parm7 hmassrepartition outparm binder_jmc_27_hmr.parm7 .. tab-item:: jmc_28 .. code-block:: bash parm binder_jmc_28.parm7 hmassrepartition outparm binder_jmc_28_hmr.parm7 These files can be run using the following command: .. code-block:: bash parmed -i parmed_ejm_31.in For simplicity, a script to do this for all four ligands is provided below: .. code-block:: bash for ligand in ejm_31 ejm_42 jmc_27 jmc_28; do echo "parm target_${ligand}.parm7" > parmed_${ligand}_target.in echo "hmassrepartition" >> parmed_${ligand}_target.in echo "outparm target_${ligand}_hmr.parm7" >> parmed_${ligand}_target.in echo "parm binder_${ligand}.parm7" > parmed_${ligand}_binder.in echo "hmassrepartition" >> parmed_${ligand}_binder.in echo "outparm binder_${ligand}_hmr.parm7" >> parmed_${ligand}_binder.in parmed -i parmed_${ligand}_target.in parmed -i parmed_${ligand}_binder.in done .. end-tutorial