Dihedral Scans using the WaveFront Method ========================================= | Zeke A. Piskulich\ :sup:`1`, Tim Giese\ :sup:`1` and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Use FFPOPT to Run Dihedral Scans for a Simple Molecule using the WaveFront Method .. end-learning-objectives Relevant literature ------------------- .. attention:: Add relevant literature. 1. WaveFront Paper 2. QDPi2 Paper 3. (Eventually) FFPOPT Paper Tutorial -------- .. contents:: :local: :depth: 4 .. start-tutorial In this tutorial, we will be using a small molecular fragment that contains a deprotonated amide. These fragments can show up in certain ligands, and are presently a challenge for standard force field generation within amber. We will walk through how to use FFPOPT to scan dihedral angles with both MD force-fields and machine learning potentials, and then we will walk through the process of fitting new dihedral parameters. The files you will need are located here: - :download:`Download the tutorial files ` WaveFront Dihedral Scans ~~~~~~~~~~~~~~~~~~~~~~~~ Introduction to WaveFront Scans ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In this section, we will be using the WaveFront scanning method to scan dihedral angles. The WaveFront method is a more effective way to scan dihedrals compared to traditional linear scans (which scan dihedrals sequentially). WaveFront scans work on the other hand by using a propagating wave approach, where dihedral scans begin from a certain set of "starting_nodes" (i.e., specific values of the dihedral angles). Each of these starting nodes are geometry optimized (with the value of the dihedral fixed), and an energy calculated. These nodes are then used to propagate new nodes in the next level of the WaveFront scan, which repeat the process. New nodes are only created if they either have not been calculated yet, or if the new node has a lower energy than a previously calculated node at that dihedral value. This process continues until all dihedral values have been scanned, and all nodes do not generate any new lower energy nodes. .. hint:: **Terminology:** - **Starting Nodes**: The initial set of dihedral values to begin the scan from. - **Nodes**: Specific configurations of the molecule at given dihedral angles. - **Levels**: Each iteration of the scan where new nodes are generated from the previous level. Nodes within a level can be computed in parallel. - **WaveFront**: The overall scanning method that utilizes the above concepts to efficiently explore the dihedral space. The key advantage of the WaveFront method is that it allows for significant parallelization of calculations, and in general, it also allows for significantly lower energy conformation to be found at each dihedral value, since geometries are propagated from multiple different directions. The general logic is as follows: 1. Geometry optimize the molecule with no dihedral restraints 2. Choose a set of starting nodes (i.e., dihedral values to begin the scan from), and geometry optimize the molecule at each of these dihedral values (with the dihedral restrained) 3. For each starting node, generate new nodes by incrementing/decrementing the dihedral by the scan step size (e.g., if the starting node is at 0 degrees, and the step size is 15 degrees, new nodes would be generated at -15 and +15 degrees) 4. Run the second level, which are these new nodes, geometry optimizing each (with the dihedral restrained) and calculating the energy 5. If a new node is at a dihedral value that has not been calculated yet, add it to the list of calculated nodes. If it is at a dihedral value that has been calculated, only keep it if its energy is lower than the previously calculated node at that dihedral value. 6. Repeat steps 3-5 until all dihedral values have been scanned, and no new lower energy nodes are found. The schematic below illustrates the WaveFront scanning method: .. image:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/ScanSchematic.png :alt: Wavefront Scan Schematic :width: 60% This often leads to many more total calculations; however, due to the parallelizability of each level of the scan, this is often significantly faster than a traditional linear scan, especially as computational methods become more expensive. Scans on the mmm_NN Model System ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In this tutorial, we will apply the WaveFront scanning method to the same model system used in the Linear Scan Tutorial. As a reminder, this molecule is a small fragment containing a deprotonated amide, which can be challenging for standard force field generation methods. Using the provided input files, we will run WaveFront dihedral scans using the gaff2 force field, as well as a higher level machine learning potential, QDPi2, which has been developed within the York Group and is available in FFPOPT. To get started, we will first run the most basic type of WaveFront scan on this system, which is a single dihedral scan using the gaff2 force field. As with the LinearScan Tutorial we will be scanning the 5-4-3-1 dihedral that links the nitrogen to the five-membered ring. See a movie of the molecule below: .. figure:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/mmm_NN_rotor_linear.gif :alt: mmm_NN Movie :width: 25% In the movie, the dihedral being scanned is highlighted in yellow for clarity. To run a WaveFrotn scan using FFPOPT, we will use the following command: .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=sander --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_s1_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 1 --wf_num_conformers 0 Here, note that the argument structure is very similar to the linear scan commands; however, there are some key differences to note: - **nproc**: This argument specifies the number of processors to use for the scan. Since WaveFront scans can be parallelized, this argument is important to set appropriately based on available computational resources. - **wf_starting_nodes**: This argument specifies the number of starting nodes to use for the WaveFront scan. In this case, we are using 1 starting node, which will be the optimized geometry without any dihedral restraints. - **wf_num_conformers**: This argument specifies the number of conformers to generate at each new node. Setting this to 0 means that no additional conformers will be generated, and only the geometry optimized at the dihedral restraint will be used. After this command is run, FFPOPT will output information to the terminal about the progress of the WaveFront scan, including the current level, how many nodes are being calculated, and the dihedral values of the nodes being processed. When the scan is complete, the output file ``wf_scan_5-4-3-1_s1_n0.xyz`` will contain the results of the scan, including the optimized geometries and energies at each dihedral value, and as with before the energies are included in a output file called ``wf_scan_5-4-3-1_s1_n0.dat``. In addition to these files, the scan also writes a pickled checkpoint file called ``checkpoint_wf_scan_5-4-3-1_s1_n0.pkl``, which contains all of the information about the scan, including the geometries and energies of each node. This file can be used to restart the scan if needed, or to analyze the results further. Lastly, an image file called ``wf_workflow_wf_scan_5-4-3-1_s1_n0.png`` is also generated, which contains a schematic diagram of the levels and nodes of the WaveFront scan that were explored during the run. By incrementing the number of starting nodes (from 1 to 4), we can often improve the quality of the scan as more starting structures are propagated. For this particular molecule, there is not a huge difference with GAFF2; however, for more complex molecules this can be very important. In the below figure, we show both the code-block to run the scan with different starting nodes, as well as diagrams of the scan. .. tab-set:: .. tab-item:: 1 Starting Node .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=sander --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_s1_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 1 --wf_num_conformers 0 .. figure:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wf_workflow_wf_scan_5-4-3-1_s1_n0.png :alt: Wavefront Scan with 1 Starting Node :width: 40% .. tab-item:: 2 Starting Nodes .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=sander --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_s2_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 2 --wf_num_conformers 0 .. figure:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wf_workflow_wf_scan_5-4-3-1_s2_n0.png :alt: Wavefront Scan with 2 Starting Nodes :width: 40% .. tab-item:: 3 Starting Nodes .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=sander --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_s3_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 3 --wf_num_conformers 0 .. figure:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wf_workflow_wf_scan_5-4-3-1_s3_n0.png :alt: Wavefront Scan with 3 Starting Nodes :width: 40% .. tab-item:: 4 Starting Nodes .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=sander --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_s4_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 4 --wf_num_conformers 0 .. figure:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wf_workflow_wf_scan_5-4-3-1_s4_n0.png :alt: Wavefront Scan with 4 Starting Nodes :width: 40% Note - that as the number of starting nodes increases, the number of levels in the wavefront calculation generally goes down (as more of the dihedral space is covered from the start), but the number of total nodes calculated generally goes up (as more nodes are propagated from each starting node). However, since each level can be parallelized, the overall wall time of the calculation can often go down with more starting nodes, especially at higher levels of theory. Now, we can make a plot of the energy vs. dihedral angle for each of these scans, and compare them to the linear scan done previously with GAFF2 using xmgrace (or a plotting program of your choice). In the figure below, we show the results of the WaveFront scans with different starting nodes, as well as the linear scan for comparison. We have included the linear scan data file for reference, though it can be produced in the Linear Scan Tutorial. .. code-block:: bash xmgrace wf_scan_5-4-3-1_s1_n0.dat wf_scan_linear.dat This will produce a plot where the lines overlap nearly perfectly, indicating that for this molecule the WaveFront scan is producing similar results to the linear scan with GAFF2. We will show an example later where this is not the case; however, for clarity we have included the plot below. .. image:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wavefront_linear_compare.png :alt: Wavefront vs Linear GAFF2 :width: 30% Now, we can move on to running a WaveFront scan using a higher level of theory, specifically the QDPi2 machine learning potential. This potential has been trained on high-level quantum mechanical data at the SPICE level of theory, and can often provide more accurate results than traditional force fields like GAFF2. QDPI2 is a delta-Machine learning potential, meaning that it predicts the difference between a lower level of theory (XTB) and a higher level of theory (SPICE). Therefore, within FFPOPT, QDPI2 scans are run by first performing an XTB calculation, and then applying the QDPi2 correction to obtain the final energy. This is handled completely internally within FFPOPT, so the user does not need to worry about the details. To run a WaveFront scan using QDPi2, we will use the following command: .. code-block:: bash ffpopt-DihedWavefront.py -p mmm_NN.parm7 -c mmm_NN.rst7 --model=qdpi2 --dihed="5,4,3,1" --oscan wf_scan_5-4-3-1_QDPi2_s4_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 4 --wf_num_conformers 0 You'll notice straight away that the command is almost identical to the GAFF2 command, with the only difference being the model specified as "qdpi2" instead of "sander". This is one of the key advantages of FFPOPT, as it allows for easy switching between different computational methods without changing the overall workflow. Again, all the output files are more or less the same as before, with the main output being the optimized geometries and energies at each dihedral value in the ``wf_scan_5-4-3-1_QDPi2_s4_n0.xyz`` and ``wf_scan_5-4-3-1_QDPi2_s4_n0.dat`` files, respectively. A checkpoint file and workflow diagram are also generated as before. We use 4 starting nodes in the example here to ensure good coverage of the dihedral space. After the scan is complete, we can again plot the energy vs. dihedral angle using xmgrace (or a plotting program of your choice). In the figure below, we show the results of the QDPi2 WaveFront scan along with the previous GAFF2 scan for comparison. .. code-block:: bash xmgrace wf_scan_5-4-3-1_QDPi2_s4_n0.dat wf_scan_5-4-3-1_s4_n0.dat This will produce a plot showing the differences between the GAFF2 and QDPi2 scans. As can be seen, there are significant differences in the energy profiles, particularly around certain dihedral angles. This highlights the importance of using higher-level methods for accurate dihedral parameterization. .. image:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wavefront_mlp_compare.png :alt: Wavefront GAFF2 vs QDPi2 :width: 30% .. hint:: If you kill a wavefront scan before it is complete, it writes a checkpoint for each node, level, and the whole calculation. It will try to use these to restart the calculation when you run the same command again. If you want to start fresh, delete the checkpoint files! Another Example: Linear Scans Overestimate Barriers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ In the previous example, we saw that the WaveFront and Linear scans with GAFF2 produced similar results for the mmm_NN molecule. However, this is not always the case, especially for more complex molecules or those with multiple rotatable bonds. To illustrate this, we will consider another molecule, which we will refer to as "complex_mol". This molecule has multiple rotatable bonds and a more complex conformational landscape, making it a good candidate for demonstrating the advantages of the WaveFront scanning method. This molecule has the same dihedral as before, involving a deprotonated amide going into a five-membered ring. Try loading the molecule into VMD to identify the atoms involved in the dihedral. .. hint:: In VMD, you can load the provided ``binder.parm7`` and ``binder.rst7`` files to visualize the molecule and identify the dihedral of interest. .. code-block:: bash vmd binder.parm7 binder.rst7 In VMD, you can click the 4 key to go into dihedral selection mode, and then select the four atoms of the dihedral. Select the C - N - C - S atoms. You should see the dihedral is close to zero (-2.6 degrees). In the terminal, you can see that these atoms correspond to atoms 15-5-1-2 by looking at the **index** entry for each atom. Now we will run the WaveFront scan on this molecule using GAFF2, similar to before. We will use 4 starting nodes to ensure good coverage of the dihedral space. The command is as follows: .. code-block:: ffpopt-DihedWavefront.py -p binder.parm7 -c binder.rst7 --model=sander --dihed="15,5,1,2" --oscan wf_scan_binder_15-5-1-2_s4_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 4 --wf_num_conformers 0 .. note:: This molecule is more complex, and much larger than the previous example, so the scan may take significantly longer to complete! .. warning:: Sometimes the WaveFront scan may take an inoordinate amount of time on complex molecules due to bad overlaps in the propagated geometries. If this happens, you can try changing the number of starting nodes to avoid generating a clash. Alternatively, you can try running with the ase-optimizer instead of GEOMETRIC, which sometimes performs better in these cases. This scan is actually a bit more interesting than the previous example, as the WaveFront scan revisits a number of geometries it had already calculated, indicating that it was able to find a lower energy conformation at those dihedral values. This is one of the key advantages of the method, as it allows for a more thorough exploration of the conformational space. You can see that here in this figure: .. image:: /_static/files/ModularTutorials/ForceField/bespoke_dihedrals_with_FFPOPT/imgs/wf_workflow_wf_scan_binder_15-5-1-2_s4_n0.png :alt: Wavefront Scan of complex_mol with 4 Starting Nodes :width: 40% We have repeated the wavefront scan using QDPi2 as well, using the following command: .. code-block:: bash ffpopt-DihedWavefront.py -p binder.parm7 -c binder.rst7 --model=qdpi2 --dihed="15,5,1,2" --oscan wf_scan_binder_15-5-1-2_QDPi2_s4_n0.xyz --delta 10 --nproc 10 --wf_starting_nodes 4 --wf_num_conformers 0 .. warning:: The QDPI2 scan likely takes more time than you want to do, especially without a gpu. If you want to skip this step, you can move on to the analysis section below. After both scans are complete, we can plot the energy vs. dihedral angle for both GAFF2 and QDPi2 using xmgrace (or a plotting program of your choice). In the figure below, we show the results of both scans. .. code-block:: bash xmgrace wf_scan_binder_15-5-1-2_s4_n0.dat wf_scan_binder_15-5-1-2_QDPi2_s4_n0.dat .. end-tutorial