Optimizing Lambda Schedules for Alchemical Free Energy Calculations =================================================================== | Harsh Amin\ :sup:`1`, Zeke A. Piskulich\ :sup:`1`, and Darrin M. York\ :sup:`1` | :sup:`1`\ Laboratory for Biomolecular Simulation Research, Institute | for Quantitative Biomedicine and Department of Chemistry and Chemical | Biology, Rutgers University, Piscataway, NJ 08854, USA Learning objectives ------------------- .. start-learning-objectives - Optimize lambda schedules for alchemical free energy calculations using FE-ToolKit - Compare different optimization criteria (acceptance ratio, phase space overlap, and Kullback-Leibler divergence) to select the appropriate method - Use the fetkutils-tischedule.py command-line tool to generate optimized lambda schedules - Interpret acceptance ratio plots to assess schedule quality .. end-learning-objectives Relevant literature ------------------- - `Alchemical Enhanced Sampling with Optimized Phase Space Overlap ` Tutorial Overview ----------------- After completing burn-in simulations for your alchemical free energy calculations, the next critical step is to optimize the lambda schedule. An optimized schedule maximizes replica exchange efficiency and enhances sampling, leading to more accurate and converged free energy estimates. This tutorial demonstrates how to use the FE-ToolKit utilities to analyze simulation data and generate optimized lambda schedules. Tutorial -------- .. contents:: :local: :depth: 4 .. start-tutorial 1. Introduction to Lambda Schedule Optimization ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In alchemical free energy calculations, particularly those employing Hamiltonian Replica Exchange Molecular Dynamics (H-REMD), the choice of lambda schedule significantly impacts: - **Exchange efficiency**: Proper spacing of lambda values ensures adequate exchange acceptance ratios between neighboring replicas - **Sampling quality**: Well-distributed lambda values provide better coverage of the alchemical pathway - **Computational efficiency**: Optimized schedules reduce the number of required lambda windows while maintaining accuracy - **Convergence speed**: Enhanced sampling through replica exchange accelerates convergence of free energy estimates The goal is to find a lambda schedule that balances computational cost with statistical efficiency, ensuring that each lambda window contributes meaningful information to the overall free energy calculation. 2. Prerequisites and File Requirements ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2.1 Required Input Files ^^^^^^^^^^^^^^^^^^^^^^^^^ The lambda schedule optimization tool requires energy evaluation files from your burn-in simulations. These files must follow a specific naming convention and format: **File naming**: ``efep__.dat`` - ````: The lambda value at which the trajectory was generated - ````: The lambda value at which potential energies were evaluated **File format**: Two-column space-separated values - Column 1: Simulation time (in step numbers) - Column 2: Potential energy (kcal/mol) **Example file structure**: .. code-block:: text 0 -12543.234 500 -12538.912 1000 -12541.456 1500 -12539.782 ... 2.2 Complete Energy Matrix Requirement ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For successful optimization, you need a **complete matrix** of energy evaluation files. This means: - For every trajectory generated at lambda value :math:`\lambda_i` - You must have energy evaluations at **all** lambda values :math:`\lambda_j` (where :math:`j` ranges from 0 to :math:`N_{windows}`) For example, if you have 16 initial lambda windows (0 to 15), you should have 16 × 16 = 256 total ``efep_*.dat`` files. .. note:: The diagonal files (``efep_i_i.dat``) represent energies evaluated at the same lambda as the simulation, while off-diagonal files represent cross-evaluations needed for MBAR analysis and schedule optimization. 2.3 Directory Organization ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Organize your energy evaluation files in a dedicated directory: .. code-block:: bash burn_in_data/ ├── efep_0.00_0.00.dat ├── efep_0.00_0.07.dat ├── efep_0.00_0.13.dat ├── ... ├── efep_1.00_0.93.dat └── efep_1.00_1.00.dat 3. Using fetkutils-tischedule.py ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 3.1 Generate Optimized Schedule ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The FE-ToolKit ``fetkutils-tischedule.py`` utility implements three distinct optimization criteria, each targeting different aspects of alchemical free energy calculations. The optimization works by minimizing the variance of the chosen metric across all neighboring lambda window pairs. **Acceptance Ratio (AR)** - ``--ar`` flag The Acceptance Ratio represents the predicted Hamiltonian Replica Exchange (H-REMD) acceptance probability between adjacent lambda windows. The tool computes energy differences from the complete energy evaluation matrix and averages acceptance probabilities over all sampled configurations. The objective is to avoid low-acceptance bottlenecks by maintaining reasonably uniform exchange probabilities between adjacent λ windows. **Phase Space Overlap Index (PSO)** - ``--pso`` flag The Phase Space Overlap Index quantifies the overlap between probability distributions of potential energy differences at neighboring lambda values. The tool fits energy distributions to Gaussians and computes their overlap, minimizing variance across all window pairs to ensure uniform overlap throughout the schedule. **Kullback-Leibler Divergence (KL)** - ``--kl`` flag The symmetrized Kullback-Leibler Divergence measures the statistical distance between potential energy distributions at different lambda values. The tool spaces lambda values to maintain consistent statistical similarity between neighboring windows. This criterion serves as an alternative overlap-based metric for analyzing statistical differences along alchemical pathways The ``fetkutils-tischedule.py`` tool from the FE-ToolKit is the primary utility for lambda schedule optimization. Choose one of the three optimization criteria: .. tab-set:: .. tab-item:: Acceptance Ratio (AR) Use for H-REMD simulations to optimize replica exchange efficiency: .. code-block:: bash fetkutils-tischedule.py --opt 16 --ar --ssc \ --plot opt_ar_16.png -o opt_ar_16.txt \ burn_in_data/ .. tab-item:: Phase Space Overlap (PSO) Use for non-exchange methods (MBAR/TI) to ensure adequate overlap: .. code-block:: bash fetkutils-tischedule.py --opt 16 --pso --ssc \ --plot opt_pso_16.png -o opt_pso_16.txt \ burn_in_data/ .. tab-item:: Kullback-Leibler (KL) Use for theoretical analysis and identifying regions of rapid change: .. code-block:: bash fetkutils-tischedule.py --opt 16 --kl --ssc \ --plot opt_kl_16.png -o opt_kl_16.txt \ burn_in_data/ - ``--ar``: Optimize based on replica exchange acceptance ratio (for H-REMD) - ``--pso``: Optimize based on phase space overlap index (for MBAR/TI) - ``--kl``: Optimize based on Kullback-Leibler divergence (theoretical analysis) - ``--opt N``: Optimize to N lambda windows (e.g., 12-16 for standard systems, 20-24 for challenging transformations) - ``--ssc``: Use Soft-core Scaling parameterization (reduces to 2 parameters, ensures smooth spacing) - ``--plot FILE``: Generate visualization of acceptance ratios or overlaps - ``-o FILE`` or ``--out FILE``: Write optimized lambda schedule to file 3.2 Predict the AR/PSO/KL Profile for Existing Schedule ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To evaluate the performance of your existing lambda schedule without optimization, use the following command: .. code-block:: bash fetkutils-tischedule.py --read --ar/pso/kl --plot filename.png - ``--read``: Analyze existing schedule without optimization. The file should contain 1 column; the rows are the lambda values. - ``--ar``, ``--pso``, or ``--kl``: Choose the metric to evaluate - ``--plot filename.png``: Generate plot of the metric profile 4. Analyzing Results ~~~~~~~~~~~~~~~~~~~~ 4.1 Interpreting the Output File ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The output file (e.g., ``opt_ar_16.txt``) contains the optimized lambda schedule. A typical file looks like: .. code-block:: text 0.00000 0.03421 0.07892 0.13156 0.19345 0.26701 0.35432 0.45678 0.57234 0.69876 0.82345 0.90123 0.94567 0.97234 0.99012 1.00000 These values represent the lambda schedule to use in your production simulations. Note: - First value is always 0.0 - Last value is always 1.0 .. end-tutorial