Installing and Training with DeepMD-Kit

Timothy Giese1, Zeke A. Piskulich1, and Darrin M. York1

1Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

  • Getting started in the York Lab

Tutorial

DeepMD-Kit Installation on Amarel

When installing DeepMD-Kit on Amarel, it is best to install support for MACE architectures at the same time. Tim has put together a script that installs both DeepMD-Kit and the MACE plugin in one pass. It is reproduced here:

#!/bin/bash


######### THIS INSTALL IS FOR TRAINING MODELS ON A GPU NOT FOR RUNNING MLP SIMULATIONS ###########


set -e
set -u


WITH_XTB=0
WITH_STD_DEEPMD=0


TOPDIR=${PWD}
if [ ! -d ${TOPDIR}/modulefiles/pydeepmdkit ]; then
    mkdir -p ${TOPDIR}/modulefiles/pydeepmdkit
fi

######## INSTALL PYTHON ######

# Create the directory if needed, then install Miniconda only when it is
# missing, so that an interrupted run can be resumed by rerunning the script.
mkdir -p ${TOPDIR}/software/python
cd ${TOPDIR}/software/python
if [ ! -e miniconda ]; then
    if [ ! -e Miniconda3-py310_24.5.0-0-Linux-x86_64.sh ]; then
        wget https://repo.anaconda.com/miniconda/Miniconda3-py310_24.5.0-0-Linux-x86_64.sh
    fi
    bash ./Miniconda3-py310_24.5.0-0-Linux-x86_64.sh -b -f -p miniconda
fi

cd ${TOPDIR}
if [ ! -d ${TOPDIR}/modulefiles/pydeepmdkit/compiler ]; then
    mkdir -p ${TOPDIR}/modulefiles/pydeepmdkit/compiler
fi

cat <<EOF > ${TOPDIR}/modulefiles/pydeepmdkit/compiler/default
#%Module -*- tcl -*-
##
## modulefile
##

#module load york/cuda/12.1
module load cuda/12.1.0
module load cudnn/8.1.3-jlb638
#module load york/cuda/11.8
module load york/openblas/0.3.8
module load gcc/10.2.0/openmpi

#setenv CUDA_DIR "\${CUDA_HOME}"
setenv CUDA_DIR "/opt/sw/packages/cuda/12.1.0"
setenv CUDA_ROOT "/opt/sw/packages/cuda/12.1.0"
setenv XLA_FLAGS "--xla_gpu_cuda_data_dir=/opt/sw/packages/cuda/12.1.0"

setenv OPENBLAS_NUM_THREADS 1
setenv PYTHONUSERBASE "${TOPDIR}/software/python/miniconda"
prepend-path PATH "${TOPDIR}/software/python/miniconda/bin"
prepend-path PYTHONPATH "${TOPDIR}/software/python/miniconda/lib/python3.10/site-packages"
prepend-path LIBRARY_PATH "${TOPDIR}/software/python/miniconda/lib"
prepend-path LD_LIBRARY_PATH "${TOPDIR}/software/python/miniconda/lib"
prepend-path CPATH "${TOPDIR}/software/python/miniconda/include"

EOF

export MODULEPATH="${TOPDIR}/modulefiles:${MODULEPATH}"

module load pydeepmdkit/compiler


######## INSTALL TENSORFLOW AND PYTORCH ########


python3 -m ensurepip --user --upgrade
python3 -m pip install --user --upgrade pip
#python3 -m pip install --user numpy scipy matplotlib wheel cmake tensorflow torch torchvision torchaudio
python3 -m pip install --user numpy scipy matplotlib wheel cmake tensorflow
python3 -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121


module purge
module load pydeepmdkit/compiler
module load git


PYVERSION=3.10


########## INSTALL DeepMD-kit (training needs to work on a GPU otherwise it is stupid slow) ############


PYPREFIX=${TOPDIR}/software/pydeepmdkit/deepmdkit
if [ ! -d ${PYPREFIX} ]; then
    mkdir -p ${PYPREFIX}
fi

PREFIX=${PYPREFIX}/local
export PYTHONUSERBASE=${PREFIX}
export PYTHONPATH="${PREFIX}/lib/python${PYVERSION}/site-packages:${PYTHONPATH}"
#export LD_LIBRARY_PATH="${PREFIX}/lib64/python${PYVERSION}/site-packages/torch/lib:${LD_LIBRARY_PATH}"
#export LIBRARY_PATH="${PREFIX}/lib64/python${PYVERSION}/site-packages/torch/lib:${LIBRARY_PATH}"

export LD_LIBRARY_PATH="/lib64:${LD_LIBRARY_PATH}"
export LIBRARY_PATH="/lib64:${LIBRARY_PATH}"

export LD_LIBRARY_PATH="${TOPDIR}/software/python/miniconda/lib/python${PYVERSION}/site-packages/torch/lib:${LD_LIBRARY_PATH}"
export LIBRARY_PATH="${TOPDIR}/software/python/miniconda/lib/python${PYVERSION}/site-packages/torch/lib:${LIBRARY_PATH}"


#exit

if [ ! -d ${PREFIX} ]; then
    mkdir -p ${PREFIX}
fi

#export PYTORCH_ROOT=${PREFIX}/lib/python${PYVERSION}/site-packages/torch
export PYTORCH_ROOT="${TOPDIR}/software/python/miniconda/lib/python${PYVERSION}/site-packages/torch"


if [ ! -d ${PYPREFIX}/src ]; then
    mkdir -p ${PYPREFIX}/src
fi

cd ${PYPREFIX}/src

if [ ! -d deepmd-kit-devel ]; then
    git clone https://github.com/deepmodeling/deepmd-kit -b devel deepmd-kit-devel
fi


if [ ! -d deepmd-kit-gnn ]; then
    git clone https://github.com/njzjz/deepmd-gnn deepmd-kit-gnn
fi

cd deepmd-kit-devel
git fetch --tags
git pull


echo "GCC COMPILER: " $(which gcc)

#
# deepmd-kit Python interface
#
# See https://github.com/pytorch/pytorch/issues/113948
#

#### DP_VARIANT=cuda required for training on gpu

export TORCH_CUDA_ARCH_LIST="7.5 8.0 8.6 8.9 9.0"
DP_ENABLE_PYTORCH=1 DP_VARIANT=cuda python3 -m pip install . --upgrade --prefix=${PREFIX}


######### INSTALL DeePMD-kit-gnn PLUGIN FOR MACE ########

cd ../

cd deepmd-kit-gnn
git pull
export CMAKE_PREFIX_PATH=$(python3 -c "import torch;print(torch.utils.cmake_prefix_path)")
python3 -m pip install . --upgrade --prefix=${PREFIX}

# Newer versions of pytorch complain about the line in e3nn that loads constants.pt
# The following is a hack until it is fixed upstream

wfile=${PREFIX}/lib/python${PYVERSION}/site-packages/e3nn/o3/_wigner.py
if [ ! -e ${wfile} ]; then
    # Abort with a message on stderr if e3nn was not installed where expected
    echo "Expected file ${wfile} was not found" >&2
    exit 1
fi
sed -i -e "s|'constants.pt'))|'constants.pt'), weights_only=False)|" ${wfile}


if [ ! -d ${TOPDIR}/modulefiles/pydeepmdkit/deepmdkit ]; then
    mkdir -p ${TOPDIR}/modulefiles/pydeepmdkit/deepmdkit
fi

cat <<EOF > ${TOPDIR}/modulefiles/pydeepmdkit/deepmdkit/default
#%Module -*- tcl -*-
##
## modulefile
##

module load pydeepmdkit/compiler/default

setenv OPENBLAS_NUM_THREADS 1
setenv DP_PLUGIN_PATH "${PREFIX}/lib/python${PYVERSION}/site-packages/deepmd_gnn/lib/libdeepmd_gnn.so"
prepend-path PYTHONPATH "${PREFIX}/lib/python${PYVERSION}/site-packages"
prepend-path LIBRARY_PATH "${PREFIX}/lib"
prepend-path LD_LIBRARY_PATH "${PREFIX}/lib"
prepend-path PATH "${PREFIX}/bin"

EOF

You can then load this module and use DeepMD-Kit as follows:

module purge
module use ${TOPDIR}/modulefiles
module load pydeepmdkit/deepmdkit/default
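
Once the module is loaded, a quick sanity check can catch a broken install before a training job is submitted. The following is a minimal sketch, not part of Tim's script: the helper names check_cmd, check_plugin, and verify_deepmd_env are hypothetical, and the sketch assumes that DP_PLUGIN_PATH holds a single file path, as it does in the modulefile written by the install script.

```shell
#!/bin/bash
# Sketch only: these helper names are hypothetical, not part of DeePMD-kit.

# Report whether a command is available on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Report whether the plugin library named by DP_PLUGIN_PATH exists.
# Assumes DP_PLUGIN_PATH contains a single path.
check_plugin() {
    local plugin="${DP_PLUGIN_PATH:-}"
    if [ -n "${plugin}" ] && [ -e "${plugin}" ]; then
        echo "found: ${plugin}"
    else
        echo "MISSING: DP_PLUGIN_PATH is unset or points to a missing file"
        return 1
    fi
}

# Run every check and return non-zero if any of them fails.
verify_deepmd_env() {
    local status=0
    check_cmd dp || status=1
    check_cmd python3 || status=1
    check_plugin || status=1
    # Ask PyTorch whether it can see a GPU; this is only expected to
    # report True on a GPU node.
    python3 -c "import torch; print('CUDA available:', torch.cuda.is_available())" || status=1
    return "${status}"
}
```

After loading pydeepmdkit/deepmdkit/default, source these definitions and run verify_deepmd_env. Any line beginning with MISSING points to the part of the environment that needs attention.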

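Note that TOPDIR is the directory from which you ran the installation script (the script sets TOPDIR=${PWD}). To make the module path permanent, add the module use line to your .bashrc file, replacing ${TOPDIR} with the absolute path of that directory, because TOPDIR is not defined in a new shell.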
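
One way to make the module path permanent is sketched below. The function name persist_module_use is hypothetical, and /path/to/install in the usage note is a placeholder for your own installation directory. The function writes the absolute path into the file, and it appends the line only if it is not already present, so running it twice is harmless.

```shell
#!/bin/bash
# Sketch only: persist_module_use is a hypothetical helper name.
# Usage: persist_module_use <absolute install dir> [rc file]
persist_module_use() {
    local topdir="$1"
    local rcfile="${2:-${HOME}/.bashrc}"
    local line="module use ${topdir}/modulefiles"
    # Append only if the exact line is not already in the file.
    if ! grep -qxF -- "${line}" "${rcfile}" 2>/dev/null; then
        echo "${line}" >> "${rcfile}"
    fi
}
```

For example, persist_module_use /path/to/install adds the line to ~/.bashrc. New login shells can then run module load pydeepmdkit/deepmdkit/default directly.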

HDF5 File format

Training a Model