4.11. Fit energy

Note

Supported backends: TensorFlow, PyTorch, JAX, Paddle, DP

In this section, we will take $deepmd_source_dir/examples/water/se_e2_a/input.json as an example of the input file.

4.11.1. Theory

In the DP model, the fitting network $\mathcal{F}_0$ maps the descriptor $\mathcal{D}^i$ to a scalar, where the subscript $0$ means that the output is a zero-order tensor (i.e. a scalar). The model can then be used to predict the total potential energy of the system by

\[ E = \sum_i E_i = \sum_i \mathcal F_0 (\mathcal D^i),\]

where the output of the fitting network is treated as the atomic potential energy contribution, i.e. $E_i$. The output scalar can also be treated as other scalar properties defined on an atom, for example, the partial charge of atom $i$.

In some cases, atom-specific or frame-specific parameters, such as the electron temperature, may be treated as extra input to the fitting network. We denote the atomic and frame-specific parameters by $\boldsymbol{P}^i\in \mathbb{R}^{N_p}$ (with $N_p$ being the dimension) and $\boldsymbol{Q}\in \mathbb{R}^{N_q}$ (with $N_q$ being the dimension), respectively. The atomic energy is then

\[ E_i=\mathcal{F}_0(\{\mathcal{D}^i, \boldsymbol{P}^i, \boldsymbol Q\}).\]
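As a rough illustration of the two energy expressions above, the following sketch assembles the fitting-network input from the descriptor and the optional atomic and frame parameters, then sums the per-atom outputs. It assumes the inputs are combined by simple concatenation. The names `fitting_input` and `toy_fitting_net`, and the tanh-plus-linear stand-in for $\mathcal{F}_0$, are hypothetical and not part of the DeePMD-kit API.

```python
import numpy as np

def fitting_input(descriptor, aparam=None, fparam=None):
    """Concatenate D^i with optional atomic (P^i) and frame (Q) parameters.

    descriptor: (N, M) array, one row per atom.
    aparam:     (N, N_p) array of atomic parameters, or None.
    fparam:     (N_q,) array of frame parameters shared by all atoms, or None.
    """
    parts = [descriptor]
    if aparam is not None:
        parts.append(aparam)
    if fparam is not None:
        # The same frame parameters are repeated for every atom.
        parts.append(np.tile(fparam, (descriptor.shape[0], 1)))
    return np.concatenate(parts, axis=1)

def toy_fitting_net(x, weights):
    """Hypothetical stand-in for F_0: one scalar per input row."""
    return np.tanh(x) @ weights

rng = np.random.default_rng(0)
n_atoms, m, n_p = 5, 8, 2
descriptor = rng.normal(size=(n_atoms, m))
aparam = rng.normal(size=(n_atoms, n_p))
fparam = np.array([0.3])  # arbitrary value, e.g. an electron temperature

x = fitting_input(descriptor, aparam, fparam)
weights = rng.normal(size=x.shape[1])
atomic_energies = toy_fitting_net(x, weights)  # E_i, shape (N,)
total_energy = atomic_energies.sum()           # E = sum_i E_i
```

Without `aparam` and `fparam`, the input reduces to the descriptor alone, which corresponds to the first energy expression.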

The atomic force $\boldsymbol{F}_i$ and the virial tensor $\boldsymbol{\Xi} = (\Xi_{\alpha\beta})$ (if periodic boundary conditions, PBC, are applied) can be derived from the potential energy $E$:

\[ F_{i,\alpha}=-\frac{\partial E}{\partial r_{i,\alpha}},\]

\[ \Xi_{\alpha\beta}=-\sum_{\gamma} \frac{\partial E}{\partial h_{\gamma\alpha}} h_{\gamma\beta},\]

where $r_{i,\alpha}$ and $F_{i,\alpha}$ denote the $\alpha$-th component of the coordinate and of the force of atom $i$, respectively. $h_{\alpha\beta}$ is the $\beta$-th component of the $\alpha$-th basis vector of the simulation region.
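The two derivative definitions above can be checked numerically on a toy system. The sketch below is an assumption-laden illustration: `toy_energy` is a made-up pair energy that ignores periodic images, so the cell enters only through $\boldsymbol{r}_i = \boldsymbol{s}_i \boldsymbol{h}$ with fixed fractional coordinates $\boldsymbol{s}_i$, and derivatives are taken by central finite differences rather than by automatic differentiation. In this setup the virial formula reduces to $\Xi_{\alpha\beta} = \sum_i F_{i,\alpha} r_{i,\beta}$, which the code verifies.

```python
import numpy as np

def toy_energy(coords):
    """Made-up energy: half the sum of squared pair distances (illustration only)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return 0.25 * np.sum(diff ** 2)  # each pair appears twice in the double sum

def numerical_grad(func, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function of an array."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (func(x + step) - func(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
# Rows of `cell` are the basis vectors, so cell[a, b] plays the role of h_{ab}.
cell = np.array([[4.0, 0.0, 0.0], [0.5, 5.0, 0.0], [0.0, 0.3, 6.0]])
frac = rng.random((4, 3))   # fractional coordinates s_i
coords = frac @ cell        # Cartesian coordinates r_i = s_i h

# F_{i,a} = -dE/dr_{i,a}
forces = -numerical_grad(toy_energy, coords)

# Xi_{ab} = -sum_g dE/dh_{ga} h_{gb}, differentiating at fixed fractional coordinates
dE_dh = numerical_grad(lambda h: toy_energy(frac @ h), cell)
virial = -dE_dh.T @ cell

# For this toy setup the same tensor follows from forces and coordinates.
virial_from_forces = forces.T @ coords
```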

The properties $\eta$ fitted by the energy loss function can be the energy $E$, the force $\boldsymbol{F}$, the virial $\boldsymbol{\Xi}$, the relative energy $\Delta E$, or any combination of them. The corresponding loss functions are

\[ L_E(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{N}(E(\boldsymbol{x};\boldsymbol{\theta})-E^*)^2,\]

\[ L_F(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{3N}\sum_{k=1}^{N}\sum_{\alpha=1}^3(F_{k,\alpha}(\boldsymbol{x};\boldsymbol{\theta})-F_{k,\alpha}^*)^2,\]

\[ L_\Xi(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{9N}\sum_{\alpha,\beta=1}^{3}(\Xi_{\alpha\beta}(\boldsymbol{x};\boldsymbol{\theta})-\Xi_{\alpha\beta}^*)^2,\]

\[ L_{\Delta E}(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{N}({\Delta E}(\boldsymbol{x};\boldsymbol{\theta})-{\Delta E}^*)^2,\]

where $F_{k,\alpha}$ is the $\alpha$-th component of the force on atom $k$, and the superscript $\ast$ indicates the label of the property, which must be provided in advance. Dividing by $N$ ensures that the loss of each fitted property is averaged over atomic contributions before it enters the weighted total loss.
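The energy, force, and virial losses above translate directly into a few lines of NumPy. The sketch below is for a single frame. The function names and the sample numbers are illustrative assumptions, not DeePMD-kit code.

```python
import numpy as np

def energy_loss(e_pred, e_label, n_atoms):
    """L_E = (E - E*)^2 / N"""
    return (e_pred - e_label) ** 2 / n_atoms

def force_loss(f_pred, f_label):
    """L_F = sum_{k,a} (F_ka - F*_ka)^2 / (3N), with arrays of shape (N, 3)"""
    n_atoms = f_label.shape[0]
    return np.sum((f_pred - f_label) ** 2) / (3 * n_atoms)

def virial_loss(v_pred, v_label, n_atoms):
    """L_Xi = sum_{a,b} (Xi_ab - Xi*_ab)^2 / (9N), with arrays of shape (3, 3)"""
    return np.sum((v_pred - v_label) ** 2) / (9 * n_atoms)

n_atoms = 4
f_label = np.zeros((n_atoms, 3))
v_label = np.zeros((3, 3))

l_e = energy_loss(-10.2, -10.0, n_atoms)     # energy error of 0.2
l_f = force_loss(f_label + 0.1, f_label)     # every component off by 0.1
l_v = virial_loss(v_label + 0.3, v_label, n_atoms)  # every component off by 0.3
```

With these inputs, `l_e` and `l_f` both come out close to 0.01 and `l_v` close to 0.0225.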

By default, the energy and virial losses are normalized by the number of atoms $N$ as shown above. When intensive loss normalization is enabled, these terms are instead normalized by $N^2$. For the energy loss, this converts it to the square of the per-atom energy error:

\[ L_E^{\text{intensive}}(\boldsymbol{x};\boldsymbol{\theta})=\left(\frac{E(\boldsymbol{x};\boldsymbol{\theta})-E^*}{N}\right)^2 = \frac{1}{N^2}(E(\boldsymbol{x};\boldsymbol{\theta})-E^*)^2,\]

and similarly for the virial loss:

\[ L_\Xi^{\text{intensive}}(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{9N^2}\sum_{\alpha,\beta=1}^{3}(\Xi_{\alpha\beta}(\boldsymbol{x};\boldsymbol{\theta})-\Xi_{\alpha\beta}^*)^2.\]

Intensive normalization makes the loss magnitudes independent of the system size $N$ (assuming per-atom errors are consistent), which is crucial for multi-task training involving datasets with varying system sizes to prevent larger systems from dominating the training process.
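A small worked example may make the size dependence concrete. Assume two systems with the same per-atom energy error, so the total energy error grows linearly with $N$. The function name `energy_loss` and the chosen numbers are assumptions for illustration only.

```python
import numpy as np

def energy_loss(e_err, n_atoms, intensive=False):
    """Energy MSE term with 1/N (default) or 1/N^2 (intensive) normalization."""
    norm = n_atoms ** 2 if intensive else n_atoms
    return e_err ** 2 / norm

per_atom_err = 0.01  # same per-atom energy error in both systems
sizes = (64, 512)

default_losses = [energy_loss(per_atom_err * n, n) for n in sizes]
intensive_losses = [energy_loss(per_atom_err * n, n, intensive=True) for n in sizes]

for n, d, i in zip(sizes, default_losses, intensive_losses):
    print(f"N={n}: default={d:.4g}, intensive={i:.4g}")
```

With the default normalization the loss grows in proportion to $N$ (a factor of 8 between the two systems), whereas the intensive loss stays at the squared per-atom error for both.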

If some atoms are more important than others, for example because they play an essential role in calculating free energy profiles or kinetic isotope effects, the MSE of atomic forces with prefactors $q_{k}$ can also be used as the loss function:

\[ L_F^p(\mathbf{x};\boldsymbol{\theta})=\frac{1}{3N}\sum_{k=1}^{N} \sum_{\alpha} q_{k} (F_{k,\alpha}(\mathbf{x};\boldsymbol{\theta})-F_{k,\alpha}^*)^2.\]

The forces on atoms with larger prefactors will be fitted more accurately than those on other atoms.
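The effect of the prefactors $q_k$ can be seen in a minimal sketch of $L_F^p$. The function name `prefactor_force_loss` and the sample data are assumptions made for this example.

```python
import numpy as np

def prefactor_force_loss(f_pred, f_label, atom_pref):
    """L_F^p = sum_k q_k sum_a (F_ka - F*_ka)^2 / (3N)

    f_pred, f_label: (N, 3) arrays; atom_pref: (N,) array of q_k.
    """
    n_atoms = f_label.shape[0]
    sq_err = (f_pred - f_label) ** 2  # (N, 3)
    return np.sum(atom_pref[:, None] * sq_err) / (3 * n_atoms)

f_label = np.zeros((3, 3))
f_pred = f_label.copy()
f_pred[0] += 0.1  # only atom 0 has a force error

uniform = prefactor_force_loss(f_pred, f_label, np.array([1.0, 1.0, 1.0]))
weighted = prefactor_force_loss(f_pred, f_label, np.array([10.0, 1.0, 1.0]))
```

With all $q_k = 1$ the expression reduces to the plain force loss $L_F$. Raising $q_0$ to 10 makes the same error on atom 0 cost ten times as much.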

If some forces are quite large, for example greater than 60 eV/Å in high-temperature reactive simulations, one may prefer a force loss that is relative to the force magnitude:

\[ L^r_F(\boldsymbol{x};\boldsymbol{\theta})=\frac{1}{3N}\sum_{k=1}^{N}\sum_\alpha \left(\frac{F_{k,\alpha}(\boldsymbol{x};\boldsymbol{\theta})-F_{k,\alpha}^*}{\lvert\boldsymbol{F}^\ast_k\rvert + \nu}\right)^2,\]

where $\nu$ is a small constant used to protect an atom where the magnitude of $\boldsymbol{F}^\ast_k$ is small from having a large $L^r_F$. Benefiting from the relative force loss, small forces can be fitted more accurately.[1]
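To see how the relative force loss rebalances large and small forces, the sketch below compares it with the plain force MSE for two atoms that share the same 10% relative error. The function name `relative_force_loss`, the value of `nu`, and the force values are illustrative assumptions.

```python
import numpy as np

def relative_force_loss(f_pred, f_label, nu):
    """L_F^r = sum_{k,a} ((F_ka - F*_ka) / (|F*_k| + nu))^2 / (3N)"""
    n_atoms = f_label.shape[0]
    magnitude = np.linalg.norm(f_label, axis=1, keepdims=True)  # |F*_k|, shape (N, 1)
    rel_err = (f_pred - f_label) / (magnitude + nu)
    return np.sum(rel_err ** 2) / (3 * n_atoms)

# One atom with a very large force label and one with a small force label.
f_label = np.array([[60.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
f_pred = 1.1 * f_label  # both predictions are off by 10%

plain = np.sum((f_pred - f_label) ** 2) / (3 * f_label.shape[0])
relative = relative_force_loss(f_pred, f_label, nu=0.1)

# Per-atom share of the plain squared error: dominated by the large force.
plain_per_atom = np.sum((f_pred - f_label) ** 2, axis=1)
```

In the plain MSE the first atom contributes a squared error of about 36 against 0.0025 for the second. In the relative loss both atoms contribute terms of similar size, so the small force is no longer neglected.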

4.11.2. The fitting network

The fitting net is configured in the fitting_net section:

	"fitting_net" : {
	    "neuron":		[240, 240, 240],
	    "resnet_dt":	true,
	    "seed":		1
	},

neuron specifies the size of the fitting net. If two neighboring layers are of the same size, then a ResNet architecture is built between them.
If the option resnet_dt is set to true, then a timestep is used in the ResNet.
seed gives the random seed that is used to generate random numbers when initializing the model parameters.
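The three options can be pictured with a small NumPy sketch of such a network. This is only a sketch under stated assumptions: the tanh activation, the weight initialization, the way the timestep `dt` scales the residual branch, and the names `init_fitting_net` and `fitting_net_forward` are all hypothetical and may differ from the actual backends.

```python
import numpy as np

def init_fitting_net(in_dim, neuron, resnet_dt, seed):
    """Build hidden layers of the given sizes plus a scalar output layer."""
    rng = np.random.default_rng(seed)  # `seed` fixes the initial parameters
    layers, prev = [], in_dim
    for width in neuron:
        layers.append({
            "w": rng.normal(scale=1.0 / np.sqrt(prev + width), size=(prev, width)),
            "b": rng.normal(size=width),
            # A skip connection is only possible when both sizes match.
            "skip": prev == width,
            # Assumed form: a per-neuron timestep scaling the residual branch.
            "dt": rng.normal(loc=0.1, scale=0.001, size=width) if resnet_dt else None,
        })
        prev = width
    out = {"w": rng.normal(scale=1.0 / np.sqrt(prev + 1), size=(prev, 1)),
           "b": rng.normal(size=1)}
    return layers, out

def fitting_net_forward(x, layers, out):
    """Map descriptor rows of shape (N, in_dim) to one scalar per atom."""
    for layer in layers:
        y = np.tanh(x @ layer["w"] + layer["b"])
        if layer["skip"]:
            if layer["dt"] is not None:
                y = y * layer["dt"]
            x = x + y  # ResNet-style update between equally sized layers
        else:
            x = y
    return (x @ out["w"] + out["b"])[:, 0]

descriptor = np.random.default_rng(42).normal(size=(5, 16))
layers, out = init_fitting_net(16, [240, 240, 240], resnet_dt=True, seed=1)
atomic_energies = fitting_net_forward(descriptor, layers, out)
skip_flags = [layer["skip"] for layer in layers]
```

With a descriptor of width 16 and `neuron` set to `[240, 240, 240]`, only the second and third hidden layers receive a skip connection, because the first layer changes the width from 16 to 240.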

4.11.3. Loss

The loss function $L$ for training energy is given by

\[L = p_e L_e + p_f L_f + p_v L_v\]

where $L_e$, $L_f$, and $L_v$ denote the loss in energy, forces and virials, respectively. $p_e$, $p_f$, and $p_v$ give the prefactors of the energy, force and virial losses. The prefactors need not be constant; they vary linearly with the learning rate. Taking the force prefactor as an example, at training step $t$ it is given by

\[p_f(t) = p_f^0 \frac{ \alpha(t) }{ \alpha(0) } + p_f^\infty \left( 1 - \frac{ \alpha(t) }{ \alpha(0) } \right)\]

where $\alpha(t)$ denotes the learning rate at step $t$. $p_f^0$ and $p_f^\infty$ specify $p_f$ at the start of the training and in the limit $t \to \infty$ (set by start_pref_f and limit_pref_f, respectively), i.e.

\[\mathrm{pref\_f}(t) = \mathrm{start\_pref\_f} \cdot \frac{\mathrm{lr}(t)}{\mathrm{start\_lr}} + \mathrm{limit\_pref\_f} \cdot \left( 1 - \frac{\mathrm{lr}(t)}{\mathrm{start\_lr}} \right)\]
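The prefactor schedule is a linear interpolation in the learning-rate ratio and can be written as a one-line function. In the sketch below the function name `prefactor` and the value of `start_lr` are assumptions; the start and limit prefactors are the ones used in the example input of this section.

```python
import numpy as np

def prefactor(lr, start_lr, start_pref, limit_pref):
    """pref(t) = start_pref * lr(t)/start_lr + limit_pref * (1 - lr(t)/start_lr)"""
    ratio = lr / start_lr
    return start_pref * ratio + limit_pref * (1.0 - ratio)

start_lr = 1e-3  # assumed initial learning rate
learning_rates = np.array([1e-3, 5e-4, 1e-5, 0.0])

# Force prefactor: decays from 1000 towards 1 as the learning rate decays.
pref_f = prefactor(learning_rates, start_lr, start_pref=1000.0, limit_pref=1.0)
# Energy prefactor: grows from 0.02 towards 1 over the same schedule.
pref_e = prefactor(learning_rates, start_lr, start_pref=0.02, limit_pref=1.0)

for lr, pf, pe in zip(learning_rates, pref_f, pref_e):
    print(f"lr={lr:.1e}  pref_f={pf:.4g}  pref_e={pe:.4g}")
```

Early in training the loss is therefore dominated by the force term, and the energy term gains weight as the learning rate decays.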

The loss section in input.json is

    "loss" : {
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "loss_func": "mse",
        "intensive_ener_virial": false
    }

The options start_pref_e, limit_pref_e, start_pref_f, limit_pref_f, start_pref_v and limit_pref_v determine the start and limit prefactors of energy, force and virial, respectively.

The loss_func option specifies the type of loss function to use. Two options are available:

"mse" (default): Mean Squared Error (L2 loss). This is the standard loss function that penalizes large errors more heavily.
"mae": Mean Absolute Error (L1 loss). This loss function is less sensitive to outliers and may be preferred when the training data contains occasional large errors.

When using loss_func="mse", the training will output rmse_e, rmse_f, rmse_v metrics (root mean square errors). When using loss_func="mae", the training will output mae_e, mae_f, mae_v metrics (mean absolute errors).
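The different sensitivity of the two metrics to outliers can be seen on a handful of made-up errors. The sketch below only illustrates the definitions of RMSE and MAE; the error values are invented, and any additional per-atom normalization applied by the training code is not modeled here.

```python
import numpy as np

# Hypothetical prediction errors: three small ones and a single outlier.
errors = np.array([0.1, -0.1, 0.1, 2.0])

rmse = np.sqrt(np.mean(errors ** 2))  # root mean square error
mae = np.mean(np.abs(errors))         # mean absolute error

print(f"rmse={rmse:.4f}  mae={mae:.4f}")
```

The single outlier pulls the RMSE to roughly 1.0, while the MAE stays near 0.58, which is why the MAE-based loss reacts less strongly to occasional large errors.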

The intensive_ener_virial option (default is false) controls the normalization of the energy and virial loss terms when loss_func="mse". If set to true, these terms are scaled by $1/N^2$ (making them “intensive”), so that the loss scale remains consistent across different system sizes $N$. If false, the legacy $1/N$ scaling is used. Setting this option to true is highly recommended for multi-task learning.

To train without virial data, set the virial prefactors start_pref_v and limit_pref_v to 0.

4.11.3.1. Prefactor force loss with default atom preference

Note

Supported backends: PyTorch, DP

When using the prefactor force loss (controlled by start_pref_pf and limit_pref_pf), the training data typically requires an atom_pref.npy file in each system directory to specify per-atom prefactors $q_k$. If atom_pref.npy is not provided, the use_default_pf option can be set to true to use a default atom preference of 1.0 for all atoms:

    "loss" : {
        "start_pref_e": 0.02,
        "limit_pref_e": 1,
        "start_pref_f": 1000,
        "limit_pref_f": 1,
        "start_pref_v": 0,
        "limit_pref_v": 0,
        "start_pref_pf": 1.0,
        "limit_pref_pf": 1.0,
        "use_default_pf": true
    }

This allows using the prefactor force loss without requiring atom_pref.npy files. When atom_pref.npy is provided in the training data, it will be used as-is regardless of the use_default_pf setting.
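The fallback rule described above can be summarized in a short sketch. The helper `load_atom_pref`, its arguments, and the assumed array shape of one prefactor per frame and atom are hypothetical and only mirror the behavior described in the text; they are not the DeePMD-kit data loader.

```python
import os
import tempfile

import numpy as np

def load_atom_pref(data_dir, n_frames, n_atoms, use_default_pf=False):
    """Return per-atom prefactors q_k, falling back to 1.0 if allowed."""
    path = os.path.join(data_dir, "atom_pref.npy")
    if os.path.exists(path):
        # A provided file always wins, regardless of use_default_pf.
        return np.load(path)
    if use_default_pf:
        # Assumed layout: one prefactor per frame and atom.
        return np.ones((n_frames, n_atoms))
    raise FileNotFoundError(f"{path} is required when use_default_pf is false")

with tempfile.TemporaryDirectory() as tmp:
    # No file yet: the default preference of 1.0 is used.
    default_pref = load_atom_pref(tmp, n_frames=2, n_atoms=3, use_default_pf=True)
    # Once atom_pref.npy exists, its content is used as-is.
    np.save(os.path.join(tmp, "atom_pref.npy"), np.full((2, 3), 5.0))
    file_pref = load_atom_pref(tmp, n_frames=2, n_atoms=3, use_default_pf=True)
```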

Note that use_default_pf is only effective for the PyTorch and DP (NumPy reference) backends. The TensorFlow and Paddle backends raise NotImplementedError when use_default_pf is set to true.