Type embedding approach

4.17. Type embedding approach

Note

Supported backends: TensorFlow, PyTorch, JAX, DP

We generate a specific type embedding vector for each atom type so that all types can share a single descriptor embedding net and a single fitting net, which greatly reduces training complexity.

The training input script is similar to that of se_e2_a, but adds a type_embedding section.

4.17.1. Theory

Usually, when the type embedding approach is not enabled, for a system with multiple chemical species (\(|\{\alpha_i\}| > 1\)), the parameters of the embedding network \(\mathcal{N}_{e,\{2,3\}}\) are chemical-species-wise:

\[ (\mathcal{G}^i)_j = \mathcal{N}^{\alpha_i, \alpha_j}_{e,2}(s(r_{ij})) \quad \mathrm{or}\quad (\mathcal{G}^i)_j = \mathcal{N}^{ \alpha_j}_{e,2}(s(r_{ij})),\]

\[ (\mathcal{G}^i)_{jk} =\mathcal{N}^{\alpha_j, \alpha_k}_{e,3}((\theta_i)_{jk}).\]

Thus, there will be \(N_t^2\) or \(N_t\) embedding networks where \(N_t\) is the number of chemical species. To improve the performance of matrix operations, \(n(i)\) is divided into blocks of different chemical species. Each matrix with a dimension of \(N_c\) is divided into corresponding blocks, and each block is padded to \(N_c^{\alpha_j}\) separately. The limitation of this approach is that when there are large numbers of chemical species, the number of embedding networks will increase, requiring large memory and decreasing computing efficiency.
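The growth in the number of embedding networks can be made concrete with a small sketch. The function name and the `depends_on_center` flag are illustrative assumptions, not part of any DeePMD-kit API; the flag only selects between the two forms of \((\mathcal{G}^i)_j\) given above.

```python
def count_embedding_networks(n_types: int, depends_on_center: bool = True) -> int:
    """Count two-body embedding networks when type embedding is NOT used.

    depends_on_center=True  -> one network per (alpha_i, alpha_j) pair: N_t^2
    depends_on_center=False -> one network per neighbor species alpha_j: N_t
    """
    if n_types < 1:
        raise ValueError("n_types must be a positive integer")
    return n_types ** 2 if depends_on_center else n_types


# Print N_t, then the pair-wise count, then the neighbor-only count.
for n_types in (2, 10, 50):
    print(
        n_types,
        count_embedding_networks(n_types),
        count_embedding_networks(n_types, depends_on_center=False),
    )
```

For 50 species the pair-wise form already needs 2500 separate networks, which is the scaling problem described above.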

Similar to the embedding networks, if the type embedding approach is not used, the fitting network parameters are chemical-species-wise, and there are \(N_t\) sets of fitting network parameters. For performance, atoms are sorted by their chemical species \(\alpha_i\) in advance. For example, the atomic energy \(E_i\) is represented as follows:

\[E_i=\mathcal{F}_0^{\alpha_i}(\mathcal{D}^i).\]

To reduce the number of NN parameters and improve computing efficiency when there are large numbers of chemical species, the type embedding \(\mathcal{A}\) is introduced, represented as a NN function \(\mathcal{N}_t\) of the atomic type \(\alpha\):

\[ \mathcal{A}^i = \mathcal{N}_t\big( \text{one hot}(\alpha_i) \big),\]

where \(\alpha_i\) is converted to a one-hot vector representing the chemical species before feeding to the NN. The type embeddings of central and neighboring atoms \(\mathcal{A}^i\) and \(\mathcal{A}^j\) are added as an extra input of the embedding network \(\mathcal{N}_{e,\{2,3\}}\):

\[ (\mathcal{G}^i)_j = \mathcal{N}_{e,2}(\{s(r_{ij}), \mathcal{A}^i, \mathcal{A}^j\}) \quad \mathrm{or}\quad (\mathcal{G}^i)_j = \mathcal{N}_{e,2}(\{s(r_{ij}), \mathcal{A}^j\}) ,\]

\[ (\mathcal{G}^i)_{jk} =\mathcal{N}_{e,3}(\{(\theta_i)_{jk}, \mathcal{A}^j, \mathcal{A}^k\}).\]

Likewise, the type embedding is inserted into the input of the fitting network:

\[E_i=\mathcal{F}_0(\{\mathcal{D}^i, \mathcal{A}^i\}).\]

In this way, all chemical species share the same network parameters through the type embedding.[1]
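As a minimal sketch of this parameter sharing, the following pure-Python example evaluates \((\mathcal{G}^i)_j = \mathcal{N}_{e,2}(\{s(r_{ij}), \mathcal{A}^i, \mathcal{A}^j\})\) with a single set of weights. The layer sizes, the single-layer stand-ins for \(\mathcal{N}_t\) and \(\mathcal{N}_{e,2}\), and all function names are assumptions made for illustration; this is not the DeePMD-kit implementation.

```python
import math
import random


def dense(x, w, b):
    """Fully connected layer with tanh activation: tanh(x @ w + b)."""
    return [
        math.tanh(sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j])
        for j in range(len(b))
    ]


def init_layer(rng, n_in, n_out):
    """Random weights of shape (n_in, n_out) and zero biases."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_out)]
         for _ in range(n_in)]
    return w, [0.0] * n_out


def one_hot(alpha, n_types):
    """One-hot vector for the chemical species index alpha."""
    return [1.0 if t == alpha else 0.0 for t in range(n_types)]


rng = random.Random(0)
N_TYPES, TEBD_DIM, EMBED_DIM = 3, 4, 6  # illustrative sizes (assumed)

w_t, b_t = init_layer(rng, N_TYPES, TEBD_DIM)            # stand-in for N_t
w_e, b_e = init_layer(rng, 1 + 2 * TEBD_DIM, EMBED_DIM)  # stand-in for N_{e,2}


def type_embedding(alpha):
    """A = N_t(one_hot(alpha))."""
    return dense(one_hot(alpha, N_TYPES), w_t, b_t)


def embed_pair(s_rij, alpha_i, alpha_j):
    """(G^i)_j from the shared network; species enter only through A^i, A^j."""
    x = [s_rij] + type_embedding(alpha_i) + type_embedding(alpha_j)
    return dense(x, w_e, b_e)


g_01 = embed_pair(0.5, 0, 1)  # neighbor of species 1
g_02 = embed_pair(0.5, 0, 2)  # neighbor of species 2, same weights
```

Both calls use the same `w_e` and `b_e`; the outputs differ only because the type embeddings of the two neighbor species differ.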

4.17.2. Instructions for TensorFlow backend

In the TensorFlow backend, the type embedding is defined at the model level. The model section defines how the model is constructed; to enable the approach, add a type_embedding section to it:

    "model": {
        "type_map": ["O", "H"],
        "type_embedding": {
            ...
        },
        "descriptor": {
            ...
        },
        "fitting_net": {
            ...
        }
    }

The model will automatically apply the type embedding approach and generate type embedding vectors. If a type embedding vector is detected, the descriptor and the fitting net will take it as part of their input.

The construction of the type embedding net is given by type_embedding. An example of type_embedding is provided as follows:

    "type_embedding": {
        "neuron": [2, 4, 8],
        "resnet_dt": false,
        "seed": 1
    }

neuron specifies the size of the type embedding net. From left to right, its members give the size of each hidden layer from the input end to the output end. The net takes a one-hot vector as input, and its output dimension equals the last entry of the neuron list. If a layer is twice the size of the previous layer, the previous layer's output is copied and concatenated, and a ResNet architecture is built between the two layers.
If the option resnet_dt is set to true, a timestep is used in the ResNet.
seed gives the random seed used to generate random numbers when initializing the model parameters.
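The layer rules above can be sketched in pure Python as follows. This is a hedged illustration, not the DeePMD-kit code: the function names, the weight initialization, the tanh activation, the treatment of equal-sized layers as a plain skip connection, and the exact placement of the resnet_dt timestep are all assumptions.

```python
import math
import random


def init_net(n_types, neuron, resnet_dt=False, seed=1):
    """Create random parameters for each layer of a type embedding net."""
    rng = random.Random(seed)
    layers, n_in = [], n_types
    for n_out in neuron:
        scale = 1.0 / math.sqrt(n_in + n_out)
        w = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_out)]
        # Per-unit timestep, created only when resnet_dt is enabled (assumed init).
        dt = [rng.gauss(0.1, 0.001) for _ in range(n_out)] if resnet_dt else None
        layers.append((w, b, dt))
        n_in = n_out
    return layers


def forward(layers, x):
    """Map a one-hot type vector to its type embedding vector."""
    for w, b, dt in layers:
        n_in, n_out = len(x), len(b)
        h = [math.tanh(sum(x[i] * w[i][j] for i in range(n_in)) + b[j])
             for j in range(n_out)]
        if dt is not None:
            h = [hj * dj for hj, dj in zip(h, dt)]  # scale by timestep (assumed placement)
        if n_out == 2 * n_in:
            # Copy and concatenate the previous output, then add the new layer.
            x = [s + hj for s, hj in zip(x + x, h)]
        elif n_out == n_in:
            x = [s + hj for s, hj in zip(x, h)]  # equal sizes: plain skip (assumption)
        else:
            x = h  # no skip connection
    return x


# Two species, as in type_map ["O", "H"], with neuron = [2, 4, 8].
net = init_net(n_types=2, neuron=[2, 4, 8], resnet_dt=False, seed=1)
emb_O = forward(net, [1.0, 0.0])
emb_H = forward(net, [0.0, 1.0])
print(len(emb_O))  # 8, the last entry of neuron
```

Reusing the same seed reproduces the same parameters, which mirrors the role of seed in the input script.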

A complete training input script for this example can be found at:

$deepmd_source_dir/examples/water/se_e2_a_tebd/input.json
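For readers who generate input scripts programmatically, the following sketch adds a type_embedding section to an existing model configuration held as a Python dict. The helper name `with_type_embedding` is hypothetical, and the base configuration is a placeholder: its descriptor and fitting_net sections are deliberately incomplete and would need the usual keys from a real input script.

```python
import copy
import json


def with_type_embedding(config, neuron, resnet_dt=False, seed=None):
    """Return a copy of config whose model section has a type_embedding section."""
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    tebd = {"neuron": list(neuron), "resnet_dt": resnet_dt}
    if seed is not None:
        tebd["seed"] = seed
    new_config["model"]["type_embedding"] = tebd
    return new_config


# Placeholder input: descriptor and fitting_net keys are left out on purpose.
base = {
    "model": {
        "type_map": ["O", "H"],
        "descriptor": {"type": "se_e2_a"},
        "fitting_net": {},
    }
}

config = with_type_embedding(base, neuron=[2, 4, 8], resnet_dt=False, seed=1)
print(json.dumps(config["model"]["type_embedding"], indent=4))
```

Working on a deep copy keeps the original configuration available for comparison runs without type embedding.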

See here for further explanation of type embedding.

See documentation for each descriptor for details.

4.17.3. Instructions for other backends

In other backends, the type embedding is within the descriptor itself.