deepmd.pt.model.descriptor.sezm_nn.activation

Activation helper modules for SeZM.

This module contains coefficient-space nonlinear operators: `GatedActivation` and the point-wise `SwiGLU`. Grid projectors and grid networks live in dedicated modules, which keeps coefficient-space and function-space logic separate.

Classes

`GatedActivation`	Gated activation for SO(3) equivariant features with per-l independent gates.
`SwiGLU`	Point-wise SwiGLU on the last feature axis.

Module Contents

class deepmd.pt.model.descriptor.sezm_nn.activation.GatedActivation(*, lmax: int, mmax: int | None = None, channels: int, n_focus: int = 1, dtype: torch.dtype, activation_function: str = 'silu', mlp_bias: bool = False, layout: str = 'nfdc', trainable: bool, seed: int | list[int] | None = None)

Bases: torch.nn.Module

Gated activation for SO(3) equivariant features with per-l independent gates.

Standard mode (gate=None in forward):

l=0: applies the specified activation function.
l>0: each degree l has an independent gate derived from the l=0 scalar features; the gate for each l is shared by all m components of that l-block.
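
A minimal sketch of the standard-mode rule, assuming the full packed layout with layout='nfdc', SiLU as the l=0 activation, and a linear map from the l=0 scalars to one sigmoid gate per (degree, channel). The names `gated_activation_standard`, `degree_of_packed_index`, and `gate_proj` are hypothetical; this is an illustration of the idea, not the `GatedActivation` source, whose projection shape and initialization may differ.

```python
import torch
import torch.nn.functional as F


def degree_of_packed_index(lmax: int) -> torch.Tensor:
    """Degree l of every coefficient in the packed (l, m) layout, D = (lmax+1)^2."""
    return torch.cat(
        [torch.full((2 * l + 1,), l, dtype=torch.long) for l in range(lmax + 1)]
    )


def gated_activation_standard(
    x: torch.Tensor, gate_proj: torch.nn.Linear, lmax: int
) -> torch.Tensor:
    """Hypothetical standard-mode gating for x of shape (N, F, D, C), layout 'nfdc'.

    gate_proj is assumed to map the C scalar channels to lmax * C gate logits,
    i.e. one independent gate per degree l = 1..lmax and per channel.
    """
    n, f, _, c = x.shape
    x0 = x[:, :, 0, :]                    # l=0 scalars, shape (N, F, C)
    out0 = F.silu(x0).unsqueeze(2)        # activation on the l=0 block only
    if lmax == 0:
        return out0
    gates = torch.sigmoid(gate_proj(x0)).view(n, f, lmax, c)
    # Broadcast each per-l gate to all 2l+1 m components of its l-block.
    deg = degree_of_packed_index(lmax)[1:]  # degrees of the D-1 non-scalar slots
    gates = gates[:, :, deg - 1, :]         # (N, F, D-1, C)
    return torch.cat([out0, x[:, :, 1:, :] * gates], dim=2)


lmax, channels = 2, 4
proj = torch.nn.Linear(channels, lmax * channels, bias=False)
x = torch.randn(3, 1, (lmax + 1) ** 2, channels)
print(gated_activation_standard(x, proj, lmax).shape)  # torch.Size([3, 1, 9, 4])
```

Because every m component of an l-block is multiplied by the same rotation-invariant scalar, rotating that block before or after the gate gives the same result.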

GLU mode (gate provided in forward, e.g., from a split linear output):

l=0: x0 * act(g0) (SwiGLU-style when act=silu, GeGLU when act=gelu, etc.)
l>0: the scalar part of the gate (g0) generates sigmoid gates for the l>0 components of x.
This preserves SO(3) equivariance, because l>0 components are only ever scaled by rotation-invariant scalars, never by other l>0 components.
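
A minimal sketch of GLU mode under the same assumptions as a packed layout with layout='nfdc': SiLU as act, and a hypothetical `gate_proj` mapping g0 to one sigmoid gate per (degree, channel). Here the per-degree gates are expanded with `torch.repeat_interleave`, repeating the gate of degree l exactly 2l+1 times. The function name is illustrative and not part of the library.

```python
import torch
import torch.nn.functional as F


def gated_activation_glu(
    x: torch.Tensor, gate: torch.Tensor, gate_proj: torch.nn.Linear, lmax: int
) -> torch.Tensor:
    """Hypothetical GLU-mode gating; x and gate have shape (N, F, D, C), packed layout."""
    n, f, _, c = x.shape
    x0, g0 = x[:, :, 0, :], gate[:, :, 0, :]
    out0 = (x0 * F.silu(g0)).unsqueeze(2)  # SwiGLU-style scalar path
    if lmax == 0:
        return out0
    # Only the scalar part g0 feeds the sigmoid gates; gate[:, :, 1:, :] is unused.
    sig = torch.sigmoid(gate_proj(g0)).view(n, f, lmax, c)
    # Repeat the gate of degree l exactly 2l+1 times along the coefficient axis.
    reps = torch.tensor([2 * l + 1 for l in range(1, lmax + 1)])
    sig = torch.repeat_interleave(sig, reps, dim=2)  # (N, F, D-1, C)
    return torch.cat([out0, x[:, :, 1:, :] * sig], dim=2)


lmax, channels = 2, 3
proj = torch.nn.Linear(channels, lmax * channels)
x = torch.randn(2, 1, 9, channels)
g = torch.randn(2, 1, 9, channels)
print(gated_activation_glu(x, g, proj, lmax).shape)  # torch.Size([2, 1, 9, 3])
```

A bias in `gate_proj` is harmless here, since it only shifts rotation-invariant scalar logits before the sigmoid.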

This module also supports the m-major reduced layout used inside SO(2) blocks. If mmax is provided, the coefficient axis is assumed to follow the truncated m-major order built by build_m_major_index(lmax, mmax); otherwise, it is assumed to be the full packed (l, m) layout with D=(lmax+1)^2.
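
The two coefficient layouts can be compared with a small pure-Python sketch. It assumes the packed layout orders m from -l to l within each degree and that truncation keeps every (l, m) with |m| <= min(l, mmax). `hypothetical_m_major_order` is an illustrative ordering only and is not claimed to match what `build_m_major_index` produces.

```python
def packed_index(l: int, m: int) -> int:
    """Slot of (l, m) in the packed layout, assuming m runs from -l to l within each l."""
    return l * l + l + m


def truncated_dim(lmax: int, mmax: int) -> int:
    """Number of coefficients kept when |m| <= min(l, mmax)."""
    return sum(2 * min(l, mmax) + 1 for l in range(lmax + 1))


def hypothetical_m_major_order(lmax: int, mmax: int) -> list[tuple[int, int]]:
    """One possible m-major ordering (not necessarily build_m_major_index's)."""
    order = [(l, 0) for l in range(lmax + 1)]
    for m in range(1, mmax + 1):
        for signed_m in (-m, m):
            order.extend((l, signed_m) for l in range(m, lmax + 1))
    return order


lmax, mmax = 2, 1
order = hypothetical_m_major_order(lmax, mmax)
print((lmax + 1) ** 2, truncated_dim(lmax, mmax))  # 9 7
print([packed_index(l, m) for l, m in order])       # [0, 2, 6, 1, 5, 3, 7]
```

Under these assumptions, indexing a packed tensor with that list along the D axis would give the reduced m-major tensor, and the l value of each entry in `order` is what a per-l gate would need to look up.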

Parameters:

lmax: Maximum spherical harmonic degree.
mmax: Maximum order (|m|) for the m-major reduced layout. If None, use the full packed layout with D=(lmax+1)^2.
channels: Number of channels per focus stream.
n_focus: Number of focus streams.
dtype: Internal compute dtype used by the gate projection and sigmoid path.
activation_function: Activation function for l=0 components (e.g., "silu", "tanh", "gelu").
mlp_bias: Whether to use bias in the gate linear layer.
layout: Tensor layout convention. "nfdc" means input shape (N, F, D, C); "ndfc" means input shape (N, D, F, C).
trainable: Whether parameters are trainable.
seed: Random seed for weight initialization.
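
As a sketch of how lmax, n_focus, channels, and layout determine the expected input shape, the hypothetical helpers below build both layouts for the full packed case (mmax=None). Converting between "nfdc" and "ndfc" is a swap of axes 1 and 2, which is its own inverse; neither helper is part of the library.

```python
import torch


def expected_shape(
    n: int, d: int, n_focus: int, channels: int, layout: str
) -> tuple[int, int, int, int]:
    """Input shape implied by the constructor arguments (hypothetical helper)."""
    if layout == "nfdc":
        return (n, n_focus, d, channels)
    if layout == "ndfc":
        return (n, d, n_focus, channels)
    raise ValueError(f"unknown layout: {layout!r}")


def swap_layout(x: torch.Tensor) -> torch.Tensor:
    """Convert (N, F, D, C) <-> (N, D, F, C); swapping axes 1 and 2 is its own inverse."""
    return x.transpose(1, 2)


lmax, n_focus, channels = 2, 2, 8
d = (lmax + 1) ** 2                       # full packed layout (mmax=None)
x_nfdc = torch.randn(expected_shape(5, d, n_focus, channels, "nfdc"))
x_ndfc = swap_layout(x_nfdc)
print(x_ndfc.shape)                       # torch.Size([5, 9, 2, 8])
```

`transpose` returns a view, so the swap itself copies no data; call `.contiguous()` afterwards if a later operation requires contiguous memory.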

lmax

mmax = None

channels

n_focus = 1

dtype

device

precision

mlp_bias = False

layout = ''

scalar_act

forward(x: torch.Tensor, gate: torch.Tensor | None = None) → torch.Tensor

Parameters:

x: Value features. Shape is (N, F, D, C) when layout='nfdc', or (N, D, F, C) when layout='ndfc'.
gate: Optional gate features with the same layout as x. When provided, GLU mode is enabled:
l=0: x0 * act(g0) (e.g., SwiGLU when act=silu).
l>0: sigmoid(Linear(g0)) gates the l>0 components of x.
When None (default), standard mode is used and the gates are derived from x itself.

Returns:

torch.Tensor: Gated features with the same layout as x.
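
The class description mentions that the GLU-mode gate may come from a split linear output. A minimal sketch of such a producer, using a hypothetical `SplitLinear` module: a bias-free channel-mixing linear layer doubles the channel count and `chunk` splits the result into value and gate halves, both with the same layout as x, so under these assumptions they could be passed as forward(value, gate).

```python
import torch


class SplitLinear(torch.nn.Module):
    """Hypothetical producer of (value, gate) pairs for GLU mode."""

    def __init__(self, channels: int) -> None:
        super().__init__()
        # No bias: adding a constant to l>0 coefficients would break equivariance.
        self.proj = torch.nn.Linear(channels, 2 * channels, bias=False)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Mix channels only (last axis), then split into two equal halves.
        value, gate = self.proj(x).chunk(2, dim=-1)
        return value, gate


split = SplitLinear(channels=4)
x = torch.randn(3, 1, 9, 4)              # (N, F, D, C), layout 'nfdc'
value, gate = split(x)
print(value.shape, gate.shape)           # torch.Size([3, 1, 9, 4]) torch.Size([3, 1, 9, 4])
```

Because the projection acts only on the channel axis, it commutes with any rotation applied along the coefficient axis.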

serialize() → dict[str, Any]

classmethod deserialize(data: dict[str, Any]) → GatedActivation

class deepmd.pt.model.descriptor.sezm_nn.activation.SwiGLU

Bases: torch.nn.Module

Point-wise SwiGLU on the last feature axis.

forward(inputs: torch.Tensor) → torch.Tensor
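
A sketch of a parameter-free, point-wise SwiGLU on the last axis. It assumes the last axis holds two concatenated halves [a, b] and computes silu(a) * b, halving the feature size; which half passes through SiLU in the actual `SwiGLU.forward` is an assumption, and `swiglu` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def swiglu(inputs: torch.Tensor) -> torch.Tensor:
    """SwiGLU on the last axis, assuming it holds [a, b] halves: silu(a) * b."""
    if inputs.shape[-1] % 2 != 0:
        raise ValueError("last dimension must be even to split into two halves")
    a, b = inputs.chunk(2, dim=-1)
    return F.silu(a) * b


y = swiglu(torch.randn(10, 16))
print(y.shape)  # torch.Size([10, 8])
```

Since every output element depends only on the two inputs at matching positions in each half, the operation is point-wise over all leading axes.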