deepmd.pt.model.descriptor.sezm_nn.activation
Activation helper modules for SeZM.
This module contains coefficient-space nonlinear operators, including GatedActivation and point-wise SwiGLU. Grid projectors and grid nets live in dedicated modules so coefficient-space and function-space logic remain separate.
Classes
GatedActivation | Gated activation for SO(3) equivariant features with per-l independent gates.
SwiGLU | Point-wise SwiGLU on the last feature axis.
Module Contents
- class deepmd.pt.model.descriptor.sezm_nn.activation.GatedActivation(*, lmax: int, mmax: int | None = None, channels: int, n_focus: int = 1, dtype: torch.dtype, activation_function: str = 'silu', mlp_bias: bool = False, layout: str = 'nfdc', trainable: bool, seed: int | list[int] | None = None)
Bases: torch.nn.Module
Gated activation for SO(3) equivariant features with per-l independent gates.
- Standard mode (gate=None in forward):
  - l=0: Uses the specified activation function.
  - l>0: Each degree l has an independent gate derived from the l=0 scalar features. The gate for each l is expanded to all m components within that l-block.
- GLU mode (gate provided in forward, e.g., from split linear output):
  - l=0: x0 * act(g0) (SwiGLU-style when act=silu, GeGLU when act=gelu, etc.)
  - l>0: Uses the gate’s scalar (g0) to generate sigmoid gates for x’s vector components. This preserves SO(3) equivariance (a scalar gates a vector; a vector never gates a vector).
This module also supports the m-major reduced layout used inside SO(2) blocks. If mmax is provided, the coefficient axis is assumed to follow the truncated m-major order built by build_m_major_index(lmax, mmax); otherwise, it is assumed to be the full packed (l, m) layout with D=(lmax+1)^2.
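The per-l gate expansion in the full packed layout can be sketched in plain Python. This is only an illustration, not the module's code: it assumes the packed (l, m) layout stores the degrees as contiguous blocks of 2l+1 slots in increasing l, and the helper names `degree_per_coefficient` and `expand_gates` are hypothetical. The reduced m-major order built by build_m_major_index is not reproduced here.

```python
def degree_per_coefficient(lmax):
    """Degree l of each slot, assuming contiguous l-blocks of width 2l+1."""
    degrees = []
    for l in range(lmax + 1):
        degrees.extend([l] * (2 * l + 1))  # block l holds m = -l..l
    return degrees


def expand_gates(gates_per_l):
    """Repeat the gate of each degree l >= 1 over the 2l+1 slots of its block.

    gates_per_l[l - 1] is the gate for degree l; the l=0 slot has no gate here
    because it is handled by the scalar activation instead.
    """
    expanded = []
    for l, gate in enumerate(gates_per_l, start=1):
        expanded.extend([gate] * (2 * l + 1))
    return expanded


lmax = 2
degrees = degree_per_coefficient(lmax)
# The block widths 1 + 3 + 5 add up to D = (lmax + 1)^2.
assert len(degrees) == (lmax + 1) ** 2
print(degrees)
print(expand_gates([0.25, 0.75]))
```

Because every slot in an l-block receives the same multiplier, the gate acts as a scalar on that block, which is why the gating does not mix m components.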
- Parameters:
- lmax
Maximum spherical harmonic degree.
- mmax
Maximum order (|m|) for the m-major reduced layout. If None, use the full packed layout with D=(lmax+1)^2.
- channels
Number of channels per focus stream.
- n_focus
Number of focus streams.
- dtype
Internal compute dtype used by the gate projection and sigmoid path.
- activation_function
Activation function for l=0 components (e.g., “silu”, “tanh”, “gelu”).
- mlp_bias
Whether to use bias in the gate linear layer.
- layout
Tensor layout convention.
"nfdc" means input shape (N, F, D, C); "ndfc" means input shape (N, D, F, C).
- trainable
Whether parameters are trainable.
- seed
Random seed for weight initialization.
- forward(x: torch.Tensor, gate: torch.Tensor | None = None) -> torch.Tensor
- Parameters:
- x
Value features. Shape is (N, F, D, C) when layout='nfdc', or (N, D, F, C) when layout='ndfc'.
- gate
Optional gate features with the same layout as x. When provided, enables GLU mode:
  - l=0: x0 * act(g0) (e.g., SwiGLU when act=silu)
  - l>0: sigmoid(Linear(g0)) gates x’s vector components
When None (default), uses standard mode, where gates are derived from x itself.
- Returns:
torch.Tensor
Gated features with the same layout as x.
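The two forward modes can be compared with a minimal single-channel sketch written with plain Python floats instead of tensors. Several things here are assumptions rather than the module's implementation: the function name `gated_activation` is hypothetical, the gate linear layer is reduced to one hypothetical scalar weight per degree (`gate_weights`, no bias), the packed layout is assumed to use contiguous l-blocks, and in standard mode the gates are assumed to come from the raw l=0 value of x.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def silu(v):
    return v * sigmoid(v)


def gated_activation(x, gate_weights, gate=None, act=silu):
    """Single-channel reference for the standard and GLU modes.

    x            -- packed coefficients, length (lmax + 1)^2, x[0] is l=0
    gate_weights -- hypothetical weights w_l for l = 1..lmax (stand-in for
                    the gate linear layer)
    gate         -- optional second coefficient list; enables GLU mode
    """
    lmax = len(gate_weights)
    assert len(x) == (lmax + 1) ** 2
    if gate is None:
        scalar = x[0]                 # standard mode: gates come from x itself
        out = [act(x[0])]             # l=0: plain activation
    else:
        scalar = gate[0]              # GLU mode: gates come from g0
        out = [x[0] * act(gate[0])]   # l=0: x0 * act(g0)
    start = 1
    for l in range(1, lmax + 1):
        g = sigmoid(gate_weights[l - 1] * scalar)  # one gate per degree l
        width = 2 * l + 1
        out.extend(g * v for v in x[start:start + width])
        start += width
    return out


x = [1.0, 2.0, 2.0, 2.0]  # lmax = 1: one scalar plus one l=1 block
print(gated_activation(x, gate_weights=[0.0]))
print(gated_activation(x, gate_weights=[0.0], gate=[0.0, 9.0, 9.0, 9.0]))
```

In both modes only the l=0 entry of the gating tensor is read, so the l>0 entries of `gate` have no effect on the result in this sketch.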
- class deepmd.pt.model.descriptor.sezm_nn.activation.SwiGLU
Bases: torch.nn.Module
Point-wise SwiGLU on the last feature axis.
- forward(inputs: torch.Tensor) -> torch.Tensor
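The page does not say how the last axis is split, so the following is only a sketch of the common SwiGLU convention, not a description of this class: it assumes the last axis has even length, that the first half passes through SiLU, and that the result multiplies the second half. The function name `swiglu` is hypothetical, and one feature vector is represented as a plain Python list.

```python
import math


def silu(v):
    return v / (1.0 + math.exp(-v))


def swiglu(features):
    """SwiGLU over one feature vector: silu(first half) * second half.

    Which half acts as the gate is an assumption; the output has half the
    length of the input.
    """
    half, remainder = divmod(len(features), 2)
    if remainder:
        raise ValueError("last axis must have even length")
    gate_part, value_part = features[:half], features[half:]
    return [silu(g) * v for g, v in zip(gate_part, value_part)]


print(swiglu([1.0, 0.0, 2.0, 5.0]))
```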