deepmd.pt.model.descriptor.sezm_nn.activation

Activation helper modules for SeZM.

This module contains coefficient-space nonlinear operators, including GatedActivation and point-wise SwiGLU. Grid projectors and grid nets live in dedicated modules so coefficient-space and function-space logic remain separate.

Classes

GatedActivation

Gated activation for SO(3) equivariant features with per-l independent gates.

SwiGLU

Point-wise SwiGLU on the last feature axis.

Module Contents

class deepmd.pt.model.descriptor.sezm_nn.activation.GatedActivation(*, lmax: int, mmax: int | None = None, channels: int, n_focus: int = 1, dtype: torch.dtype, activation_function: str = 'silu', mlp_bias: bool = False, layout: str = 'nfdc', trainable: bool, seed: int | list[int] | None = None)

Bases: torch.nn.Module

Gated activation for SO(3) equivariant features with per-l independent gates.

Standard mode (gate=None in forward):
  • l=0: Uses the specified activation function

  • l>0: Each degree l has an independent gate derived from the l=0 scalar features. The gate for each l is expanded to all m components within that l-block.

GLU mode (gate provided in forward, e.g., from split linear output):
  • l=0: x0 * act(g0) (SwiGLU-style when act=silu, GeGLU when act=gelu, etc.)

  • l>0: Uses the gate’s scalar part (g0) to generate sigmoid gates for x’s vector components. This preserves SO(3) equivariance, because a scalar gates a vector rather than a vector gating a vector.
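The equivariance argument can be checked numerically on a single l=1 block. The following is a minimal sketch, not the module's implementation: it assumes the three l=1 components transform like an ordinary 3-vector under a rotation, and it replaces the learned gate projection with a plain `sigmoid` of the scalar features. The helper names `rotate`, `scalar_gate`, and `pointwise` are hypothetical.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Random proper rotation acting on the three l=1 components.
q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
if torch.det(q) < 0:
    q[:, 0] = -q[:, 0]  # flip one column so that det(q) = +1

n, channels = 5, 4
scalars = torch.randn(n, 1, channels, dtype=torch.float64)  # l=0: unchanged by rotation
vectors = torch.randn(n, 3, channels, dtype=torch.float64)  # l=1: rotates with q


def rotate(v: torch.Tensor) -> torch.Tensor:
    """Apply the rotation along the m axis (axis 1)."""
    return torch.einsum("ij,njc->nic", q, v)


def scalar_gate(s: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Gate computed from scalars only, broadcast over the three m components."""
    return torch.sigmoid(s) * v


def pointwise(v: torch.Tensor) -> torch.Tensor:
    """Element-wise nonlinearity applied directly to the vector components."""
    return F.silu(v)


# Gating by a scalar commutes with the rotation ...
gate_commutes = torch.allclose(
    scalar_gate(scalars, rotate(vectors)), rotate(scalar_gate(scalars, vectors))
)
# ... whereas an element-wise nonlinearity on the components does not.
pointwise_commutes = torch.allclose(pointwise(rotate(vectors)), rotate(pointwise(vectors)))
```

Because the gate is constant across the m components of a block, it factors out of the rotation; the element-wise variant mixes components nonlinearly and breaks that property.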

This module also supports the m-major reduced layout used inside SO(2) blocks. If mmax is provided, the coefficient axis is assumed to follow the truncated m-major order built by build_m_major_index(lmax, mmax); otherwise, it is assumed to be the full packed (l, m) layout with D=(lmax+1)^2.
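The size D=(lmax+1)^2 of the full packed layout follows from each degree l contributing 2l+1 orders. A small sketch of the degree carried by each slot, assuming l-major packing (all orders of one degree stored contiguously), which this page does not confirm; `degree_of_each_coefficient` is a hypothetical helper and is unrelated to `build_m_major_index`.

```python
def degree_of_each_coefficient(lmax: int) -> list[int]:
    """Return the degree l stored in every slot of a full packed (l, m) axis.

    Assumes l-major packing: the 2l+1 orders of degree l are contiguous.
    """
    degrees: list[int] = []
    for l in range(lmax + 1):
        degrees.extend([l] * (2 * l + 1))
    return degrees


degrees = degree_of_each_coefficient(2)
# The axis length is 1 + 3 + 5 = 9 = (2 + 1) ** 2.
```

A map like this is what lets one gate per degree be expanded to every m component of that degree's block.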

Parameters:
lmax

Maximum spherical harmonic degree.

mmax

Maximum order (|m|) for the m-major reduced layout. If None, use the full packed layout with D=(lmax+1)^2.

channels

Number of channels per focus stream.

n_focus

Number of focus streams.

dtype

Internal compute dtype used by the gate projection and sigmoid path.

activation_function

Activation function for l=0 components (e.g., "silu", "tanh", "gelu").

mlp_bias

Whether to use bias in the gate linear layer.

layout

Tensor layout convention. "nfdc" means input shape (N, F, D, C); "ndfc" means input shape (N, D, F, C).

trainable

Whether parameters are trainable.

seed

Random seed for weight initialization.
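The two values of `layout` differ only in the order of the second and third axes. A brief sketch of converting between them with plain PyTorch, assuming F is the focus-stream axis (`n_focus`), D the coefficient axis, and C the channel axis; the sizes are arbitrary illustration values.

```python
import torch

n, f, d, c = 2, 3, 9, 4  # batch, focus streams (assumed), coefficients, channels

x_nfdc = torch.randn(n, f, d, c)              # layout="nfdc": (N, F, D, C)
x_ndfc = x_nfdc.transpose(1, 2).contiguous()  # layout="ndfc": (N, D, F, C)

# The same element is addressed with the two middle indices swapped.
same_entry = torch.equal(x_ndfc[:, 5, 1], x_nfdc[:, 1, 5])
```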

lmax
mmax = None
channels
n_focus = 1
dtype
device
precision
mlp_bias = False
layout = ''
scalar_act
forward(x: torch.Tensor, gate: torch.Tensor | None = None) -> torch.Tensor
Parameters:
x

Value features. Shape is (N, F, D, C) when layout='nfdc', or (N, D, F, C) when layout='ndfc'.

gate

Optional gate features with the same layout as x. When provided, enables GLU mode:
  • l=0: x0 * act(g0) (e.g., SwiGLU when act=silu)

  • l>0: sigmoid(Linear(g0)) gates x’s vector components

When None (default), uses standard mode, where the gates are derived from x itself.

Returns:
torch.Tensor

Gated features with the same layout as x.
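The two modes of `forward` can be illustrated with a simplified stand-in written in plain PyTorch. This is a sketch under several assumptions, not the class's actual code: a single focus stream with input shape (N, D, C), l-major full packed layout, `silu` as the scalar activation, one gate per (degree, channel) pair, and standard-mode gates computed from the raw l=0 features. `expand_gates` and `gated_activation_sketch` are hypothetical names.

```python
import torch
import torch.nn.functional as F


def expand_gates(gates_per_l: torch.Tensor, lmax: int) -> torch.Tensor:
    """Repeat the gate of each degree l = 1..lmax over its 2l+1 orders.

    gates_per_l: (N, lmax, C)  ->  (N, (lmax + 1) ** 2 - 1, C)
    """
    repeats = torch.tensor([2 * l + 1 for l in range(1, lmax + 1)])
    return torch.repeat_interleave(gates_per_l, repeats, dim=1)


def gated_activation_sketch(x, gate_linear, lmax, gate=None):
    """x (and gate, if given): (N, D, C) with D = (lmax + 1) ** 2."""
    n, _, c = x.shape
    x0, x_rest = x[:, :1, :], x[:, 1:, :]
    if gate is None:
        # Standard mode: scalars are activated; gates come from x itself.
        g0 = x0
        out0 = F.silu(x0)
    else:
        # GLU mode: scalars become x0 * act(g0); gates come from the gate tensor.
        g0 = gate[:, :1, :]
        out0 = x0 * F.silu(g0)
    # One sigmoid gate per degree and channel, derived from scalars only.
    gates = torch.sigmoid(gate_linear(g0[:, 0, :])).view(n, lmax, c)
    out_rest = x_rest * expand_gates(gates, lmax)
    return torch.cat([out0, out_rest], dim=1)


torch.manual_seed(0)
lmax, channels = 2, 4
# bias=False mirrors the documented default mlp_bias=False.
gate_linear = torch.nn.Linear(channels, lmax * channels, bias=False)
x = torch.randn(3, (lmax + 1) ** 2, channels)
g = torch.randn_like(x)

y_standard = gated_activation_sketch(x, gate_linear, lmax)
y_glu = gated_activation_sketch(x, gate_linear, lmax, gate=g)
```

In both calls the output keeps the shape of `x`; only the source of the scalar features that drive the gates changes.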

serialize() -> dict[str, Any]
classmethod deserialize(data: dict[str, Any]) -> GatedActivation
class deepmd.pt.model.descriptor.sezm_nn.activation.SwiGLU

Bases: torch.nn.Module

Point-wise SwiGLU on the last feature axis.

forward(inputs: torch.Tensor) -> torch.Tensor
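This page does not state how the last axis is split, so the following is only a sketch of the usual SwiGLU pattern. It assumes the last axis holds a value half followed by a gate half and that `silu` is applied to the gate half; the actual class may order the halves differently. `swiglu_sketch` is a hypothetical name.

```python
import torch
import torch.nn.functional as F


def swiglu_sketch(inputs: torch.Tensor) -> torch.Tensor:
    """Point-wise SwiGLU on the last axis: value * silu(gate).

    Assumed convention: first half of the last axis is the value,
    second half is the gate. The output has half as many features.
    """
    value, gate = inputs.chunk(2, dim=-1)
    return value * F.silu(gate)


torch.manual_seed(0)
inputs = torch.randn(2, 5, 8)
out = swiglu_sketch(inputs)
```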