`mmagic.models.editors.animatediff.resnet_3d`

Module Contents

Classes

`InflatedConv3d`	An implementation of InflatedConv3d.
`InflatedGroupNorm`	Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.
`Upsample3D`	A 3D upsampling layer with an optional convolution.
`Downsample3D`	A 3D downsampling layer with an optional convolution.
`ResnetBlock3D`	A 3D ResNet block that supports downsampling and upsampling.
`Mish`	Mish activation function.

class mmagic.models.editors.animatediff.resnet_3d.InflatedConv3d(in_channels: int, out_channels: int, kernel_size: torch.nn.common_types._size_2_t, stride: torch.nn.common_types._size_2_t = 1, padding: Union[str, torch.nn.common_types._size_2_t] = 0, dilation: torch.nn.common_types._size_2_t = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)

Bases: torch.nn.Conv2d

An implementation of InflatedConv3d.

forward(x): forward function.
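The class derives from torch.nn.Conv2d but, going by its name, is meant for video tensors. The following is a minimal sketch of how such an "inflated" 2D convolution can be applied to a 5D input. It assumes a (batch, channels, frames, height, width) layout and assumes that frames are folded into the batch axis before the 2D convolution runs. `FrameWiseConv2d` is a hypothetical stand-in written for illustration, not the mmagic class.

```python
import torch
from torch import nn


class FrameWiseConv2d(nn.Conv2d):
    """Hypothetical stand-in: a Conv2d applied independently to each frame."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed layout: (batch, channels, frames, height, width).
        b, c, f, h, w = x.shape
        # Fold frames into the batch axis so the 2D convolution sees images.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = super().forward(x)
        # Restore the frame axis; channels and spatial size may have changed.
        _, c_out, h_out, w_out = x.shape
        return x.reshape(b, f, c_out, h_out, w_out).permute(0, 2, 1, 3, 4)


conv = FrameWiseConv2d(4, 8, kernel_size=3, padding=1)
video = torch.randn(2, 4, 16, 32, 32)
out = conv(video)
print(out.shape)  # torch.Size([2, 8, 16, 32, 32])
```

Because the weights stay two-dimensional, a layer built this way can reuse the parameters of an image model unchanged, which is the usual motivation for inflating a 2D layer instead of training a full 3D convolution.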

class mmagic.models.editors.animatediff.resnet_3d.InflatedGroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)

Bases: torch.nn.GroupNorm

Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. num_channels must be divisible by num_groups. The mean and standard deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size num_channels if affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).

This layer uses statistics computed from input data in both training and evaluation modes.
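As a worked check of the formula, the sketch below recomputes group normalization by hand and compares it with torch.nn.GroupNorm. It relies on the freshly constructed layer having \(\gamma = 1\) and \(\beta = 0\), so the affine step is a no-op. The variable names are chosen for illustration only.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(20, 6, 10, 10)
num_groups, eps = 3, 1e-5

# Reference: the built-in layer (gamma = 1 and beta = 0 at initialization).
reference = nn.GroupNorm(num_groups, 6, eps=eps)(x)

# Manual version: one mean and one biased variance per (sample, group).
n = x.shape[0]
grouped = x.reshape(n, num_groups, -1)
mean = grouped.mean(dim=-1, keepdim=True)
var = grouped.var(dim=-1, unbiased=False, keepdim=True)
manual = ((grouped - mean) / torch.sqrt(var + eps)).reshape(x.shape)

# Compare the two results within a small floating-point tolerance.
print(torch.allclose(manual, reference, atol=1e-4))
```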

Parameters

num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels expected in input
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
affine (bool) – a boolean value that when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.

Shape:

Input: \((N, C, *)\) where \(C=\text{num\_channels}\)
Output: \((N, C, *)\) (same shape as input)

Examples:

>>> import torch
>>> from torch import nn
>>> input = torch.randn(20, 6, 10, 10)
>>> # Separate 6 channels into 3 groups
>>> m = nn.GroupNorm(3, 6)
>>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm)
>>> m = nn.GroupNorm(6, 6)
>>> # Put all 6 channels into a single group (equivalent with LayerNorm)
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)

forward(x)
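The forward method carries no description. The sketch below assumes that, for a video tensor laid out as (batch, channels, frames, height, width), the layer folds frames into the batch axis so that statistics are computed per frame. That behavior is an assumption made for illustration, not something this page confirms. The sketch contrasts it with calling a plain torch.nn.GroupNorm on the 5D tensor, which pools statistics across frames.

```python
import torch
from torch import nn

norm = nn.GroupNorm(num_groups=2, num_channels=4)
video = torch.randn(1, 4, 3, 8, 8)  # assumed (batch, channels, frames, H, W)
video[:, :, 1] += 10.0  # shift one frame so the difference is visible

# Plain GroupNorm: statistics are pooled over frames as well as H and W.
pooled = norm(video)

# Assumed "inflated" behavior: fold frames into the batch, normalize, unfold.
b, c, f, h, w = video.shape
frames = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
per_frame = norm(frames).reshape(b, f, c, h, w).permute(0, 2, 1, 3, 4)

print(pooled[:, :, 1].mean().item())     # clearly above zero
print(per_frame[:, :, 1].mean().item())  # close to zero
```

With pooled statistics the shifted frame stays offset after normalization. With per-frame statistics every frame is centered on its own.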

class mmagic.models.editors.animatediff.resnet_3d.Upsample3D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')

Bases: torch.nn.Module

A 3D upsampling layer with an optional convolution.

Parameters

channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
use_conv_transpose (bool) – whether to use conv transpose.
out_channels (int) – output channels.

forward(hidden_states, output_size=None): forward with hidden states.
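A rough sketch of the upsampling step, under the assumption that only the spatial axes are doubled and the frame axis is left alone. The scale factors and the nearest-neighbor mode are assumptions for illustration. The sketch leaves out the optional convolution that would follow when use_conv is set.

```python
import torch
import torch.nn.functional as F

hidden_states = torch.randn(2, 4, 16, 8, 8)  # assumed (B, C, frames, H, W)

# Assumption: only height and width are doubled; the frame axis is untouched.
upsampled = F.interpolate(
    hidden_states, scale_factor=(1.0, 2.0, 2.0), mode="nearest"
)
print(upsampled.shape)  # torch.Size([2, 4, 16, 16, 16])
```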

class mmagic.models.editors.animatediff.resnet_3d.Downsample3D(channels, use_conv=False, out_channels=None, padding=1, name='conv')

Bases: torch.nn.Module

A 3D downsampling layer with an optional convolution.

Parameters

channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
out_channels (int) – output channels.
padding (int) – padding size used by the convolution.

forward(hidden_states): forward with hidden states.
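One way to picture spatial downsampling of a video tensor is a strided convolution that never mixes frames. The sketch below uses torch.nn.Conv3d with a kernel and stride of 1 along the frame axis as a stand-in. This is an illustrative substitute, not the layer mmagic builds internally, and the stride of 2 and kernel size of 3 are assumptions.

```python
import torch
from torch import nn

# Stand-in for a per-frame strided 2D convolution: a Conv3d whose kernel and
# stride are 1 along the frame axis, so frames never mix.
down = nn.Conv3d(
    4, 8, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1)
)

hidden_states = torch.randn(2, 4, 16, 32, 32)  # assumed (B, C, frames, H, W)
out = down(hidden_states)
print(out.shape)  # torch.Size([2, 8, 16, 16, 16])
```

Height and width are halved while the number of frames is preserved. With padding=1 and a 3x3 spatial kernel, an even input size maps to exactly half.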

class mmagic.models.editors.animatediff.resnet_3d.ResnetBlock3D(*, in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='swish', time_embedding_norm='default', output_scale_factor=1.0, use_in_shortcut=None, use_inflated_groupnorm=None)

Bases: torch.nn.Module

A 3D ResNet block that supports downsampling and upsampling.

Parameters

in_channels (int) – input channels.
out_channels (int) – output channels.
conv_shortcut (bool) – whether to use conv shortcut.
dropout (float) – dropout rate.
temb_channels (int) – time embedding channels.
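groups (int) – number of groups for the first group normalization layer.
groups_out (int) – number of groups for the second group normalization layer.
pre_norm (bool) – whether to normalize before the convolution. Todo: remove.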
eps (float) – eps for groupnorm.
non_linearity (str) – non linearity type.
time_embedding_norm (str) – time embedding norm type.
output_scale_factor (float) – factor to scale input and output.
use_in_shortcut (bool) – whether to use conv in shortcut.

forward(input_tensor, temb): forward with hidden states and time embeddings.
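To make the data flow concrete, here is a simplified sketch of a residual block that takes a video tensor and a time embedding. `TinyResBlock3D` is hypothetical and omits most options of the real class. The order of operations, the use of `groups` as the GroupNorm group count, the SiLU ("swish") activation, and the Conv3d stand-in for per-frame convolutions are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F
from torch import nn


class TinyResBlock3D(nn.Module):
    """Hypothetical, simplified residual block; not the mmagic class."""

    def __init__(self, in_channels, out_channels, temb_channels=512,
                 groups=32, eps=1e-6, dropout=0.0, output_scale_factor=1.0):
        super().__init__()
        k = dict(kernel_size=(1, 3, 3), padding=(0, 1, 1))  # per-frame 3x3 conv
        self.norm1 = nn.GroupNorm(groups, in_channels, eps=eps)
        self.conv1 = nn.Conv3d(in_channels, out_channels, **k)
        self.time_emb_proj = nn.Linear(temb_channels, out_channels)
        self.norm2 = nn.GroupNorm(groups, out_channels, eps=eps)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv3d(out_channels, out_channels, **k)
        # A 1x1 convolution matches channel counts on the skip path if needed.
        self.shortcut = (
            nn.Identity() if in_channels == out_channels
            else nn.Conv3d(in_channels, out_channels, kernel_size=1)
        )
        self.output_scale_factor = output_scale_factor

    def forward(self, input_tensor, temb):
        h = self.conv1(F.silu(self.norm1(input_tensor)))
        # Broadcast the (B, C) time embedding over frames, height and width.
        h = h + self.time_emb_proj(F.silu(temb))[:, :, None, None, None]
        h = self.conv2(self.dropout(F.silu(self.norm2(h))))
        return (self.shortcut(input_tensor) + h) / self.output_scale_factor


block = TinyResBlock3D(in_channels=32, out_channels=64)
x = torch.randn(2, 32, 8, 16, 16)
temb = torch.randn(2, 512)
print(block(x, temb).shape)  # torch.Size([2, 64, 8, 16, 16])
```

The time embedding has one vector per sample, so it is projected to out_channels and broadcast over every frame and pixel before the second normalization.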

class mmagic.models.editors.animatediff.resnet_3d.Mish

Bases: torch.nn.Module

Mish activation function.

forward(hidden_states): forward with hidden states.
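For reference, Mish is defined as Mish(x) = x * tanh(softplus(x)), where softplus(x) = log(1 + exp(x)). The sketch below writes that definition as a standalone function; the name `mish` is chosen for illustration and is not taken from this module.

```python
import torch
import torch.nn.functional as F


def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish(x) = x * tanh(softplus(x)), with softplus(x) = log(1 + exp(x)).
    return x * torch.tanh(F.softplus(x))


x = torch.linspace(-3.0, 3.0, steps=7)
print(mish(x))
```

The function is smooth, passes through zero at x = 0, and approaches the identity for large positive inputs.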
