mmagic.models.editors.animatediff.resnet_3d

Module Contents

Classes

InflatedConv3d

An implementation of InflatedConv3d.

InflatedGroupNorm

Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.

Upsample3D

A 3D upsampling layer with an optional convolution.

Downsample3D

A 3D downsampling layer with an optional convolution.

ResnetBlock3D

A 3D ResNet block that supports downsampling and upsampling.

Mish

Mish activation function.

class mmagic.models.editors.animatediff.resnet_3d.InflatedConv3d(in_channels: int, out_channels: int, kernel_size: torch.nn.common_types._size_2_t, stride: torch.nn.common_types._size_2_t = 1, padding: Union[str, torch.nn.common_types._size_2_t] = 0, dilation: torch.nn.common_types._size_2_t = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]

Bases: torch.nn.Conv2d

An implementation of InflatedConv3d.

forward(x)[source]

forward function.
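
The page does not say what forward does beyond the torch.nn.Conv2d base class. The following is a minimal sketch of one common way to "inflate" a 2D convolution. It assumes the input is a 5D video tensor laid out as (batch, channels, frames, height, width) and that the frame axis is folded into the batch axis before the inherited 2D convolution runs. The class name FramewiseConv2d and the tensor layout are illustrative assumptions and are not taken from the mmagic source.

```python
import torch
from torch import nn


class FramewiseConv2d(nn.Conv2d):
    """Hypothetical stand-in: applies the inherited 2D conv to each frame."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed layout: (batch, channels, frames, height, width).
        b, c, f, h, w = x.shape
        # Fold frames into the batch axis so Conv2d sees 4D input.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = super().forward(x)
        # Unfold the frame axis and restore the assumed 5D layout.
        _, c_out, h_out, w_out = x.shape
        return x.reshape(b, f, c_out, h_out, w_out).permute(0, 2, 1, 3, 4)


conv = FramewiseConv2d(4, 8, kernel_size=3, padding=1)
video = torch.randn(2, 4, 16, 32, 32)
out = conv(video)
print(out.shape)
```

Under this assumption every frame passes through the same 2D kernel, so the parameter names and shapes are identical to those of a plain torch.nn.Conv2d.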

class mmagic.models.editors.animatediff.resnet_3d.InflatedGroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)[source]

Bases: torch.nn.GroupNorm

Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.

\[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. num_channels must be divisible by num_groups. The mean and standard deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size num_channels if affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).

This layer uses statistics computed from input data in both training and evaluation modes.
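
The formula above can be checked by hand against the base class torch.nn.GroupNorm. This is a sketch on ordinary 4D input; it does not exercise InflatedGroupNorm itself, whose forward is not described on this page. The tensor sizes and the variable names are chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(20, 6, 10, 10)
num_groups, eps = 3, 1e-5
gn = nn.GroupNorm(num_groups, 6, eps=eps)

n, c, h, w = x.shape
# Each group holds num_channels / num_groups consecutive channels.
grouped = x.reshape(n, num_groups, -1)
mean = grouped.mean(dim=-1, keepdim=True)
# Biased variance estimator, as stated in the description.
var = grouped.var(dim=-1, unbiased=False, keepdim=True)
normalized = ((grouped - mean) / torch.sqrt(var + eps)).reshape(n, c, h, w)
# gamma and beta are per-channel vectors of size num_channels.
manual = normalized * gn.weight.view(1, c, 1, 1) + gn.bias.view(1, c, 1, 1)

matches = torch.allclose(manual, gn(x), atol=1e-4)
print(matches)
```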

Parameters
  • num_groups (int) – number of groups to separate the channels into

  • num_channels (int) – number of channels expected in input

  • eps (float) – a value added to the denominator for numerical stability. Default: 1e-5

  • affine (bool) – when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.

Shape:
  • Input: \((N, C, *)\) where \(C=\text{num\_channels}\)

  • Output: \((N, C, *)\) (same shape as input)

Examples:

>>> import torch
>>> from torch import nn
>>> input = torch.randn(20, 6, 10, 10)
>>> # Separate 6 channels into 3 groups
>>> m = nn.GroupNorm(3, 6)
>>> # Separate 6 channels into 6 groups (equivalent to InstanceNorm)
>>> m = nn.GroupNorm(6, 6)
>>> # Put all 6 channels into a single group (equivalent to LayerNorm)
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)

forward(x)[source]

class mmagic.models.editors.animatediff.resnet_3d.Upsample3D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')[source]

Bases: torch.nn.Module

A 3D upsampling layer with an optional convolution.

Parameters
  • channels (int) – channels in the inputs and outputs.

  • use_conv (bool) – a bool determining if a convolution is applied.

  • use_conv_transpose (bool) – whether to use conv transpose.

  • out_channels (int) – output channels.

forward(hidden_states, output_size=None)[source]

forward with hidden states.
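
The page does not state the upsampling mode or factor. As a rough sketch only, the snippet below assumes nearest-neighbor interpolation that doubles height and width and leaves the frame count unchanged, applied to a (batch, channels, frames, height, width) tensor. The scale factors, the mode, and the layout are all assumptions, and the optional convolution is omitted.

```python
import torch
import torch.nn.functional as F

# Assumed layout: (batch, channels, frames, height, width).
hidden_states = torch.randn(1, 4, 8, 16, 16)

# Assumed behavior: keep the frame axis, double the two spatial axes.
upsampled = F.interpolate(
    hidden_states, scale_factor=(1.0, 2.0, 2.0), mode="nearest")
print(upsampled.shape)
```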

class mmagic.models.editors.animatediff.resnet_3d.Downsample3D(channels, use_conv=False, out_channels=None, padding=1, name='conv')[source]

Bases: torch.nn.Module

A 3D downsampling layer with an optional convolution.

Parameters
  • channels (int) – channels in the inputs and outputs.

  • use_conv (bool) – a bool determining if a convolution is applied.

  • out_channels (int) – output channels.

  • padding (int) – padding size.

forward(hidden_states)[source]

forward with hidden states.

class mmagic.models.editors.animatediff.resnet_3d.ResnetBlock3D(*, in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='swish', time_embedding_norm='default', output_scale_factor=1.0, use_in_shortcut=None, use_inflated_groupnorm=None)[source]

Bases: torch.nn.Module

A 3D ResNet block that supports downsampling and upsampling.

Parameters
  • in_channels (int) – input channels.

  • out_channels (int) – output channels.

  • conv_shortcut (bool) – whether to use conv shortcut.

  • dropout (float) – dropout rate.

  • temb_channels (int) – time embedding channels.

  • groups (int) – conv groups.

  • groups_out (int) – conv out groups.

  • pre_norm (bool) – whether to apply normalization before the convolution. Todo: remove.

  • eps (float) – eps for groupnorm.

  • non_linearity (str) – non linearity type.

  • time_embedding_norm (str) – time embedding norm type.

  • output_scale_factor (float) – factor to scale input and output.

  • use_in_shortcut (bool) – whether to use conv in shortcut.

forward(input_tensor, temb)[source]

forward with hidden states and time embeddings.

class mmagic.models.editors.animatediff.resnet_3d.Mish[source]

Bases: torch.nn.Module

Mish activation function.

forward(hidden_states)[source]

forward with hidden states.
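
Mish is usually defined as x * tanh(softplus(x)). The sketch below writes that definition out in PyTorch and compares it with torch.nn.functional.mish. It assumes the class documented here computes the standard definition; the helper name mish is illustrative.

```python
import torch
import torch.nn.functional as F


def mish(x: torch.Tensor) -> torch.Tensor:
    # Standard Mish definition: x * tanh(softplus(x)).
    return x * torch.tanh(F.softplus(x))


x = torch.linspace(-3.0, 3.0, steps=7)
y = mish(x)
# Compare with PyTorch's built-in functional form.
same_as_builtin = torch.allclose(y, F.mish(x), atol=1e-6)
print(same_as_builtin)
```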
