mmagic.models.editors.animatediff.resnet_3d
Module Contents
Classes
InflatedConv3d | An implementation of InflatedConv3d.
InflatedGroupNorm | Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.
Upsample3D | A 3D upsampling layer with an optional convolution.
Downsample3D | A 3D downsampling layer with an optional convolution.
ResnetBlock3D | 3D ResNet block that supports downsampling and upsampling.
Mish | Mish activation function.
- class mmagic.models.editors.animatediff.resnet_3d.InflatedConv3d(in_channels: int, out_channels: int, kernel_size: torch.nn.common_types._size_2_t, stride: torch.nn.common_types._size_2_t = 1, padding: Union[str, torch.nn.common_types._size_2_t] = 0, dilation: torch.nn.common_types._size_2_t = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)
Bases: torch.nn.Conv2d
An implementation of InflatedConv3d.
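The entry above does not say how a torch.nn.Conv2d subclass handles video tensors. The following is a minimal sketch of the usual "inflation" idea, assuming the input is laid out as (batch, channels, frames, height, width) and that the 2D convolution is applied to each frame independently. The class name FramewiseConv2d is hypothetical and is not the mmagic implementation.

```python
import torch
from torch import nn


class FramewiseConv2d(nn.Conv2d):
    """Hypothetical sketch: apply a 2D convolution to every frame of a video."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed layout: (batch, channels, frames, height, width).
        b, c, f, h, w = x.shape
        # Fold the frame axis into the batch axis: (b * f, c, h, w).
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = super().forward(x)
        # Restore the frame axis: (b, out_channels, f, h_out, w_out).
        _, c_out, h_out, w_out = x.shape
        return x.reshape(b, f, c_out, h_out, w_out).permute(0, 2, 1, 3, 4)


conv = FramewiseConv2d(4, 8, kernel_size=3, padding=1)
video = torch.randn(2, 4, 5, 16, 16)  # 2 clips, 4 channels, 5 frames
out = conv(video)
print(out.shape)
```

Because the frames are folded into the batch axis, the layer keeps the weight shape of an ordinary 2D convolution, which is what would allow weights from a pretrained image model to be reused under this assumption.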
- class mmagic.models.editors.animatediff.resnet_3d.InflatedGroupNorm(num_groups: int, num_channels: int, eps: float = 1e-05, affine: bool = True, device=None, dtype=None)
Bases: torch.nn.GroupNorm
Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
The input channels are separated into num_groups groups, each containing num_channels / num_groups channels. num_channels must be divisible by num_groups. The mean and standard deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size num_channels if affine is True. The standard deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False).
This layer uses statistics computed from input data in both training and evaluation modes.
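To make the formula above concrete, the sketch below normalizes a single sample in plain Python, omitting the affine step (equivalent to \(\gamma = 1\), \(\beta = 0\)). The helper name group_norm_1sample is hypothetical. The variance divides by the element count, which is the biased estimator described above.

```python
import math


def group_norm_1sample(channels, num_groups, eps=1e-5):
    """Normalize one sample given as a list of per-channel value lists."""
    num_channels = len(channels)
    if num_channels % num_groups != 0:
        raise ValueError("num_channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        # Statistics are shared by every value in the group.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out


# Four channels with two values each, split into two groups of two channels.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm_1sample(sample, num_groups=2)
print(normalized)
```

Each group ends up with a mean of approximately zero regardless of its original scale, which is why the second group (values 10 to 40) is normalized to the same range as the first.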
- Parameters
num_groups (int) – number of groups to separate the channels into
num_channels (int) – number of channels expected in input
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
affine (bool) – when set to True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: True.
- Shape:
Input: \((N, C, *)\) where \(C=\text{num\_channels}\)
Output: \((N, C, *)\) (same shape as input)
Examples:
>>> import torch
>>> from torch import nn
>>> input = torch.randn(20, 6, 10, 10)
>>> # Separate 6 channels into 3 groups
>>> m = nn.GroupNorm(3, 6)
>>> # Separate 6 channels into 6 groups (equivalent to InstanceNorm)
>>> m = nn.GroupNorm(6, 6)
>>> # Put all 6 channels into a single group (equivalent to LayerNorm)
>>> m = nn.GroupNorm(1, 6)
>>> # Activating the module
>>> output = m(input)
- class mmagic.models.editors.animatediff.resnet_3d.Upsample3D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')
Bases: torch.nn.Module
A 3D upsampling layer with an optional convolution.
- Parameters
channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
use_conv_transpose (bool) – whether to use conv transpose.
out_channels (int) – output channels.
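The parameter list above does not say which axes are resized. A common choice for video tensors, assumed here, is to double height and width while leaving the frame count unchanged. The sketch uses torch.nn.functional.interpolate with nearest-neighbour mode; the function name upsample_spatial is hypothetical, and the optional convolution is left out.

```python
import torch
import torch.nn.functional as F


def upsample_spatial(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: 2x nearest upsampling of H and W only."""
    # x is assumed to be (batch, channels, frames, height, width).
    # The scale factors apply to (frames, height, width) in that order.
    return F.interpolate(x, scale_factor=(1.0, 2.0, 2.0), mode="nearest")


x = torch.randn(1, 4, 8, 16, 16)
up = upsample_spatial(x)
print(x.shape, up.shape)
```

Under this assumption every input pixel is repeated in a 2x2 block within its own frame, so no information is mixed across time.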
- class mmagic.models.editors.animatediff.resnet_3d.Downsample3D(channels, use_conv=False, out_channels=None, padding=1, name='conv')
Bases: torch.nn.Module
A 3D downsampling layer with an optional convolution.
- Parameters
channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
out_channels (int) – output channels.
padding (int) – padding size.
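As an illustration of the two modes implied by use_conv, the sketch below halves height and width while keeping the frame count. It assumes a stride-2 convolution when a convolution is used and average pooling otherwise. A torch.nn.Conv3d with a (1, 3, 3) kernel stands in for a per-frame 2D convolution. Neither choice is confirmed by this page, and the variable names are illustrative.

```python
import torch
import torch.nn.functional as F
from torch import nn

x = torch.randn(1, 4, 8, 16, 16)  # (batch, channels, frames, height, width)

# Without a convolution: average over 2x2 spatial windows, frame by frame.
pooled = F.avg_pool3d(x, kernel_size=(1, 2, 2), stride=(1, 2, 2))

# With a convolution: a kernel of depth 1 never mixes neighbouring frames,
# and stride 2 with padding 1 halves the spatial size.
conv = nn.Conv3d(4, 6, kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))
conv_out = conv(x)

print(pooled.shape, conv_out.shape)
```

The convolutional path can also change the channel count (4 to 6 here), which is the role out_channels would play under this assumption.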
- class mmagic.models.editors.animatediff.resnet_3d.ResnetBlock3D(*, in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='swish', time_embedding_norm='default', output_scale_factor=1.0, use_in_shortcut=None, use_inflated_groupnorm=None)
Bases: torch.nn.Module
3D ResNet block that supports downsampling and upsampling.
- Parameters
in_channels (int) – input channels.
out_channels (int) – output channels.
conv_shortcut (bool) – whether to use conv shortcut.
dropout (float) – dropout rate.
temb_channels (int) – time embedding channels.
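groups (int) – number of groups for the first group normalization layer.
groups_out (int) – number of groups for the second group normalization layer; falls back to groups when None.
pre_norm (bool) – whether to normalize before the convolution. Todo: remove.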
eps (float) – eps for groupnorm.
non_linearity (str) – non linearity type.
time_embedding_norm (str) – time embedding norm type.
output_scale_factor (float) – factor to scale input and output.
use_in_shortcut (bool) – whether to use conv in shortcut.
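To show how these parameters might fit together, here is a heavily simplified sketch of a residual block over (batch, channels, frames, height, width) tensors. It assumes the common layout of norm, activation, convolution, an added time-embedding projection, then a second norm, activation, dropout and convolution, with the sum of skip path and residual divided by output_scale_factor. The class TinyResBlock3D and its small defaults are hypothetical; this is not the mmagic implementation.

```python
import torch
from torch import nn


class TinyResBlock3D(nn.Module):
    """Hypothetical sketch of a residual block over (B, C, F, H, W) tensors."""

    def __init__(self, in_channels, out_channels, temb_channels=16,
                 groups=4, eps=1e-6, dropout=0.0, output_scale_factor=1.0):
        super().__init__()
        self.norm1 = nn.GroupNorm(groups, in_channels, eps=eps)
        # A (1, 3, 3) kernel convolves each frame independently.
        self.conv1 = nn.Conv3d(in_channels, out_channels, (1, 3, 3),
                               padding=(0, 1, 1))
        self.time_proj = nn.Linear(temb_channels, out_channels)
        self.norm2 = nn.GroupNorm(groups, out_channels, eps=eps)
        self.dropout = nn.Dropout(dropout)
        self.conv2 = nn.Conv3d(out_channels, out_channels, (1, 3, 3),
                               padding=(0, 1, 1))
        self.act = nn.SiLU()  # "swish" and SiLU are the same function
        # A 1x1x1 convolution matches channel counts on the skip path.
        self.shortcut = (nn.Identity() if in_channels == out_channels
                         else nn.Conv3d(in_channels, out_channels, 1))
        self.output_scale_factor = output_scale_factor

    def forward(self, x, temb):
        h = self.conv1(self.act(self.norm1(x)))
        # Broadcast the per-sample time embedding over frames and pixels.
        h = h + self.time_proj(self.act(temb))[:, :, None, None, None]
        h = self.conv2(self.dropout(self.act(self.norm2(h))))
        return (self.shortcut(x) + h) / self.output_scale_factor


block = TinyResBlock3D(in_channels=8, out_channels=16)
x = torch.randn(2, 8, 3, 8, 8)
temb = torch.randn(2, 16)
y = block(x, temb)
print(y.shape)
```

In this sketch the skip path only uses a convolution when the channel counts differ, which is one plausible reading of use_in_shortcut being left as None.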