mmagic.models.editors.animatediff.unet_3d
Module Contents¶
Classes¶
UNet3DConditionOutput – Output of UNet3DCondition.
UNet3DConditionMotionModel
- class mmagic.models.editors.animatediff.unet_3d.UNet3DConditionOutput[source]¶
Bases: diffusers.utils.BaseOutput
Output of UNet3DCondition.
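As a diffusers.utils.BaseOutput subclass, the output supports attribute, key, and index access. A minimal sketch, assuming the class carries a single sample field holding the denoised latents (field name inferred from the usual diffusers convention):

    import torch

    from mmagic.models.editors.animatediff.unet_3d import UNet3DConditionOutput

    # Assumed field: 5D latents of shape (batch, channel, num_frames, height, width).
    out = UNet3DConditionOutput(sample=torch.randn(1, 4, 16, 32, 32))

    print(out.sample.shape)     # attribute access
    print(out['sample'].shape)  # mapping access, provided by BaseOutput
    print(out[0].shape)         # index access, provided by BaseOutput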
- class mmagic.models.editors.animatediff.unet_3d.UNet3DConditionMotionModel(sample_size: Optional[int] = None, in_channels: int = 4, out_channels: int = 4, center_input_sample: bool = False, flip_sin_to_cos: bool = True, freq_shift: int = 0, down_block_types: Tuple[str] = ('CrossAttnDownBlock3D', 'CrossAttnDownBlock3D', 'CrossAttnDownBlock3D', 'DownBlock3D'), mid_block_type: str = 'UNetMidBlock3DCrossAttn', up_block_types: Tuple[str] = ('UpBlock3D', 'CrossAttnUpBlock3D', 'CrossAttnUpBlock3D', 'CrossAttnUpBlock3D'), only_cross_attention: Union[bool, Tuple[bool]] = False, block_out_channels: Tuple[int] = (320, 640, 1280, 1280), layers_per_block: int = 2, downsample_padding: int = 1, mid_block_scale_factor: float = 1, act_fn: str = 'silu', norm_num_groups: int = 32, norm_eps: float = 1e-05, cross_attention_dim: int = 768, attention_head_dim: Union[int, Tuple[int]] = 8, dual_cross_attention: bool = False, use_linear_projection: bool = False, class_embed_type: Optional[str] = None, num_class_embeds: Optional[int] = None, upcast_attention: bool = False, resnet_time_scale_shift: str = 'default', use_inflated_groupnorm=False, use_motion_module=False, motion_module_resolutions=(1, 2, 4, 8), motion_module_mid_block=False, motion_module_decoder_only=False, motion_module_type=None, motion_module_kwargs={}, unet_use_cross_frame_attention=None, unet_use_temporal_attention=None, subfolder=None, from_pretrained=None, unet_addtion_kwargs=None)[source]¶
Bases: diffusers.models.modeling_utils.ModelMixin, diffusers.configuration_utils.ConfigMixin
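A minimal construction sketch using the defaults from the signature above; motion modules stay disabled (use_motion_module=False), so no motion-module arguments are required. The sample_size value is illustrative:

    from mmagic.models.editors.animatediff.unet_3d import UNet3DConditionMotionModel

    # Defaults mirror the Stable Diffusion UNet layout in the signature:
    # three cross-attention down blocks plus one plain down block, a
    # cross-attention mid block, and the mirrored up blocks.
    unet = UNet3DConditionMotionModel(
        sample_size=32,           # illustrative latent resolution
        in_channels=4,
        out_channels=4,
        cross_attention_dim=768,  # width of the text encoder hidden states
    )

Since the class mixes in diffusers' ConfigMixin, the constructor arguments are typically recorded on unet.config and can be serialized alongside the weights.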
- init_weights(subfolder=None, from_pretrained=None)[source]¶
Init weights for models.
We just use the initialization method proposed in the original paper.
- Parameters
subfolder (str, optional) – Subfolder inside the checkpoint that contains the UNet weights. Defaults to None.
from_pretrained (str, optional) – Path to pretrained weights. If None, pretrained weights will not be loaded. Defaults to None.
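A hedged usage sketch; the checkpoint directory and subfolder are illustrative placeholders for a local Stable Diffusion checkpoint, not values mandated by this module:

    from mmagic.models.editors.animatediff.unet_3d import UNet3DConditionMotionModel

    unet = UNet3DConditionMotionModel()

    # Placeholder path -- point this at a real checkpoint directory whose
    # 'unet' subfolder holds the pretrained 2D UNet weights.
    unet.init_weights(subfolder='unet',
                      from_pretrained='/path/to/stable-diffusion-v1-5')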
- set_attention_slice(slice_size)[source]¶
Enable sliced attention computation.
When this option is enabled, the attention module splits the input tensor into slices and computes attention in several steps. This saves memory in exchange for a small decrease in speed.
- Parameters
slice_size (str or int or list(int)) – Defaults to "auto". When "auto", halves the input to the attention heads, so attention is computed in two steps. If "max", the maximum amount of memory is saved by running only one slice at a time. If a number is provided, attention_head_dim // slice_size slices are used; in this case attention_head_dim must be a multiple of slice_size.
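A usage sketch mirroring the accepted values described above:

    from mmagic.models.editors.animatediff.unet_3d import UNet3DConditionMotionModel

    unet = UNet3DConditionMotionModel()  # default attention_head_dim=8

    unet.set_attention_slice('auto')  # halve the attention input: two steps
    unet.set_attention_slice('max')   # one slice at a time: maximum savings
    unet.set_attention_slice(4)       # attention_head_dim // 4 slices;
                                      # 8 is a multiple of 4, as required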
- forward(sample: torch.FloatTensor, timestep: Union[torch.Tensor, float, int], encoder_hidden_states: torch.Tensor, class_labels: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None, return_dict: bool = True) Union[UNet3DConditionOutput, Tuple] [source]¶
- Parameters
sample (torch.FloatTensor) – (batch, channel, num_frames, height, width) noisy inputs tensor.
timestep (torch.FloatTensor or float or int) – (batch) timesteps.
encoder_hidden_states (torch.FloatTensor) – (batch, sequence_length, feature_dim) encoder hidden states.
return_dict (bool, optional, defaults to True) – Whether or not to return a [UNet3DConditionOutput] instead of a plain tuple.
- Returns
[UNet3DConditionOutput] if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
- Return type
[UNet3DConditionOutput] or tuple
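A forward-pass sketch with dummy tensors under the default (Stable-Diffusion-sized, hence memory-hungry) configuration; the shapes follow the parameter list above:

    import torch

    from mmagic.models.editors.animatediff.unet_3d import UNet3DConditionMotionModel

    unet = UNet3DConditionMotionModel()

    sample = torch.randn(1, 4, 8, 32, 32)            # (batch, channel, num_frames, h, w)
    timestep = torch.tensor([10])                    # (batch) timesteps
    encoder_hidden_states = torch.randn(1, 77, 768)  # (batch, sequence_length, feature_dim)

    with torch.no_grad():
        out = unet(sample, timestep, encoder_hidden_states)

    print(out.sample.shape)  # same shape as `sample`
    # With return_dict=False a plain tuple is returned; the sample is element 0.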