mmagic.models.editors.stable_diffusion.vae

Module Contents

Classes

Downsample2D

A downsampling layer with an optional convolution.

Upsample2D

An upsampling layer with an optional convolution.

ResnetBlock2D

ResNet block with optional downsampling and upsampling.

AttentionBlock

An attention block that allows spatial positions to attend to each other.

UNetMidBlock2D

Middle block in the UNet.

DownEncoderBlock2D

Downsampling encoder block in the VAE.

Encoder

Encoder of the VAE.

UpDecoderBlock2D

Upsampling decoder block in the VAE.

Decoder

Decoder of the VAE.

DiagonalGaussianDistribution

Diagonal Gaussian distribution with mean and log-variance parameters.

AutoencoderKL

Variational Autoencoder (VAE) model with KL loss.

class mmagic.models.editors.stable_diffusion.vae.Downsample2D(channels, use_conv=False, out_channels=None, padding=1, name='conv')[source]

Bases: torch.nn.Module

A downsampling layer with an optional convolution.

Parameters
  • channels (int) – channels in the inputs and outputs.

  • use_conv (bool) – a bool determining if a convolution is applied.

  • out_channels (int) – output channels.

  • padding (int) – padding size.

forward(hidden_states)[source]

Forward pass on the hidden states.
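
A minimal usage sketch, assuming the port mirrors the usual Stable Diffusion Downsample2D behavior (strided conv when use_conv=True); tensor shapes are illustrative:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import Downsample2D

    # With use_conv=True a strided 3x3 convolution halves the spatial size.
    down = Downsample2D(channels=64, use_conv=True)
    x = torch.randn(1, 64, 32, 32)
    y = down(x)
    print(y.shape)  # expected: torch.Size([1, 64, 16, 16])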

class mmagic.models.editors.stable_diffusion.vae.Upsample2D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')[source]

Bases: torch.nn.Module

An upsampling layer with an optional convolution.

Parameters
  • channels (int) – channels in the inputs and outputs.

  • use_conv (bool) – a bool determining if a convolution is applied.

  • use_conv_transpose (bool) – whether to use conv transpose.

  • out_channels (int) – output channels.

forward(hidden_states, output_size=None)[source]

Forward pass on the hidden states; output_size optionally fixes the target spatial size.
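
A short sketch along the same lines, assuming the standard upsample-then-conv layout of Stable Diffusion ports:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import Upsample2D

    up = Upsample2D(channels=64, use_conv=True)
    x = torch.randn(1, 64, 16, 16)
    y = up(x)       # doubles H and W
    print(y.shape)  # expected: torch.Size([1, 64, 32, 32])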

class mmagic.models.editors.stable_diffusion.vae.ResnetBlock2D(in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='silu', time_embedding_norm='default', kernel=None, output_scale_factor=1.0, use_in_shortcut=None, up=False, down=False)[source]

Bases: torch.nn.Module

ResNet block with optional downsampling and upsampling.

Parameters
  • in_channels (int) – input channels.

  • out_channels (int) – output channels.

  • conv_shortcut (bool) – whether to use conv shortcut.

  • dropout (float) – dropout rate.

  • temb_channels (int) – time embedding channels.

  • groups (int) – conv groups.

  • groups_out (int) – conv groups for the output layers, defaults to groups if None.

  • pre_norm (bool) – whether to normalize before convolution. Todo: remove.

  • eps (float) – eps for groupnorm.

  • non_linearity (str) – non-linearity type.

  • time_embedding_norm (str) – time embedding norm type.

  • output_scale_factor (float) – factor to scale input and output.

  • use_in_shortcut (bool) – whether to use conv in shortcut.

  • up (bool) – whether to upsample.

  • down (bool) – whether to downsample.

forward(input_tensor, temb)[source]

Forward pass with hidden states and time embeddings.
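
A hedged usage sketch; note that in_channels should be divisible by groups (32 by default), and temb must match temb_channels:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import ResnetBlock2D

    block = ResnetBlock2D(in_channels=64, out_channels=64, temb_channels=512)
    x = torch.randn(1, 64, 32, 32)
    temb = torch.randn(1, 512)  # one time-embedding vector per batch item
    y = block(x, temb)
    print(y.shape)              # expected: torch.Size([1, 64, 32, 32])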

class mmagic.models.editors.stable_diffusion.vae.AttentionBlock(channels: int, num_head_channels: Optional[int] = None, norm_num_groups: int = 32, rescale_output_factor: float = 1.0, eps: float = 1e-05)[source]

Bases: torch.nn.Module

An attention block that allows spatial positions to attend to each other. Originally ported from https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L66, but adapted to the N-d case. Uses three linear layers (q, k, v) to compute attention.

Parameters
  • channels (int) – The number of channels in the input and output.

  • num_head_channels (int, optional) – The number of channels in each head. If None, then num_heads = 1.

  • norm_num_groups (int, optional, defaults to 32) – The number of groups to use for group norm.

  • rescale_output_factor (float, optional, defaults to 1.0) – The factor to rescale the output by.

  • eps (float, optional, defaults to 1e-5) – The epsilon value to use for group norm.

transpose_for_scores(projection: torch.Tensor) torch.Tensor[source]

Transpose the projection for attention score computation.

forward(hidden_states)[source]

Forward pass on the hidden states.
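
A short self-attention sketch; with num_head_channels=None a single head attends over all H*W positions, and the output keeps the input shape:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import AttentionBlock

    attn = AttentionBlock(channels=64)
    x = torch.randn(1, 64, 16, 16)
    y = attn(x)
    print(y.shape)  # torch.Size([1, 64, 16, 16])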

class mmagic.models.editors.stable_diffusion.vae.UNetMidBlock2D(in_channels: int, temb_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, attn_num_head_channels=1, attention_type='default', output_scale_factor=1.0)[source]

Bases: torch.nn.Module

Middle block in the UNet.

Parameters
  • in_channels (int) – input channels.

  • temb_channels (int) – time embedding channels.

  • dropout (float) – dropout rate, defaults to 0.0.

  • num_layers (int) – number of layers, defaults to 1.

  • resnet_eps (float) – resnet eps, defaults to 1e-6.

  • resnet_time_scale_shift (str) – time scale shift, defaults to ‘default’.

  • resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.

  • resnet_groups (int) – conv groups in resnet, defaults to 32.

  • resnet_pre_norm (bool) – pre norm in resnet, defaults to True.

  • attn_num_head_channels (int) – attention head channels, defaults to 1.

  • attention_type (str) – attention type, defaults to ‘default’.

  • output_scale_factor (float) – output scale factor, defaults to 1.0.

forward(hidden_states, temb=None, encoder_states=None)[source]

Forward pass with hidden states, time embedding, and encoder states.
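
A sketch of the mid block, assumed to follow the usual resnet-attention-resnet pattern of Stable Diffusion ports, which preserves the spatial size:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import UNetMidBlock2D

    mid = UNetMidBlock2D(in_channels=64, temb_channels=512)
    x = torch.randn(1, 64, 8, 8)
    temb = torch.randn(1, 512)
    y = mid(x, temb)
    print(y.shape)  # expected: torch.Size([1, 64, 8, 8])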

class mmagic.models.editors.stable_diffusion.vae.DownEncoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_downsample=True, downsample_padding=1)[source]

Bases: torch.nn.Module

Downsampling encoder block in the VAE.

Parameters
  • in_channels (int) – input channels.

  • out_channels (int) – output channels.

  • dropout (float) – dropout rate, defaults to 0.0.

  • num_layers (int) – number of layers, defaults to 1.

  • resnet_eps (float) – resnet eps, defaults to 1e-6.

  • resnet_time_scale_shift (str) – time scale shift in resnet, defaults to ‘default’.

  • resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.

  • resnet_groups (int) – number of conv groups in resnet, defaults to 32.

  • resnet_pre_norm (bool) – whether to pre norm in resnet, defaults to True.

  • output_scale_factor (float) – output scale factor, defaults to 1.0.

  • add_downsample (bool) – whether to add a downsampling layer, defaults to True.

  • downsample_padding (int) – downsample padding size, defaults to 1.

forward(hidden_states)[source]

Forward pass on the hidden states.
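
A sketch with the documented defaults; add_downsample=True is expected to halve the spatial size:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import DownEncoderBlock2D

    block = DownEncoderBlock2D(in_channels=64, out_channels=128)
    x = torch.randn(1, 64, 32, 32)
    y = block(x)
    print(y.shape)  # expected: torch.Size([1, 128, 16, 16])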

class mmagic.models.editors.stable_diffusion.vae.Encoder(in_channels=3, out_channels=3, down_block_types=('DownEncoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu', double_z=True)[source]

Bases: torch.nn.Module

Encoder of the VAE.

forward(x)[source]

Encoder forward pass.
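
With the default arguments the encoder emits latent moments; since double_z=True, the channel dimension is assumed to hold mean and log-variance stacked together:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import Encoder

    enc = Encoder()  # one DownEncoderBlock2D, block_out_channels=(64,), double_z=True
    x = torch.randn(1, 3, 32, 32)
    moments = enc(x)
    print(moments.shape)  # channel dim = 2 * out_channels when double_z=True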

class mmagic.models.editors.stable_diffusion.vae.UpDecoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'swish', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_upsample=True)[source]

Bases: torch.nn.Module

Upsampling decoder block in the VAE.

forward(hidden_states)[source]

Forward pass on the hidden states.
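
A sketch mirroring the DownEncoderBlock2D example above; add_upsample=True is expected to double the spatial size:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import UpDecoderBlock2D

    block = UpDecoderBlock2D(in_channels=128, out_channels=64)
    x = torch.randn(1, 128, 16, 16)
    y = block(x)
    print(y.shape)  # expected: torch.Size([1, 64, 32, 32])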

class mmagic.models.editors.stable_diffusion.vae.Decoder(in_channels=3, out_channels=3, up_block_types=('UpDecoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu')[source]

Bases: torch.nn.Module

Decoder of the VAE.

forward(z)[source]

Decoder forward pass.
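
A decoding sketch with the defaults; here in_channels counts latent channels and out_channels counts image channels:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import Decoder

    dec = Decoder()  # one UpDecoderBlock2D, block_out_channels=(64,)
    z = torch.randn(1, 3, 16, 16)  # in_channels defaults to 3
    sample = dec(z)
    print(sample.shape)  # out_channels (3) image channels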

class mmagic.models.editors.stable_diffusion.vae.DiagonalGaussianDistribution(parameters, deterministic=False)[source]

Bases: object

Diagonal Gaussian distribution with mean and log-variance parameters.

sample(generator: Optional[torch.Generator] = None) torch.FloatTensor[source]

Sample from the distribution.

kl(other=None)[source]

Calculate the KL divergence, against a standard normal when other is None.

nll(sample, dims=[1, 2, 3])[source]

Calculate the negative log-likelihood of sample.

mode()[source]

Return the mode of the distribution, i.e. self.mean.
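
A sketch of the distribution API, assuming the usual convention that parameters concatenates mean and log-variance along the channel dimension:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import DiagonalGaussianDistribution

    params = torch.randn(2, 8, 16, 16)  # 8 channels -> 4 mean + 4 log-variance
    dist = DiagonalGaussianDistribution(params)
    z = dist.sample()   # stochastic latent, expected shape (2, 4, 16, 16)
    mu = dist.mode()    # deterministic latent: the mean
    kl = dist.kl()      # KL divergence against a standard normal
    nll = dist.nll(z)   # negative log-likelihood of z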

class mmagic.models.editors.stable_diffusion.vae.AutoencoderKL(in_channels: int = 3, out_channels: int = 3, down_block_types: Tuple[str] = ('DownEncoderBlock2D',), up_block_types: Tuple[str] = ('UpDecoderBlock2D',), block_out_channels: Tuple[int] = (64,), layers_per_block: int = 1, act_fn: str = 'silu', latent_channels: int = 4, norm_num_groups: int = 32, sample_size: int = 32)[source]

Bases: torch.nn.Module

Variational Autoencoder (VAE) model with KL loss from the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling.

Parameters
  • in_channels (int, optional, defaults to 3) – Number of channels in the input image.

  • out_channels (int, optional, defaults to 3) – Number of channels in the output.

  • down_block_types (Tuple[str], optional, defaults to (“DownEncoderBlock2D”,)) – Tuple of downsample block types.

  • up_block_types (Tuple[str], optional, defaults to (“UpDecoderBlock2D”,)) – Tuple of upsample block types.

  • block_out_channels (Tuple[int], optional, defaults to (64,)) – Tuple of block output channels.

  • act_fn (str, optional, defaults to “silu”) – The activation function to use.

  • latent_channels (int, optional, defaults to 4) – Number of channels in the latent space.

  • sample_size (int, optional, defaults to 32) – Sample size (currently not supported).

property dtype[source]

The data type of the VAE's parameters.

encode(x: torch.FloatTensor, return_dict: bool = True) addict.Dict[source]

Encode the input into a latent distribution.

decode(z: torch.FloatTensor, return_dict: bool = True) Union[addict.Dict, torch.FloatTensor][source]

Decode the latent code z.

forward(sample: torch.FloatTensor, sample_posterior: bool = False, return_dict: bool = True, generator: Optional[torch.Generator] = None) Union[addict.Dict, torch.FloatTensor][source]
Parameters
  • sample (torch.FloatTensor) – Input sample.

  • sample_posterior (bool) – Whether to sample from the posterior, defaults to False.

  • return_dict (bool, optional, defaults to True) – Whether or not to return a [Dict] instead of a plain tuple.

Returns

Decoded results.

Return type

Dict(sample=dec)
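
A round-trip sketch using the documented forward interface; the returned addict.Dict exposes the reconstruction under the ‘sample’ key, as stated above:

    import torch
    from mmagic.models.editors.stable_diffusion.vae import AutoencoderKL

    vae = AutoencoderKL()            # defaults: latent_channels=4, one block per side
    img = torch.randn(1, 3, 64, 64)
    out = vae(img, sample_posterior=True)  # encode, sample the posterior, decode
    rec = out['sample']
    print(rec.shape)  # expected to match the input: torch.Size([1, 3, 64, 64])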
