`mmagic.models.editors.stable_diffusion.vae`¶

Module Contents¶

Classes¶

`Downsample2D`	A downsampling layer with an optional convolution.
`Upsample2D`	An upsampling layer with an optional convolution.
`ResnetBlock2D`	ResNet block that supports downsampling and upsampling.
`AttentionBlock`	An attention block that allows spatial positions to attend to each other.
`UNetMidBlock2D`	Middle block in the UNet.
`DownEncoderBlock2D`	Down encoder block in the VAE.
`Encoder`	Encoder of the VAE.
`UpDecoderBlock2D`	Up decoder block in the VAE.
`Decoder`	Decoder of the VAE.
`DiagonalGaussianDistribution`	Calculate diagonal gaussian distribution.
`AutoencoderKL`	Variational Autoencoder (VAE) model with KL loss

class mmagic.models.editors.stable_diffusion.vae.Downsample2D(channels, use_conv=False, out_channels=None, padding=1, name='conv')[source]¶

Bases: torch.nn.Module

A downsampling layer with an optional convolution.

Parameters

channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
out_channels (int) – number of output channels.
padding (int) – amount of padding.

forward(hidden_states)[source]¶: forward hidden states.
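
As a rough guide to the spatial effect of this layer when `use_conv=True`, the sketch below applies the standard convolution output-size formula. The kernel size of 3 and stride of 2 are assumptions borrowed from common Stable Diffusion VAE implementations, not values stated on this page, and `conv_output_size` is a hypothetical helper that is not part of mmagic.

```python
def conv_output_size(size: int, kernel: int = 3, stride: int = 2,
                     padding: int = 1) -> int:
    """Output length along one spatial axis of a standard convolution.

    kernel=3 and stride=2 are assumed values, not taken from mmagic;
    padding=1 matches the documented default of Downsample2D.
    """
    return (size + 2 * padding - kernel) // stride + 1


# Under these assumptions a 64-pixel side is halved.
print(conv_output_size(64))  # 32
```

If the real layer uses a different kernel or stride, substitute those values; the formula itself is unchanged.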

class mmagic.models.editors.stable_diffusion.vae.Upsample2D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')[source]¶

Bases: torch.nn.Module

An upsampling layer with an optional convolution.

Parameters

channels (int) – channels in the inputs and outputs.
use_conv (bool) – a bool determining if a convolution is applied.
use_conv_transpose (bool) – whether to use conv transpose.
out_channels (int) – output channels.

forward(hidden_states, output_size=None)[source]¶: forward with hidden states.

class mmagic.models.editors.stable_diffusion.vae.ResnetBlock2D(in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='silu', time_embedding_norm='default', kernel=None, output_scale_factor=1.0, use_in_shortcut=None, up=False, down=False)[source]¶

Bases: torch.nn.Module

ResNet block that supports downsampling and upsampling.

Parameters

in_channels (int) – input channels.
out_channels (int) – output channels.
conv_shortcut (bool) – whether to use conv shortcut.
dropout (float) – dropout rate.
temb_channels (int) – time embedding channels.
groups (int) – conv groups.
groups_out (int) – conv out groups.
pre_norm (bool) – whether to norm before conv. Todo: remove.
eps (float) – eps for groupnorm.
non_linearity (str) – non linearity type.
time_embedding_norm (str) – time embedding norm type.
output_scale_factor (float) – factor to scale input and output.
use_in_shortcut (bool) – whether to use conv in shortcut.
up (bool) – whether to upsample.
down (bool) – whether to downsample.

forward(input_tensor, temb)[source]¶: forward with hidden states and time embeddings.

class mmagic.models.editors.stable_diffusion.vae.AttentionBlock(channels: int, num_head_channels: Optional[int] = None, norm_num_groups: int = 32, rescale_output_factor: float = 1.0, eps: float = 1e-05)[source]¶

Bases: torch.nn.Module

An attention block that allows spatial positions to attend to each other. Originally ported from https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L66, but adapted to the N-d case. Uses three linear layers (q, k, v) to compute attention.

Parameters

channels (int) – The number of channels in the input and output.
num_head_channels (int, optional) – The number of channels in each head. If None, then num_heads = 1.
norm_num_groups (int, optional, defaults to 32) – The number of groups to use for group norm.
rescale_output_factor (float, optional, defaults to 1.0) – The factor to rescale the output by.
eps (float, optional, defaults to 1e-5) – The epsilon value to use for group norm.

transpose_for_scores(projection: torch.Tensor) → torch.Tensor[source]¶: transpose projection.

forward(hidden_states)[source]¶: forward hidden states.
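
The relationship between `channels` and `num_head_channels` can be sketched as follows. The `None` case (a single head) comes from the parameter description above; deriving the head count by integer division is the usual multi-head convention and is an assumption here, and `attention_head_layout` is a hypothetical helper, not an mmagic function.

```python
from typing import Optional, Tuple


def attention_head_layout(channels: int,
                          num_head_channels: Optional[int] = None
                          ) -> Tuple[int, int]:
    """Return (num_heads, channels_per_head) for an attention block."""
    if num_head_channels is None:
        # Documented behaviour: a single head spanning all channels.
        return 1, channels
    if channels % num_head_channels != 0:
        raise ValueError("channels must be divisible by num_head_channels")
    # Assumed convention: heads = channels // channels per head.
    return channels // num_head_channels, num_head_channels


print(attention_head_layout(512))      # (1, 512)
print(attention_head_layout(512, 64))  # (8, 64)
```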

class mmagic.models.editors.stable_diffusion.vae.UNetMidBlock2D(in_channels: int, temb_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, attn_num_head_channels=1, attention_type='default', output_scale_factor=1.0)[source]¶

Bases: torch.nn.Module

Middle block in the UNet.

Parameters

in_channels (int) – input channels.
temb_channels (int) – time embedding channels.
dropout (float) – dropout rate, defaults to 0.0.
num_layers (int) – number of layers, defaults to 1.
resnet_eps (float) – resnet eps, defaults to 1e-6.
resnet_time_scale_shift (str) – time scale shift, defaults to ‘default’.
resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.
resnet_groups (int) – conv groups in resnet, defaults to 32.
resnet_pre_norm (bool) – pre norm in resnet, defaults to True.
attn_num_head_channels (int) – attention head channels, defaults to 1.
attention_type (str) – attention type, defaults to ‘default’.
output_scale_factor (float) – output scale factor, defaults to 1.0.

forward(hidden_states, temb=None, encoder_states=None)[source]¶: forward with hidden states, time embedding and encoder states.

class mmagic.models.editors.stable_diffusion.vae.DownEncoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_downsample=True, downsample_padding=1)[source]¶

Bases: torch.nn.Module

Down encoder block in vae.

Parameters

in_channels (int) – input channels.
out_channels (int) – output channels.
dropout (float) – dropout rate, defaults to 0.0.
num_layers (int) – number of layers, defaults to 1.
resnet_eps (float) – resnet eps, defaults to 1e-6.
resnet_time_scale_shift (str) – time scale shift in resnet, defaults to ‘default’.
resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.
resnet_groups (int) – group num in resnet, defaults to 32.
resnet_pre_norm (bool) – whether to pre norm in resnet, defaults to True.
output_scale_factor (float) – output scale factor, defaults to 1.0.
add_downsample (bool) – whether to add a downsampling layer, defaults to True.
downsample_padding (int) – downsample padding num, defaults to 1.

forward(hidden_states)[source]¶: forward with hidden states.

class mmagic.models.editors.stable_diffusion.vae.Encoder(in_channels=3, out_channels=3, down_block_types=('DownEncoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu', double_z=True)[source]¶

Bases: torch.nn.Module

Encoder of the VAE.

forward(x)[source]¶: encoder forward.

class mmagic.models.editors.stable_diffusion.vae.UpDecoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'swish', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_upsample=True)[source]¶

Bases: torch.nn.Module

Up decoder block in the VAE.

forward(hidden_states)[source]¶: forward hidden states.

class mmagic.models.editors.stable_diffusion.vae.Decoder(in_channels=3, out_channels=3, up_block_types=('UpDecoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu')[source]¶

Bases: torch.nn.Module

Decoder of the VAE.

forward(z)[source]¶: decoder forward.

class mmagic.models.editors.stable_diffusion.vae.DiagonalGaussianDistribution(parameters, deterministic=False)[source]¶

Bases: object

Calculate diagonal gaussian distribution.

sample(generator: Optional[torch.Generator] = None) → torch.FloatTensor[source]¶: sample function.

kl(other=None)[source]¶: calculate kl divergence.

nll(sample, dims=[1, 2, 3])[source]¶: calculate negative log likelihood.

mode()[source]¶: return self.mean.
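
To make the four methods concrete, here is a minimal pure-Python sketch of the textbook formulas for a diagonal Gaussian. It assumes the distribution is described by a per-dimension mean and log-variance and that `kl()` with no argument compares against a standard normal; both follow common latent-diffusion implementations and are not confirmed by this page. `ToyDiagonalGaussian` is a hypothetical stand-in that works on plain lists, not the mmagic class, and the real implementation may differ in detail (for example clamping or tensor reduction axes).

```python
import math
import random
from typing import List, Optional, Sequence


class ToyDiagonalGaussian:
    """List-based stand-in for a diagonal Gaussian (illustrative only)."""

    def __init__(self, mean: Sequence[float], logvar: Sequence[float]) -> None:
        self.mean = list(mean)
        self.logvar = list(logvar)
        self.std = [math.exp(0.5 * lv) for lv in self.logvar]

    def sample(self, rng: Optional[random.Random] = None) -> List[float]:
        # Reparameterization: mean + std * eps, with eps ~ N(0, 1).
        rng = rng or random.Random()
        return [m + s * rng.gauss(0.0, 1.0)
                for m, s in zip(self.mean, self.std)]

    def kl(self) -> float:
        # KL(N(mean, var) || N(0, 1)), summed over dimensions.
        return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                         for m, lv in zip(self.mean, self.logvar))

    def nll(self, sample: Sequence[float]) -> float:
        # Negative log-likelihood of `sample`, summed over dimensions.
        return 0.5 * sum(
            math.log(2.0 * math.pi) + lv + (x - m) ** 2 / math.exp(lv)
            for x, m, lv in zip(sample, self.mean, self.logvar))

    def mode(self) -> List[float]:
        # The mode of a Gaussian is its mean.
        return list(self.mean)


dist = ToyDiagonalGaussian(mean=[1.0], logvar=[0.0])
print(dist.kl())    # 0.5
print(dist.mode())  # [1.0]
```

Passing a seeded `random.Random` to `sample` plays the same role as the `generator` argument above: it makes the draw reproducible.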

class mmagic.models.editors.stable_diffusion.vae.AutoencoderKL(in_channels: int = 3, out_channels: int = 3, down_block_types: Tuple[str] = ('DownEncoderBlock2D',), up_block_types: Tuple[str] = ('UpDecoderBlock2D',), block_out_channels: Tuple[int] = (64,), layers_per_block: int = 1, act_fn: str = 'silu', latent_channels: int = 4, norm_num_groups: int = 32, sample_size: int = 32)[source]¶

Bases: torch.nn.Module

Variational Autoencoder (VAE) model with KL loss from the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling.

Parameters

in_channels (int, optional, defaults to 3) – Number of channels in the input image.
out_channels (int, optional, defaults to 3) – Number of channels in the output.
down_block_types (Tuple[str], optional, defaults to (“DownEncoderBlock2D”,)) – Tuple of downsample block types.
up_block_types (Tuple[str], optional, defaults to (“UpDecoderBlock2D”,)) – Tuple of upsample block types.
block_out_channels (Tuple[int], optional, defaults to (64,)) – Tuple of block output channels.
act_fn (str, optional, defaults to “silu”) – The activation function to use.
latent_channels (int, optional, defaults to 4) – Number of channels in the latent space.
sample_size (int, optional, defaults to 32) – Sample size. This parameter is currently not supported.

property dtype[source]¶: The data type of the parameters of VAE.

encode(x: torch.FloatTensor, return_dict: bool = True) → addict.Dict[source]¶: encode input.

decode(z: torch.FloatTensor, return_dict: bool = True) → Union[addict.Dict, torch.FloatTensor][source]¶: decode z.

forward(sample: torch.FloatTensor, sample_posterior: bool = False, return_dict: bool = True, generator: Optional[torch.Generator] = None) → Union[addict.Dict, torch.FloatTensor][source]¶

Parameters

sample (torch.FloatTensor) – Input sample.
sample_posterior (bool) – Whether to sample from the posterior. defaults to False.
return_dict (bool, optional, defaults to True) – Whether or not to return a Dict instead of a plain tuple.

Returns

decode results.

Return type

Dict(sample=dec)
