mmagic.models.editors.stable_diffusion.vae
Module Contents
Classes
- Downsample2D – A downsampling layer with an optional convolution.
- Upsample2D – An upsampling layer with an optional convolution.
- ResnetBlock2D – ResNet block that supports downsampling and upsampling.
- AttentionBlock – An attention block that allows spatial positions to attend to each other.
- UNetMidBlock2D – Middle block of the UNet.
- DownEncoderBlock2D – Downsampling encoder block of the VAE.
- Encoder – Encoder of the VAE.
- UpDecoderBlock2D – Upsampling decoder block of the VAE.
- Decoder – Decoder of the VAE.
- DiagonalGaussianDistribution – Diagonal Gaussian distribution.
- AutoencoderKL – Variational Autoencoder (VAE) model with KL loss.
- class mmagic.models.editors.stable_diffusion.vae.Downsample2D(channels, use_conv=False, out_channels=None, padding=1, name='conv')
Bases: torch.nn.Module
A downsampling layer with an optional convolution.
- Parameters
channels (int) – Number of input channels; also used as the output channel count when out_channels is None.
use_conv (bool) – Whether to apply a convolution.
out_channels (int, optional) – Number of output channels.
padding (int) – Padding of the convolution.
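The effect of these parameters on tensor shapes can be sketched with standard convolution arithmetic. The sketch assumes, as in the diffusers Downsample2D that this class appears to mirror, that use_conv=True applies a 3×3 convolution with stride 2 and that use_conv=False applies 2×2 average pooling with stride 2. The helper names are hypothetical; they compute shapes only and do not call mmagic.

```python
from typing import Optional, Tuple


def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after a 2D convolution with symmetric zero padding."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output_size(size: int, kernel: int = 2, stride: int = 2) -> int:
    """Spatial size after average pooling without padding."""
    return (size - kernel) // stride + 1


def downsample_shape(channels: int, height: int, width: int, use_conv: bool = False,
                     out_channels: Optional[int] = None, padding: int = 1) -> Tuple[int, int, int]:
    """Hypothetical helper: (C, H, W) after a Downsample2D-like layer."""
    out_channels = out_channels or channels  # assumed default: keep the channel count
    if use_conv:
        # Assumed: 3x3 convolution with stride 2.
        return (out_channels,
                conv_output_size(height, padding=padding),
                conv_output_size(width, padding=padding))
    # Assumed: 2x2 average pooling with stride 2, which cannot change the channel count.
    if out_channels != channels:
        raise ValueError("out_channels must equal channels when use_conv=False")
    return (channels, pool_output_size(height), pool_output_size(width))


print(downsample_shape(128, 64, 64, use_conv=True))  # (128, 32, 32)
print(downsample_shape(128, 64, 64))                 # (128, 32, 32)
```

Both paths halve even spatial sizes; only the convolutional path can change the number of channels.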
- class mmagic.models.editors.stable_diffusion.vae.Upsample2D(channels, use_conv=False, use_conv_transpose=False, out_channels=None, name='conv')
Bases: torch.nn.Module
An upsampling layer with an optional convolution.
- Parameters
channels (int) – Number of input channels; also used as the output channel count when out_channels is None.
use_conv (bool) – Whether to apply a convolution.
use_conv_transpose (bool) – Whether to use a transposed convolution.
out_channels (int, optional) – Number of output channels.
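The two upsampling paths can be sketched as follows. This assumes, as in the diffusers Upsample2D, that use_conv_transpose=True applies a transposed convolution with kernel 4, stride 2 and padding 1, and that otherwise the input is nearest-neighbor interpolated by a factor of 2 (optionally followed by a size-preserving 3×3 convolution when use_conv=True). Both paths double the spatial size. The functions are hypothetical illustrations on plain lists.

```python
from typing import List


def nearest_upsample_2x(grid: List[List[float]]) -> List[List[float]]:
    """Repeat every value into a 2x2 patch (nearest-neighbor, scale factor 2)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))  # duplicate the row
    return out


def conv_transpose_output_size(size: int, kernel: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Standard transposed-convolution size formula (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


print(nearest_upsample_2x([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(conv_transpose_output_size(32))  # 64
```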
- class mmagic.models.editors.stable_diffusion.vae.ResnetBlock2D(in_channels, out_channels=None, conv_shortcut=False, dropout=0.0, temb_channels=512, groups=32, groups_out=None, pre_norm=True, eps=1e-06, non_linearity='silu', time_embedding_norm='default', kernel=None, output_scale_factor=1.0, use_in_shortcut=None, up=False, down=False)
Bases: torch.nn.Module
ResNet block that supports downsampling and upsampling.
- Parameters
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
conv_shortcut (bool) – Whether to use a convolutional shortcut.
dropout (float) – Dropout rate.
temb_channels (int) – Number of time embedding channels.
groups (int) – Number of groups for group normalization.
groups_out (int) – Number of groups for the output group normalization.
pre_norm (bool) – Whether to normalize before the convolution. Todo: remove.
eps (float) – Epsilon for group normalization.
non_linearity (str) – Type of non-linearity.
time_embedding_norm (str) – Type of time embedding normalization.
output_scale_factor (float) – Factor by which the block output (shortcut plus residual branch) is divided.
use_in_shortcut (bool) – Whether to use a convolution in the shortcut.
up (bool) – Whether to upsample.
down (bool) – Whether to downsample.
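Two of the parameters above, groups/eps and output_scale_factor, can be illustrated without torch. The sketch assumes, as in diffusers' ResnetBlock2D, that groups is the group count of a GroupNorm layer (normalizing each group of channels over all its spatial positions with a biased variance) and that the block returns (shortcut + residual) / output_scale_factor. Feature maps are plain lists shaped [channels][flattened positions]; the function names are hypothetical.

```python
import math
from typing import List

# A feature map as [channels][flattened spatial positions].
FeatureMap = List[List[float]]


def group_norm(x: FeatureMap, groups: int, eps: float = 1e-6) -> FeatureMap:
    """Normalize each group of channels to zero mean and unit variance (no affine)."""
    if len(x) % groups != 0:
        raise ValueError("the channel count must be divisible by groups")
    per_group = len(x) // groups
    out: FeatureMap = []
    for g in range(groups):
        block = x[g * per_group:(g + 1) * per_group]
        values = [v for channel in block for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased, as in GroupNorm
        out.extend([(v - mean) / math.sqrt(var + eps) for v in channel] for channel in block)
    return out


def residual_output(shortcut: FeatureMap, hidden: FeatureMap,
                    output_scale_factor: float = 1.0) -> FeatureMap:
    """Add the shortcut and residual branches, then divide by output_scale_factor."""
    return [[(s + h) / output_scale_factor for s, h in zip(sc, hc)]
            for sc, hc in zip(shortcut, hidden)]


x = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [5.0, 5.0]]
print(group_norm(x, groups=2))
print(residual_output([[1.0, 2.0]], [[3.0, 4.0]], output_scale_factor=2.0))  # [[2.0, 3.0]]
```

Note that the channel count must be divisible by groups, which is why the default of 32 groups pairs with channel counts such as 128 or 512.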
- class mmagic.models.editors.stable_diffusion.vae.AttentionBlock(channels: int, num_head_channels: Optional[int] = None, norm_num_groups: int = 32, rescale_output_factor: float = 1.0, eps: float = 1e-05)
Bases: torch.nn.Module
An attention block that allows spatial positions to attend to each other. Originally ported from https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L66, but adapted to the N-d case. Uses three linear layers (q, k, v) to compute attention.
- Parameters
channels (int) – The number of channels in the input and output.
num_head_channels (int, optional) – The number of channels in each head. If None, then num_heads = 1.
norm_num_groups (int, optional, defaults to 32) – The number of groups to use for group norm.
rescale_output_factor (float, optional, defaults to 1.0) – The factor to rescale the output by.
eps (float, optional, defaults to 1e-5) – The epsilon value to use for group norm.
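The core computation can be sketched as multi-head scaled dot-product attention over the H×W spatial positions, each treated as a token with C channels. This is a simplified illustration, not the mmagic implementation: q, k and v are taken to be the tokens themselves (identity projections) instead of three learned linear layers, and the group norm, output projection, residual connection and rescale_output_factor are omitted. It assumes, as in diffusers, that num_heads = channels // num_head_channels when num_head_channels is given.

```python
import math
from typing import List, Optional

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_self_attention(tokens: List[Vector],
                           num_head_channels: Optional[int] = None) -> List[Vector]:
    """Multi-head self-attention over H*W spatial tokens, each with C channels.

    Hypothetical simplification: q, k and v are the tokens themselves
    (identity projections) instead of three learned linear layers.
    """
    channels = len(tokens[0])
    num_heads = channels // num_head_channels if num_head_channels is not None else 1
    head_dim = channels // num_heads
    out = [[0.0] * channels for _ in tokens]
    for h in range(num_heads):
        lo = h * head_dim
        heads = [t[lo:lo + head_dim] for t in tokens]  # this head's slice of q = k = v
        for i, q in enumerate(heads):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(head_dim) for k in heads]
            weights = softmax(scores)
            for c in range(head_dim):
                out[i][lo + c] = sum(w * v[c] for w, v in zip(weights, heads))
    return out


# A 2x2 feature map flattened to four positions, four channels, two heads.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
result = spatial_self_attention(tokens, num_head_channels=2)
print(len(result), len(result[0]))  # 4 4
```

Each output position is a weighted average of the values at all positions, which is what lets every spatial location attend to every other one.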
- class mmagic.models.editors.stable_diffusion.vae.UNetMidBlock2D(in_channels: int, temb_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, attn_num_head_channels=1, attention_type='default', output_scale_factor=1.0)
Bases: torch.nn.Module
Middle block of the UNet.
- Parameters
in_channels (int) – number of input channels.
temb_channels (int) – number of time embedding channels.
dropout (float) – dropout rate, defaults to 0.0.
num_layers (int) – number of layers, defaults to 1.
resnet_eps (float) – resnet eps, defaults to 1e-6.
resnet_time_scale_shift (str) – time scale shift, defaults to ‘default’.
resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.
resnet_groups (int) – number of groups for group normalization in resnet, defaults to 32.
resnet_pre_norm (bool) – pre norm in resnet, defaults to True.
attn_num_head_channels (int) – attention head channels, defaults to 1.
attention_type (str) – attention type, defaults to ‘default’.
output_scale_factor (float) – output scale factor, defaults to 1.0.
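The order in which the middle block applies its sub-layers can be sketched as a simple layout. This assumes, following the diffusers UNetMidBlock2D, that the block starts with one resnet block and then applies num_layers (attention, resnet) pairs, all at in_channels; the helper is hypothetical and only lists layer kinds.

```python
from typing import List


def mid_block_layout(num_layers: int = 1) -> List[str]:
    """Assumed layer order: one resnet, then an (attention, resnet) pair per layer."""
    layout = ["resnet"]
    for _ in range(num_layers):
        layout += ["attention", "resnet"]
    return layout


print(mid_block_layout())   # ['resnet', 'attention', 'resnet']
print(mid_block_layout(2))  # ['resnet', 'attention', 'resnet', 'attention', 'resnet']
```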
- class mmagic.models.editors.stable_diffusion.vae.DownEncoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'silu', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_downsample=True, downsample_padding=1)
Bases: torch.nn.Module
Downsampling encoder block of the VAE.
- Parameters
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
dropout (float) – dropout rate, defaults to 0.0.
num_layers (int) – number of resnet layers, defaults to 1.
resnet_eps (float) – resnet eps, defaults to 1e-6.
resnet_time_scale_shift (str) – time scale shift in resnet, defaults to ‘default’.
resnet_act_fn (str) – act function in resnet, defaults to ‘silu’.
resnet_groups (int) – number of groups for group normalization in resnet, defaults to 32.
resnet_pre_norm (bool) – whether to pre norm in resnet, defaults to True.
output_scale_factor (float) – output scale factor, defaults to 1.0.
add_downsample (bool) – whether to add a downsampling layer, defaults to True.
downsample_padding (int) – padding of the downsampling layer, defaults to 1.
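The channel and resolution flow through this block can be traced with a small helper. The sketch assumes, as in diffusers, that the block is num_layers resnet blocks (only the first one changing in_channels to out_channels) followed, when add_downsample is True, by a 3×3 stride-2 convolution with symmetric zero padding of downsample_padding. The function name and the string labels are hypothetical.

```python
from typing import List, Tuple


def down_encoder_block_plan(in_channels: int, out_channels: int, height: int, width: int,
                            num_layers: int = 1, add_downsample: bool = True,
                            downsample_padding: int = 1) -> Tuple[List[str], Tuple[int, int, int]]:
    """Hypothetical trace of the layers and output (C, H, W) of a DownEncoderBlock2D-like block."""
    layers = []
    channels = in_channels
    for _ in range(num_layers):
        # Assumed: only the first resnet changes the channel count.
        layers.append(f"resnet({channels}->{out_channels})")
        channels = out_channels
    if add_downsample:
        # Assumed: 3x3 stride-2 convolution with symmetric zero padding.
        height = (height + 2 * downsample_padding - 3) // 2 + 1
        width = (width + 2 * downsample_padding - 3) // 2 + 1
        layers.append(f"downsample({channels}->{channels})")
    return layers, (channels, height, width)


layers, shape = down_encoder_block_plan(128, 256, 256, 256, num_layers=2)
print(layers)  # ['resnet(128->256)', 'resnet(256->256)', 'downsample(256->256)']
print(shape)   # (256, 128, 128)
```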
- class mmagic.models.editors.stable_diffusion.vae.Encoder(in_channels=3, out_channels=3, down_block_types=('DownEncoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu', double_z=True)
Bases: torch.nn.Module
Encoder of the VAE, which maps an input image to latent features.
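The shape of the encoder output can be estimated from the constructor arguments. This sketch assumes, as in the diffusers Encoder, that every down block except the last halves the resolution and that double_z=True doubles the output channels so that they hold the mean and log-variance of the latent distribution. The Stable-Diffusion-like configuration below is illustrative; the helper is hypothetical.

```python
from typing import Sequence, Tuple


def encoder_output_shape(height: int, width: int, block_out_channels: Sequence[int] = (64,),
                         out_channels: int = 3, double_z: bool = True) -> Tuple[int, int, int]:
    """Hypothetical (C, H, W) of the encoder output for sizes divisible by the total factor."""
    # Assumed: every down block except the last one halves the resolution.
    factor = 2 ** (len(block_out_channels) - 1)
    # Assumed: double_z doubles the output channels (mean and log-variance).
    channels = 2 * out_channels if double_z else out_channels
    return channels, height // factor, width // factor


print(encoder_output_shape(256, 256))  # (6, 256, 256) with the defaults
print(encoder_output_shape(512, 512, (128, 256, 512, 512), out_channels=4))  # (8, 64, 64)
```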
- class mmagic.models.editors.stable_diffusion.vae.UpDecoderBlock2D(in_channels: int, out_channels: int, dropout: float = 0.0, num_layers: int = 1, resnet_eps: float = 1e-06, resnet_time_scale_shift: str = 'default', resnet_act_fn: str = 'swish', resnet_groups: int = 32, resnet_pre_norm: bool = True, output_scale_factor=1.0, add_upsample=True)
Bases: torch.nn.Module
Upsampling decoder block of the VAE.
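This block's signature defaults resnet_act_fn to 'swish', whereas the other blocks default to 'silu'. Both names denote the same function, x·σ(x); the sketch assumes the module maps both strings to it, as diffusers does. A numerically stable pure-Python version:

```python
import math


def sigmoid(x: float) -> float:
    # Branch on the sign so that math.exp never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU, also called swish: x * sigmoid(x)."""
    return x * sigmoid(x)


print([round(silu(v), 4) for v in (-2.0, 0.0, 2.0)])  # [-0.2384, 0.0, 1.7616]
```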
- class mmagic.models.editors.stable_diffusion.vae.Decoder(in_channels=3, out_channels=3, up_block_types=('UpDecoderBlock2D',), block_out_channels=(64,), layers_per_block=2, norm_num_groups=32, act_fn='silu')
Bases: torch.nn.Module
Decoder of the VAE, which maps latents back to images.
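The decoder mirrors the encoder. The sketch assumes, as in the diffusers Decoder, that the up blocks visit block_out_channels in reverse order and that every up block except the last doubles the resolution. The helper is hypothetical and the configuration is illustrative.

```python
from typing import List, Sequence, Tuple


def decoder_plan(latent_height: int, latent_width: int, block_out_channels: Sequence[int] = (64,),
                 out_channels: int = 3) -> Tuple[List[int], Tuple[int, int, int]]:
    """Hypothetical channel schedule and output (C, H, W) of a Decoder-like module."""
    # Assumed: the up blocks visit block_out_channels in reverse order.
    schedule = list(reversed(block_out_channels))
    # Assumed: every up block except the last one doubles the resolution.
    factor = 2 ** (len(block_out_channels) - 1)
    return schedule, (out_channels, latent_height * factor, latent_width * factor)


print(decoder_plan(64, 64, (128, 256, 512, 512)))
# ([512, 512, 256, 128], (3, 512, 512))
```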
- class mmagic.models.editors.stable_diffusion.vae.DiagonalGaussianDistribution(parameters, deterministic=False)
Bases: object
Diagonal Gaussian distribution, used as the posterior over the VAE latents.
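The distribution can be sketched in pure Python. Following the diffusers class of the same name (an assumption about mmagic), parameters is split in half into a mean and a log-variance (the real class splits a tensor along the channel dimension; here a flat list is split), the log-variance is clamped to [-30, 20], sampling uses the reparameterization trick, and the KL divergence to a standard normal is 0.5·Σ(μ² + σ² − 1 − log σ²). With deterministic=True the variance is zero, so sampling returns the mean. The class name is hypothetical.

```python
import math
import random
from typing import List, Optional


class DiagonalGaussian:
    """Hypothetical pure-Python stand-in for a diagonal Gaussian posterior."""

    def __init__(self, parameters: List[float], deterministic: bool = False):
        half = len(parameters) // 2
        self.mean = parameters[:half]
        # Clamp the log-variance for numerical stability (bounds assumed from diffusers).
        self.logvar = [min(max(lv, -30.0), 20.0) for lv in parameters[half:]]
        self.deterministic = deterministic
        if deterministic:
            self.std = [0.0] * half
            self.var = [0.0] * half
        else:
            self.std = [math.exp(0.5 * lv) for lv in self.logvar]
            self.var = [math.exp(lv) for lv in self.logvar]

    def sample(self, rng: Optional[random.Random] = None) -> List[float]:
        rng = rng or random.Random()
        # Reparameterization trick: z = mean + std * eps with eps ~ N(0, 1).
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self) -> List[float]:
        return list(self.mean)

    def kl(self) -> float:
        """KL(q || N(0, I)) summed over all latent dimensions."""
        if self.deterministic:
            return 0.0
        return 0.5 * sum(m * m + v - 1.0 - lv
                         for m, v, lv in zip(self.mean, self.var, self.logvar))


posterior = DiagonalGaussian([1.0, 0.0])  # mean 1.0, log-variance 0.0
print(posterior.kl())  # 0.5
```

The KL term is the regularizer that gives AutoencoderKL its name; it is zero only when the posterior equals the standard normal prior.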
- class mmagic.models.editors.stable_diffusion.vae.AutoencoderKL(in_channels: int = 3, out_channels: int = 3, down_block_types: Tuple[str] = ('DownEncoderBlock2D',), up_block_types: Tuple[str] = ('UpDecoderBlock2D',), block_out_channels: Tuple[int] = (64,), layers_per_block: int = 1, act_fn: str = 'silu', latent_channels: int = 4, norm_num_groups: int = 32, sample_size: int = 32)
Bases: torch.nn.Module
Variational Autoencoder (VAE) model with KL loss from the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling.
- Parameters
in_channels (int, optional, defaults to 3) – Number of channels in the input image.
out_channels (int, optional, defaults to 3) – Number of channels in the output.
down_block_types (Tuple[str], optional, defaults to ("DownEncoderBlock2D",)) – Tuple of downsample block types.
up_block_types (Tuple[str], optional, defaults to ("UpDecoderBlock2D",)) – Tuple of upsample block types.
block_out_channels (Tuple[int], optional, defaults to (64,)) – Tuple of block output channels.
act_fn (str, optional, defaults to “silu”) – The activation function to use.
latent_channels (int, optional, defaults to 4) – Number of channels in the latent space.
sample_size (int, optional, defaults to 32) – Sample size; currently not supported.
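The constructor arguments determine the shape of the latent that the decoder receives. The sketch assumes, as in diffusers, one down block per entry of block_out_channels, a factor-of-2 downsampling in every block except the last, and latent_channels channels after sampling from the posterior. With the defaults above (a single block) the latent keeps the input resolution; the four-block configuration is illustrative of Stable Diffusion. The helper is hypothetical.

```python
from typing import Sequence, Tuple


def latent_shape(height: int, width: int,
                 down_block_types: Sequence[str] = ("DownEncoderBlock2D",),
                 block_out_channels: Sequence[int] = (64,),
                 latent_channels: int = 4) -> Tuple[int, int, int]:
    """Hypothetical (C, H, W) of the sampled latent for an AutoencoderKL-like config."""
    if len(down_block_types) != len(block_out_channels):
        # Assumed: one block type per entry of block_out_channels.
        raise ValueError("down_block_types and block_out_channels must have equal length")
    factor = 2 ** (len(block_out_channels) - 1)  # assumed: the last block does not downsample
    if height % factor or width % factor:
        raise ValueError(f"height and width must be divisible by {factor}")
    return latent_channels, height // factor, width // factor


print(latent_shape(32, 32))  # (4, 32, 32) with the default single block
print(latent_shape(512, 512, ("DownEncoderBlock2D",) * 4, (128, 256, 512, 512)))  # (4, 64, 64)
```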
- decode(z: torch.FloatTensor, return_dict: bool = True) -> Union[addict.Dict, torch.FloatTensor]
Decode the latent representation z.
- forward(sample: torch.FloatTensor, sample_posterior: bool = False, return_dict: bool = True, generator: Optional[torch.Generator] = None) -> Union[addict.Dict, torch.FloatTensor]
- Parameters
sample (torch.FloatTensor) – Input sample.
sample_posterior (bool) – Whether to sample from the posterior. Defaults to False.
return_dict (bool, optional, defaults to True) – Whether to return a Dict instead of a plain tuple.
- Returns
The decoded results.
- Return type
Dict(sample=dec)
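The control flow of forward can be sketched with stand-in components: encode the input into a posterior, take a random sample when sample_posterior is True or the posterior mode otherwise, decode, and wrap the result. Everything here is hypothetical: the encoder, decoder and posterior are toy stubs, a plain dict stands in for addict.Dict, a random.Random replaces torch.Generator, and returning (dec,) when return_dict is False follows the "plain tuple" wording of the parameter description above (the annotated return type also allows a bare tensor).

```python
import random
from typing import Callable, Dict, List, Optional, Tuple, Union

Vector = List[float]


class StubPosterior:
    """Hypothetical posterior stub exposing sample() and mode()."""

    def __init__(self, mean: Vector, std: Vector):
        self.mean, self.std = mean, std

    def mode(self) -> Vector:
        return list(self.mean)

    def sample(self, rng: Optional[random.Random] = None) -> Vector:
        rng = rng or random.Random()
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]


def vae_forward(sample: Vector,
                encode: Callable[[Vector], StubPosterior],
                decode: Callable[[Vector], Vector],
                sample_posterior: bool = False,
                return_dict: bool = True,
                rng: Optional[random.Random] = None) -> Union[Dict[str, Vector], Tuple[Vector]]:
    posterior = encode(sample)
    # Draw a random latent only when asked; otherwise use the posterior mode (its mean).
    z = posterior.sample(rng) if sample_posterior else posterior.mode()
    dec = decode(z)
    return {"sample": dec} if return_dict else (dec,)


def toy_encode(x: Vector) -> StubPosterior:
    return StubPosterior(mean=x, std=[1.0] * len(x))


def toy_decode(z: Vector) -> Vector:
    return [2.0 * v for v in z]


print(vae_forward([1.0, 2.0], toy_encode, toy_decode))  # {'sample': [2.0, 4.0]}
```

Using the mode by default makes reconstruction deterministic; sampling is opt-in because it injects noise into the latent.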