mmagic.models.editors.stable_diffusion

Package Contents

Classes

StableDiffusion

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion.

StableDiffusionInpaint

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion.

AutoencoderKL

Variational Autoencoder (VAE) model with KL loss

class mmagic.models.editors.stable_diffusion.StableDiffusion(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: Optional[str] = None, enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor: Optional[ModelType] = dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None)[source]

Bases: mmengine.model.BaseModel

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.

Parameters
  • unet (Union[dict, nn.Module]) – The config or module for Unet model.

  • text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.

  • vae (Union[dict, nn.Module]) – The config or module for VAE model.

  • tokenizer (str) – The name for CLIP tokenizer.

  • scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.

  • test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, the same scheduler as scheduler is used. Defaults to None.

  • dtype (str, optional) – The dtype for the model. This argument has no effect when dtype is defined for submodels. Defaults to None.

  • enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.

  • noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.
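
As a rough illustration of how the constructor arguments above map onto a config dict, the sketch below builds a plain Python dict keyed by the parameter names in the signature and checks that the arguments without defaults are present. The nested type strings prefixed with Placeholder, the PRETRAINED value, and the missing_required helper are hypothetical and only stand in for real submodule configs.

```python
# Hypothetical checkpoint identifier; substitute a real one.
PRETRAINED = 'path/or/hub-id-of-checkpoint'

# Keys mirror the StableDiffusion signature. Nested `type` values starting
# with "Placeholder" are illustrative only and are not confirmed names.
model = dict(
    type='StableDiffusion',
    unet=dict(type='PlaceholderUNet'),
    text_encoder=dict(type='PlaceholderTextEncoder'),
    vae=dict(type='AutoencoderKL'),
    tokenizer=PRETRAINED,
    scheduler=dict(type='PlaceholderTrainScheduler'),
    test_scheduler=dict(type='PlaceholderTestScheduler'),
    dtype=None,
    enable_xformers=True,
    noise_offset_weight=0,
    data_preprocessor=dict(type='DataPreprocessor'),
)

# Arguments that have no default value in the signature.
REQUIRED = ('vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler')


def missing_required(cfg: dict) -> list:
    """Return the required constructor arguments absent from cfg."""
    return [key for key in REQUIRED if key not in cfg]
```

Calling missing_required(model) on the dict above returns an empty list; dropping a key such as scheduler makes it show up in the result.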

property device

set_xformers(module: Optional[torch.nn.Module] = None) → torch.nn.Module[source]

Set xformers for the model.

Returns

The model with xformers.

Return type

nn.Module

set_tomesd() → torch.nn.Module[source]

Set ToMe for the stable diffusion model.

Returns

The model with ToMe.

Return type

nn.Module

train(mode: bool = True)[source]

Set train/eval mode.

Parameters

mode (bool, optional) – Whether to set train mode. Defaults to True.

infer(prompt: Union[str, List[str]], height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress=True, seed=1, return_type='image')[source]

Function invoked when calling the pipeline for generation.

Parameters
  • prompt (str or List[str]) – The prompt or prompts to guide the image generation.

  • height (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The height in pixels of the generated image.

  • width (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The width in pixels of the generated image.

  • num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598).

  • negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

  • num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.

  • eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to schedulers.DDIMScheduler; ignored for other schedulers.

  • generator (torch.Generator, optional) – A torch generator to make generation deterministic.

  • latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

  • return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.

Returns

A dict containing the generated images.

Return type

dict
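
The height and width defaults documented for infer are the product of two model attributes. The sketch below shows that arithmetic in isolation; resolve_size is a hypothetical helper, and the values 64 and 8 in the usage note are assumed typical values, not ones stated on this page.

```python
from typing import Optional, Tuple


def resolve_size(height: Optional[int], width: Optional[int],
                 unet_sample_size: int,
                 vae_scale_factor: int) -> Tuple[int, int]:
    """Fill in missing height/width with unet_sample_size * vae_scale_factor."""
    default = unet_sample_size * vae_scale_factor
    resolved_height = height if height is not None else default
    resolved_width = width if width is not None else default
    return resolved_height, resolved_width
```

Under the assumed values, resolve_size(None, None, 64, 8) gives (512, 512), while an explicitly passed dimension is kept unchanged.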

output_to_pil(image) → List[PIL.Image.Image][source]

Convert output tensor to PIL image. The output tensor will be de-normed to [0, 255] by DataPreprocessor.destruct. Because no data_samples are passed, color order conversion is not performed.

Parameters

image (torch.Tensor) – The output tensor of the decoder.

Returns

The list of processed PIL images.

Return type

List[Image.Image]
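
To make the de-normalization step concrete, here is a minimal pure-Python sketch of mapping decoder output to 8-bit pixel values and reordering channels-first data to the height-width-channel layout used by image libraries. It assumes the decoder output lies in [-1, 1], which this page does not state; denorm_to_uint8 and chw_to_hwc_pixels are hypothetical helpers and do not reproduce DataPreprocessor.destruct.

```python
def denorm_to_uint8(value: float) -> int:
    """Map a value assumed to lie in [-1, 1] to an integer in [0, 255]."""
    scaled = (value + 1.0) / 2.0 * 255.0
    # Clamp so that slightly out-of-range values still give valid pixels.
    return max(0, min(255, round(scaled)))


def chw_to_hwc_pixels(image):
    """Convert a nested list shaped [C][H][W] to 8-bit pixels shaped [H][W][C]."""
    channels, height, width = len(image), len(image[0]), len(image[0][0])
    return [[[denorm_to_uint8(image[c][y][x]) for c in range(channels)]
             for x in range(width)]
            for y in range(height)]
```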

_encode_prompt(prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt)[source]

Encodes the prompt into text encoder hidden states.

Parameters
  • prompt (str or list(int)) – prompt to be encoded.

  • device (torch.device) – torch device.

  • num_images_per_prompt (int) – number of images that should be generated per prompt.

  • do_classifier_free_guidance (bool) – whether to use classifier free guidance or not.

  • negative_prompt (str or List[str]) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

Returns

Text embeddings generated by the CLIP text encoder.

Return type

text_embeddings (torch.Tensor)
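
The do_classifier_free_guidance flag and guidance_scale work together: when guidance is on, the denoiser is evaluated with and without the prompt and the two predictions are combined. The sketch below shows the usual combination rule from the classifier-free guidance literature on plain Python lists. The helper names and the guidance_scale > 1.0 switch are assumptions borrowed from common pipeline implementations, not taken from this class.

```python
from typing import List


def uses_guidance(guidance_scale: float) -> bool:
    """Assumed convention: guidance is active only for scales above 1."""
    return guidance_scale > 1.0


def apply_guidance(noise_uncond: List[float], noise_text: List[float],
                   guidance_scale: float) -> List[float]:
    """Move the unconditional prediction toward the text-conditioned one."""
    return [u + guidance_scale * (t - u)
            for u, t in zip(noise_uncond, noise_text)]
```

With guidance_scale equal to 1.0 the result is the text-conditioned prediction itself; larger values extrapolate further away from the unconditional prediction.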

decode_latents(latents)[source]

Use the VAE to decode latents.

Parameters

latents (torch.Tensor) – latents to decode.

Returns

image result.

Return type

image (torch.Tensor)

prepare_extra_step_kwargs(generator, eta)[source]

Prepare extra kwargs for the scheduler step.

Parameters
  • generator (torch.Generator) – generator for random functions.

  • eta (float) – eta (η) is only used with the DDIMScheduler; it is ignored for other schedulers. eta corresponds to η in the DDIM paper (https://arxiv.org/abs/2010.02502) and should be in the range [0, 1].

Returns

A dict containing ‘generator’ and ‘eta’.

Return type

extra_step_kwargs (dict)
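
Since eta only matters for some schedulers, one common way to build these kwargs is to inspect the scheduler's step signature and forward only what it accepts. The sketch below illustrates that idea with two stand-in scheduler classes. DDIMLikeScheduler, PlainScheduler, and build_extra_step_kwargs are hypothetical, and this is not necessarily how the method documented here is implemented.

```python
import inspect


class DDIMLikeScheduler:
    """Stand-in scheduler whose step() accepts eta and generator."""

    def step(self, model_output, timestep, sample, eta=0.0, generator=None):
        return sample


class PlainScheduler:
    """Stand-in scheduler whose step() accepts neither argument."""

    def step(self, model_output, timestep, sample):
        return sample


def build_extra_step_kwargs(scheduler, generator, eta):
    """Collect only the optional arguments that scheduler.step() accepts."""
    accepted = set(inspect.signature(scheduler.step).parameters)
    extra = {}
    if 'eta' in accepted:
        extra['eta'] = eta
    if 'generator' in accepted:
        extra['generator'] = generator
    return extra
```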

prepare_test_scheduler_extra_step_kwargs(generator, eta)[source]

Prepare extra kwargs for the test scheduler step.

Parameters
  • generator (torch.Generator) – generator for random functions.

  • eta (float) – eta (η) is only used with the DDIMScheduler; it is ignored for other schedulers. eta corresponds to η in the DDIM paper (https://arxiv.org/abs/2010.02502) and should be in the range [0, 1].

Returns

A dict containing ‘generator’ and ‘eta’.

Return type

extra_step_kwargs (dict)

check_inputs(prompt, height, width)[source]

check whether inputs are in suitable format or not.
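
The page does not list which checks are performed. As a hedged sketch of what such validation typically looks like in latent diffusion pipelines, the hypothetical validate_inputs below rejects prompts that are neither str nor list and sizes that are not a multiple of 8. The factor 8 is an assumption tied to a typical VAE downsampling ratio.

```python
def validate_inputs(prompt, height: int, width: int, multiple: int = 8) -> None:
    """Raise ValueError when the prompt type or image size is unsuitable."""
    if not isinstance(prompt, (str, list)):
        raise ValueError(
            f'prompt must be str or list, but got {type(prompt).__name__}')
    if height % multiple != 0 or width % multiple != 0:
        raise ValueError(
            f'height and width must be divisible by {multiple}, '
            f'but got {height} and {width}')
```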

prepare_latents(batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None)[source]

Prepare latents for diffusion to run in latent space.

Parameters
  • batch_size (int) – batch size.

  • num_channels_latents (int) – latent channel nums.

  • height (int) – image height.

  • width (int) – image width.

  • dtype (torch.dtype) – float type.

  • device (torch.device) – torch device.

  • generator (torch.Generator) – generator for random functions, defaults to None.

  • latents (torch.Tensor) – Pre-generated noisy latents, defaults to None.

Returns

prepared latents.

Return type

latents (torch.Tensor)
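
The parameters above determine the latent tensor's shape. A minimal sketch, assuming the VAE downsamples each spatial dimension by a factor of 8 (an assumed vae_scale_factor, not stated here), computes that shape and draws seeded Gaussian noise as a flat Python list. Both helpers are hypothetical and only illustrate why a fixed generator makes the starting latents reproducible.

```python
import random
from typing import List, Tuple


def latent_shape(batch_size: int, num_channels_latents: int, height: int,
                 width: int, vae_scale_factor: int = 8) -> Tuple[int, int, int, int]:
    """Shape of the latents for an image of the given pixel size."""
    return (batch_size, num_channels_latents,
            height // vae_scale_factor, width // vae_scale_factor)


def sample_initial_latents(shape: Tuple[int, int, int, int], seed: int) -> List[float]:
    """Draw standard-normal noise (flattened) from a seeded generator."""
    rng = random.Random(seed)
    count = shape[0] * shape[1] * shape[2] * shape[3]
    return [rng.gauss(0.0, 1.0) for _ in range(count)]
```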

val_step(data: dict) → mmagic.utils.typing.SampleList[source]

Gets the predictions of given data.

Calls self.data_preprocessor(data, False) and self(inputs, data_sample, mode='predict') in order. Returns the predictions, which will be passed to the evaluator.

Parameters

data (dict or tuple or list) – Data sampled from dataset.

Returns

The predictions of given data.

Return type

list

test_step(data: dict) → mmagic.utils.typing.SampleList[source]

BaseModel implements test_step the same as val_step.

Parameters

data (dict or tuple or list) – Data sampled from dataset.

Returns

The predictions of given data.

Return type

list

train_step(data, optim_wrapper_dict)[source]

Implements the default model training process including preprocessing, model forward propagation, loss calculation, optimization, and back-propagation.

During non-distributed training, if subclasses do not override train_step(), EpochBasedTrainLoop or IterBasedTrainLoop will call this method to update model parameters. The default parameter update process is as follows:

  1. Calls self.data_preprocessor(data, training=False) to collect batch_inputs and corresponding data_samples (labels).

  2. Calls self(batch_inputs, data_samples, mode='loss') to get the raw loss.

  3. Calls self.parse_losses to get parsed_losses tensor used to backward and dict of loss tensor used to log messages.

  4. Calls optim_wrapper.update_params(loss) to update model.

Parameters
  • data (dict or tuple or list) – Data sampled from dataset.

  • optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensor for logging.

Return type

Dict[str, torch.Tensor]
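
The four-step update process described above can be sketched with toy stand-ins. ToyModel and ToyOptimWrapper are hypothetical classes that only mimic the call order (preprocess, forward in loss mode, parse losses, update parameters). The preprocessor's training flag is omitted for brevity, and the "loss" is a mean squared error on plain floats so the flow runs without any framework.

```python
class ToyOptimWrapper:
    """Stand-in for an optimizer wrapper; records the loss it was given."""

    def __init__(self):
        self.updated_with = None

    def update_params(self, loss):
        self.updated_with = loss


class ToyModel:
    """Stand-in model that follows the documented train_step call order."""

    def data_preprocessor(self, data):
        # Step 1: split the batch into inputs and targets.
        return data['inputs'], data['data_samples']

    def __call__(self, batch_inputs, data_samples, mode='loss'):
        # Step 2: compute a raw loss dict (mean squared error here).
        errors = [(x - y) ** 2 for x, y in zip(batch_inputs, data_samples)]
        return {'loss_mse': sum(errors) / len(errors)}

    def parse_losses(self, losses):
        # Step 3: reduce to one scalar and build the logging dict.
        total = sum(losses.values())
        return total, dict(losses, loss=total)

    def train_step(self, data, optim_wrapper):
        batch_inputs, data_samples = self.data_preprocessor(data)
        losses = self(batch_inputs, data_samples, mode='loss')
        parsed_loss, log_vars = self.parse_losses(losses)
        # Step 4: hand the scalar loss to the optimizer wrapper.
        optim_wrapper.update_params(parsed_loss)
        return log_vars
```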

abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') → Union[Dict[str, torch.Tensor], list][source]

forward is not implemented now.

class mmagic.models.editors.stable_diffusion.StableDiffusionInpaint(*args, **kwargs)[source]

Bases: mmagic.models.editors.stable_diffusion.stable_diffusion.StableDiffusion

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.

Parameters
  • unet (Union[dict, nn.Module]) – The config or module for Unet model.

  • text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.

  • vae (Union[dict, nn.Module]) – The config or module for VAE model.

  • tokenizer (str) – The name for CLIP tokenizer.

  • scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.

  • test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, the same scheduler as scheduler is used. Defaults to None.

  • dtype (str, optional) – The dtype for the model. This argument has no effect when dtype is defined for submodels. Defaults to None.

  • enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.

  • noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.

infer(prompt: Union[str, List[str]], image: Union[torch.FloatTensor, PIL.Image.Image] = None, mask_image: Union[torch.FloatTensor, PIL.Image.Image] = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress=True, seed=1, return_type='image')[source]

Function invoked when calling the pipeline for generation.

Parameters
  • prompt (str or List[str]) – The prompt or prompts to guide the image generation.

  • image (Union[torch.FloatTensor, Image.Image]) – The image to inpaint.

  • mask_image (Union[torch.FloatTensor, Image.Image]) – The mask to apply to the image, i.e. regions to inpaint.

  • height (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The height in pixels of the generated image.

  • width (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The width in pixels of the generated image.

  • num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598).

  • negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

  • num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.

  • eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to schedulers.DDIMScheduler; ignored for other schedulers.

  • generator (torch.Generator, optional) – A torch generator to make generation deterministic.

  • latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

  • return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.

Returns

A dict containing the generated images.

Return type

dict
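
To illustrate how image and mask_image relate, the sketch below blanks out the pixels selected by the mask, leaving the rest of the image as conditioning. It assumes a mask value of 0.5 or more marks a region to inpaint, a convention common in inpainting pipelines but not stated on this page. apply_inpaint_mask is a hypothetical helper operating on 2D lists.

```python
from typing import List


def apply_inpaint_mask(image: List[List[float]], mask: List[List[float]],
                       threshold: float = 0.5) -> List[List[float]]:
    """Zero out pixels whose mask value marks them for inpainting."""
    return [[0.0 if m >= threshold else pixel
             for pixel, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]
```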

prepare_mask_latents(mask, masked_image, batch_size, num_channels_latents, height, width, dtype, device, generator, do_classifier_free_guidance)[source]

Prepare mask latents for diffusion to run in latent space.

Parameters
  • mask (torch.Tensor) – The mask to apply to the image, i.e. regions to inpaint.

  • masked_image (torch.Tensor) – The image to be masked.

  • batch_size (int) – batch size.

  • num_channels_latents (int) – latent channel nums.

  • height (int) – image height.

  • width (int) – image width.

  • dtype (torch.dtype) – float type.

  • device (torch.device) – torch device.

  • generator (torch.Generator) – generator for random functions, defaults to None.

  • latents (torch.Tensor) – Pre-generated noisy latents, defaults to None.

  • do_classifier_free_guidance (bool) – Whether to apply classifier-free guidance.

Returns

prepared latents.

Return type

latents (torch.Tensor)
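
As a shape-level sketch of how the prepared mask and masked-image latents are typically consumed, the hypothetical helper below computes the UNet input shape when noisy latents, a single-channel mask, and masked-image latents are concatenated along the channel axis, with the batch doubled under classifier-free guidance. The concatenation layout, the single mask channel, and the scale factor of 8 are assumptions drawn from common inpainting pipelines, not from this page.

```python
from typing import Tuple


def inpaint_unet_input_shape(batch_size: int, num_channels_latents: int,
                             height: int, width: int,
                             do_classifier_free_guidance: bool,
                             vae_scale_factor: int = 8,
                             mask_channels: int = 1) -> Tuple[int, int, int, int]:
    """Shape of [latents, mask, masked-image latents] stacked on channels."""
    batch = batch_size * 2 if do_classifier_free_guidance else batch_size
    channels = num_channels_latents + mask_channels + num_channels_latents
    return (batch, channels,
            height // vae_scale_factor, width // vae_scale_factor)
```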

abstract val_step(data: dict) → mmagic.utils.typing.SampleList[source]

Performs a validation step on the provided data.

This method is decorated with torch.no_grad() which indicates no gradients will be computed during the operations. This ensures efficient memory usage during testing.

Parameters

data (dict) – Dictionary containing input data for testing.

Returns

List of samples processed during the testing step.

Return type

SampleList

Raises

NotImplementedError – This method has not been implemented.

abstract test_step(data: dict) → mmagic.utils.typing.SampleList[source]

Performs a testing step on the provided data.

This method is decorated with torch.no_grad() which indicates no gradients will be computed during the operations. This ensures efficient memory usage during testing.

Parameters

data (dict) – Dictionary containing input data for testing.

Returns

List of samples processed during the testing step.

Return type

SampleList

Raises

NotImplementedError – This method has not been implemented.

abstract train_step(data, optim_wrapper_dict)[source]

Performs a training step on the provided data.

Parameters
  • data – Input data for training.

  • optim_wrapper_dict – Dictionary containing optimizer wrappers which may contain optimizers, schedulers, etc. required for the training step.

Raises

NotImplementedError – This method has not been implemented.

class mmagic.models.editors.stable_diffusion.AutoencoderKL(in_channels: int = 3, out_channels: int = 3, down_block_types: Tuple[str] = ('DownEncoderBlock2D',), up_block_types: Tuple[str] = ('UpDecoderBlock2D',), block_out_channels: Tuple[int] = (64,), layers_per_block: int = 1, act_fn: str = 'silu', latent_channels: int = 4, norm_num_groups: int = 32, sample_size: int = 32)[source]

Bases: torch.nn.Module

Variational Autoencoder (VAE) model with KL loss from the paper Auto-Encoding Variational Bayes by Diederik P. Kingma and Max Welling.

Parameters
  • in_channels (int, optional, defaults to 3) – Number of channels in the input image.

  • out_channels (int, optional, defaults to 3) – Number of channels in the output.

  • down_block_types (Tuple[str], optional, defaults to (“DownEncoderBlock2D”,)) – Tuple of downsample block types.

  • up_block_types (Tuple[str], optional, defaults to (“UpDecoderBlock2D”,)) – Tuple of upsample block types.

  • block_out_channels (Tuple[int], optional, defaults to (64,)) – Tuple of block output channels.

  • act_fn (str, optional, defaults to “silu”) – The activation function to use.

  • latent_channels (int, optional, defaults to 4) – Number of channels in the latent space.

  • sample_size (int, optional, defaults to 32) – Sample size. Not currently supported.

property dtype

The data type of the parameters of VAE.

encode(x: torch.FloatTensor, return_dict: bool = True) → addict.Dict[source]

Encode input.

decode(z: torch.FloatTensor, return_dict: bool = True) → Union[addict.Dict, torch.FloatTensor][source]

Decode z.

forward(sample: torch.FloatTensor, sample_posterior: bool = False, return_dict: bool = True, generator: Optional[torch.Generator] = None) → Union[addict.Dict, torch.FloatTensor][source]

Parameters
  • sample (torch.FloatTensor) – Input sample.

  • sample_posterior (bool) – Whether to sample from the posterior. defaults to False.

  • return_dict (bool, optional, defaults to True) – Whether or not to return a Dict instead of a plain tuple.

Returns

decode results.

Return type

Dict(sample=dec)
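
The sample_posterior flag chooses between two ways of picking a latent from the encoder's distribution. The sketch below illustrates the standard choice for a diagonal Gaussian posterior: return the mean (the mode) when not sampling, otherwise draw mean + std * noise. latent_from_posterior and its list-based inputs are hypothetical, and the parameterization by log-variance is an assumption.

```python
import math
import random
from typing import List, Optional


def latent_from_posterior(mean: List[float], logvar: List[float],
                          sample_posterior: bool = False,
                          rng: Optional[random.Random] = None) -> List[float]:
    """Pick a latent from a diagonal Gaussian given its mean and log-variance."""
    if not sample_posterior:
        # Deterministic path: the mode of a Gaussian is its mean.
        return list(mean)
    rng = rng or random.Random()
    # Reparameterized sample: mean + std * standard normal noise.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, logvar)]
```

Passing a seeded random.Random plays the same role as the generator argument: it makes the sampled latent reproducible.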
