mmagic.models.editors.controlnet.controlnet
¶
Module Contents¶
Classes¶
ControlStableDiffusion | Implementation of ControlNet with Stable Diffusion.
ControlStableDiffusionImg2Img | Implementation of ControlNet with Stable Diffusion.
Attributes¶
ModelType
- class mmagic.models.editors.controlnet.controlnet.ControlStableDiffusion(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, controlnet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, attention_injection=False)[source]¶
Bases:
mmagic.models.editors.stable_diffusion.StableDiffusion
Implementation of ControlNet with Stable Diffusion (https://arxiv.org/abs/2302.05543).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name of the CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
controlnet (Union[dict, nn.Module]) – The config or module for ControlNet.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
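A minimal construction sketch is shown below, assuming the submodules are built from the MMagic registry; the submodule configs and the pretrained repositories ('runwayml/stable-diffusion-v1-5', 'lllyasviel/sd-controlnet-canny') are illustrative choices, not requirements:

>>> # Hypothetical construction sketch; the submodule configs and
>>> # checkpoint names are assumptions, not the only valid choices.
>>> from mmagic.registry import MODELS
>>> sd_repo = 'runwayml/stable-diffusion-v1-5'
>>> cfg = dict(
>>>     type='ControlStableDiffusion',
>>>     vae=dict(type='AutoencoderKL', from_pretrained=sd_repo,
>>>              subfolder='vae'),
>>>     text_encoder=dict(type='ClipWrapper', clip_type='huggingface',
>>>                       pretrained_model_name_or_path=sd_repo,
>>>                       subfolder='text_encoder'),
>>>     tokenizer=sd_repo,
>>>     unet=dict(type='UNet2DConditionModel', from_pretrained=sd_repo,
>>>               subfolder='unet'),
>>>     controlnet=dict(type='ControlNetModel',
>>>                     from_pretrained='lllyasviel/sd-controlnet-canny'),
>>>     scheduler=dict(type='DDPMScheduler', from_pretrained=sd_repo,
>>>                    subfolder='scheduler'),
>>>     init_cfg=dict(type='init_from_unet'))
>>> model = MODELS.build(cfg)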
- init_weights()[source]¶
Initialize the weights. Note that this function will only be called during training. If you want to run inference with a different unet model, you can call this function manually or use mmagic.models.editors.controlnet.controlnet_utils.change_base_model to convert the weights of ControlNet manually.
Example:

>>> # 1. init controlnet from unet
>>> init_cfg = dict(type='init_from_unet')
>>>
>>> # 2. switch controlnet weight from unet
>>> # base model is not defined, use `runwayml/stable-diffusion-v1-5`
>>> # as default
>>> init_cfg = dict(type='convert_from_unet')
>>> # base model is defined
>>> init_cfg = dict(
>>>     type='convert_from_unet',
>>>     base_model=dict(
>>>         type='UNet2DConditionModel',
>>>         from_pretrained='REPO_ID',
>>>         subfolder='unet'))
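Below is a hedged sketch of the manual re-initialization mentioned above; new_unet is a hypothetical replacement UNet and is not part of this API:

>>> # Sketch: swap in a different base UNet for inference, then re-run
>>> # the weight initialization manually (new_unet is hypothetical).
>>> model.unet = new_unet
>>> model.init_weights()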
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) → Dict[str, torch.Tensor][source]¶
Train step for the ControlNet model.
- Parameters
data (dict) – Data sampled from the dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance that contains the OptimWrapper of the generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- val_step(data: dict) → mmagic.utils.typing.SampleList[source]¶
Gets the generated image of the given data. Calls self.data_preprocessor and self.infer in order, then returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) → mmagic.utils.typing.SampleList[source]¶
Gets the generated image of the given data. Calls self.data_preprocessor and self.infer in order, then returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- static prepare_control(image: Tuple[PIL.Image.Image, List[PIL.Image.Image], torch.Tensor, List[torch.Tensor]], width: int, height: int, batch_size: int, num_images_per_prompt: int, device: str, dtype: str) → torch.Tensor[source]¶
A helper function to prepare a single control image.
- Parameters
image (Tuple[Image.Image, List[Image.Image], Tensor, List[Tensor]]) – The input image for control.
width (int) – The width of the control image.
height (int) – The height of the control image.
batch_size (int) – The number of prompts. The control will be repeated batch_size times.
num_images_per_prompt (int) – The number of images to generate per prompt.
device (str) – The device of the control.
dtype (str) – The dtype of the control.
- Returns
The control as a torch.Tensor.
- Return type
Tensor
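A hedged usage sketch for this static helper follows; the input file and the expected output shape are assumptions for illustration:

>>> # Hypothetical usage; the image file and output shape are assumed.
>>> import torch
>>> from PIL import Image
>>> canny = Image.open('canny_edge.png')
>>> control = ControlStableDiffusion.prepare_control(
>>>     canny, width=512, height=512, batch_size=1,
>>>     num_images_per_prompt=1, device='cuda', dtype=torch.float32)
>>> control.shape  # expected (1, 3, 512, 512) under these assumptions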
- train(mode: bool = True)[source]¶
Set train/eval mode.
- Parameters
mode (bool, optional) – Whether to set training mode. Defaults to True.
- infer(prompt: Union[str, List[str]], height: Optional[int] = None, width: Optional[int] = None, control: Optional[Union[str, numpy.ndarray, torch.Tensor]] = None, controlnet_conditioning_scale: float = 1.0, num_inference_steps: int = 20, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, return_type='image', show_progress=True)[source]¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
height (int, Optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, Optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
control (str or np.ndarray or torch.Tensor, optional) – The control image, or a path to it, used to condition the ControlNet. Defaults to None.
controlnet_conditioning_scale (float) – The scale applied to the ControlNet outputs before they are added to the residuals of the UNet. Defaults to 1.0.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. Defaults to 20.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_images_per_prompt (int) – The number of images to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images and the control image.
- Return type
dict
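An illustrative text-to-image call is shown below; the prompt, the control path, and the 'samples' key are assumptions for the sketch:

>>> # Illustrative call; prompt, control path and dict key are assumptions.
>>> output = model.infer(
>>>     prompt='a photo of a tidy room',
>>>     control='path/to/canny_edge.png',
>>>     num_inference_steps=20,
>>>     guidance_scale=7.5)
>>> samples = output['samples']  # list of PIL images for return_type='image'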
- class mmagic.models.editors.controlnet.controlnet.ControlStableDiffusionImg2Img(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, controlnet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, attention_injection=False)[source]¶
Bases:
ControlStableDiffusion
Implementation of ControlNet with Stable Diffusion (https://arxiv.org/abs/2302.05543).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name of the CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
controlnet (Union[dict, nn.Module]) – The config or module for ControlNet.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
- prepare_latents(image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None, noise=None)[source]¶
Prepare the latents for the diffusion process to run in latent space.
- Parameters
image (torch.Tensor or PIL.Image.Image) – The image to be encoded to the initial latents.
timestep (torch.Tensor) – The timestep at which noise is added to the initial latents.
batch_size (int) – The batch size.
num_images_per_prompt (int) – The number of images to generate per prompt.
dtype (torch.dtype) – The dtype of the latents.
device (torch.device) – The target torch device.
generator (torch.Generator) – Generator for random functions, defaults to None.
noise (torch.Tensor) – Pre-generated noise to add to the image latents, defaults to None.
- Returns
The prepared latents.
- Return type
torch.Tensor
- infer(prompt: Union[str, List[str]], latent_image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]] = None, latent_mask: torch.FloatTensor = None, strength: float = 1.0, height: Optional[int] = None, width: Optional[int] = None, control: Optional[Union[str, numpy.ndarray, torch.Tensor]] = None, controlnet_conditioning_scale: float = 1.0, num_inference_steps: int = 20, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, return_type='image', show_progress=True, reference_img: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]] = None)[source]¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
latent_image (torch.FloatTensor or PIL.Image.Image, or lists of them) – The image used as the starting point for the generation. Defaults to None.
strength (float) – Conceptually, how much to transform latent_image. A value of 1.0 adds the maximum amount of noise, so the denoising process can depart furthest from the input image. Defaults to 1.0.
height (int, Optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, Optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
control (str or np.ndarray or torch.Tensor, optional) – The control image, or a path to it, used to condition the ControlNet. Defaults to None.
controlnet_conditioning_scale (float) – The scale applied to the ControlNet outputs before they are added to the residuals of the UNet. Defaults to 1.0.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. Defaults to 20.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_images_per_prompt (int) – The number of images to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images and the control image.
- Return type
dict
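An illustrative image-to-image call under the same assumptions as above; the input files, the strength value, and the 'samples' key are hypothetical:

>>> # Illustrative img2img call; file paths and dict key are assumptions.
>>> from PIL import Image
>>> init_img = Image.open('source.png').convert('RGB')
>>> output = model.infer(
>>>     prompt='a watercolor painting of a room',
>>>     latent_image=init_img,
>>>     strength=0.75,
>>>     control='path/to/canny_edge.png',
>>>     num_inference_steps=20)
>>> samples = output['samples']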