mmagic.models.editors.controlnet.controlnet
¶
Module Contents¶
Classes¶
ControlStableDiffusion | Implementation of ControlNet with Stable Diffusion.
ControlStableDiffusionImg2Img | Implementation of ControlNet with Stable Diffusion.
Attributes¶
ModelType
- class mmagic.models.editors.controlnet.controlnet.ControlStableDiffusion(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, controlnet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, attention_injection=False)[source]¶
Bases:
mmagic.models.editors.stable_diffusion.StableDiffusion
Implementation of ControlNet with Stable Diffusion (https://arxiv.org/abs/2302.05543).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name of the CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
controlnet (Union[dict, nn.Module]) – The config or module for ControlNet.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
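A minimal construction sketch is shown below, assuming the submodules are built from the MMagic registry; the submodule configs and the pretrained repositories ('runwayml/stable-diffusion-v1-5', 'lllyasviel/sd-controlnet-canny') are illustrative choices, not requirements:

>>> # Hypothetical construction sketch; the submodule configs and
>>> # checkpoint names are assumptions, not the only valid choices.
>>> from mmagic.registry import MODELS
>>> sd_repo = 'runwayml/stable-diffusion-v1-5'
>>> cfg = dict(
>>>     type='ControlStableDiffusion',
>>>     vae=dict(type='AutoencoderKL', from_pretrained=sd_repo,
>>>              subfolder='vae'),
>>>     text_encoder=dict(type='ClipWrapper', clip_type='huggingface',
>>>                       pretrained_model_name_or_path=sd_repo,
>>>                       subfolder='text_encoder'),
>>>     tokenizer=sd_repo,
>>>     unet=dict(type='UNet2DConditionModel', from_pretrained=sd_repo,
>>>               subfolder='unet'),
>>>     controlnet=dict(type='ControlNetModel',
>>>                     from_pretrained='lllyasviel/sd-controlnet-canny'),
>>>     scheduler=dict(type='DDPMScheduler', from_pretrained=sd_repo,
>>>                    subfolder='scheduler'),
>>>     init_cfg=dict(type='init_from_unet'))
>>> model = MODELS.build(cfg)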
- init_weights()[source]¶
Initialize the weights. Note that this function will only be called during training. If you want to run inference with a different unet model, you can call this function manually or use mmagic.models.editors.controlnet.controlnet_utils.change_base_model to convert the weights of ControlNet manually.
Example:

>>> # 1. init controlnet from unet
>>> init_cfg = dict(type='init_from_unet')
>>>
>>> # 2. switch controlnet weight from unet
>>> # base model is not defined, use `runwayml/stable-diffusion-v1-5`
>>> # as default
>>> init_cfg = dict(type='convert_from_unet')
>>> # base model is defined
>>> init_cfg = dict(
>>>     type='convert_from_unet',
>>>     base_model=dict(
>>>         type='UNet2DConditionModel',
>>>         from_pretrained='REPO_ID',
>>>         subfolder='unet'))
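Below is a hedged sketch of the manual re-initialization mentioned above; new_unet is a hypothetical replacement UNet and is not part of this API:

>>> # Sketch: swap in a different base UNet for inference, then re-run
>>> # the weight initialization manually (new_unet is hypothetical).
>>> model.unet = new_unet
>>> model.init_weights()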
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) → Dict[str, torch.Tensor][source]¶
Train step for the ControlNet model.
- Parameters
data (dict) – Data sampled from the dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance that contains the OptimWrapper of the generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- val_step(data: dict) → mmagic.utils.typing.SampleList[source]¶
Gets the generated image of the given data. Calls self.data_preprocessor and self.infer in order, then returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) → mmagic.utils.typing.SampleList[source]¶
Gets the generated image of the given data. Calls self.data_preprocessor and self.infer in order, then returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- static prepare_control(image: Tuple[PIL.Image.Image, List[PIL.Image.Image], torch.Tensor, List[torch.Tensor]], width: int, height: int, batch_size: int, num_images_per_prompt: int, device: str, dtype: str) → torch.Tensor[source]¶
A helper function to prepare a single control image.
- Parameters
image (Tuple[Image.Image, List[Image.Image], Tensor, List[Tensor]]) – The input image for control.
width (int) – The width of the control image.
height (int) – The height of the control image.
batch_size (int) – The number of prompts. The control will be repeated batch_size times.
num_images_per_prompt (int) – The number of images to generate per prompt.
device (str) – The device of the control.
dtype (str) – The dtype of the control.
- Returns
The control as a torch.Tensor.
- Return type
Tensor
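A hedged usage sketch for this static helper follows; the input file and the expected output shape are assumptions for illustration:

>>> # Hypothetical usage; the image file and output shape are assumed.
>>> import torch
>>> from PIL import Image
>>> canny = Image.open('canny_edge.png')
>>> control = ControlStableDiffusion.prepare_control(
>>>     canny, width=512, height=512, batch_size=1,
>>>     num_images_per_prompt=1, device='cuda', dtype=torch.float32)
>>> control.shape  # expected (1, 3, 512, 512) under these assumptions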
- train(mode: bool = True)[source]¶
Set train/eval mode.
- Parameters
mode (bool, optional) – Whether to set training mode. Defaults to True.
- infer(prompt: Union[str, List[str]], height: Optional[int] = None, width: Optional[int] = None, control: Optional[Union[str, numpy.ndarray, torch.Tensor]] = None, controlnet_conditioning_scale: float = 1.0, num_inference_steps: int = 20, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, return_type='image', show_progress=True)[source]¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
height (int, Optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, Optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
control (str or np.ndarray or torch.Tensor, optional) – The control image, or a path to it, used to condition the ControlNet. Defaults to None.
controlnet_conditioning_scale (float) – The scale applied to the ControlNet outputs before they are added to the residuals of the UNet. Defaults to 1.0.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. Defaults to 20.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_images_per_prompt (int) – The number of images to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images and the control image.
- Return type
dict
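An illustrative text-to-image call is shown below; the prompt, the control path, and the 'samples' key are assumptions for the sketch:

>>> # Illustrative call; prompt, control path and dict key are assumptions.
>>> output = model.infer(
>>>     prompt='a photo of a tidy room',
>>>     control='path/to/canny_edge.png',
>>>     num_inference_steps=20,
>>>     guidance_scale=7.5)
>>> samples = output['samples']  # list of PIL images for return_type='image'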
- class mmagic.models.editors.controlnet.controlnet.ControlStableDiffusionImg2Img(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, controlnet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, attention_injection=False)[source]¶
Bases:
ControlStableDiffusion
Implementation of ControlNet with Stable Diffusion (https://arxiv.org/abs/2302.05543).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name of the CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
controlnet (Union[dict, nn.Module]) – The config or module for ControlNet.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
- prepare_latents(image, timestep, batch_size, num_images_per_prompt, dtype, device, generator=None, noise=None)[source]¶
Prepare the latents for the diffusion process to run in latent space.
- Parameters
image (torch.Tensor or PIL.Image.Image) – The image to be encoded to the initial latents.
timestep (torch.Tensor) – The timestep at which noise is added to the initial latents.
batch_size (int) – The batch size.
num_images_per_prompt (int) – The number of images to generate per prompt.
dtype (torch.dtype) – The dtype of the latents.
device (torch.device) – The target torch device.
generator (torch.Generator) – Generator for random functions, defaults to None.
noise (torch.Tensor) – Pre-generated noise to add to the image latents, defaults to None.
- Returns
The prepared latents.
- Return type
torch.Tensor
- infer(prompt: Union[str, List[str]], latent_image: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]] = None, latent_mask: torch.FloatTensor = None, strength: float = 1.0, height: Optional[int] = None, width: Optional[int] = None, control: Optional[Union[str, numpy.ndarray, torch.Tensor]] = None, controlnet_conditioning_scale: float = 1.0, num_inference_steps: int = 20, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, return_type='image', show_progress=True, reference_img: Union[torch.FloatTensor, PIL.Image.Image, List[torch.FloatTensor], List[PIL.Image.Image]] = None)[source]¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
latent_image (torch.FloatTensor or PIL.Image.Image, or lists of them) – The image used as the starting point for the generation. Defaults to None.
strength (float) – Conceptually, how much to transform latent_image. A value of 1.0 adds the maximum amount of noise, so the denoising process can depart furthest from the input image. Defaults to 1.0.
height (int, Optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, Optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
control (str or np.ndarray or torch.Tensor, optional) – The control image, or a path to it, used to condition the ControlNet. Defaults to None.
controlnet_conditioning_scale (float) – The scale applied to the ControlNet outputs before they are added to the residuals of the UNet. Defaults to 1.0.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. Defaults to 20.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_images_per_prompt (int) – The number of images to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images and the control image.
- Return type
dict
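An illustrative image-to-image call under the same assumptions as above; the input files, the strength value, and the 'samples' key are hypothetical:

>>> # Illustrative img2img call; file paths and dict key are assumptions.
>>> from PIL import Image
>>> init_img = Image.open('source.png').convert('RGB')
>>> output = model.infer(
>>>     prompt='a watercolor painting of a room',
>>>     latent_image=init_img,
>>>     strength=0.75,
>>>     control='path/to/canny_edge.png',
>>>     num_inference_steps=20)
>>> samples = output['samples']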