mmagic.models.editors.vico

Package Contents

Classes

ViCo

Implementation of ViCo with Stable Diffusion.

Functions

set_vico_modules(unet, image_cross_layers)

Set all modules for the ViCo method after the UNet has been initialized normally.

class mmagic.models.editors.vico.ViCo(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, val_prompts: Union[str, List[str]] = None, dtype: str = 'fp16', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor: Optional[ModelType] = dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, image_cross_layers: List[int] = None, reg_loss_weight: float = 0, placeholder: str = None, initialize_token: str = None, num_vectors_per_token: int = 1)[source]

Bases: mmagic.models.editors.stable_diffusion.stable_diffusion.StableDiffusion

Implementation of ViCo (https://arxiv.org/abs/2306.00971) with Stable Diffusion.

Parameters
  • vae (Union[dict, nn.Module]) – The config or module for VAE model.

  • text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.

  • tokenizer (str) – The name for CLIP tokenizer.

  • unet (Union[dict, nn.Module]) – The config or module for Unet model.

  • scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.

  • test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, the same scheduler as scheduler is used. Defaults to None.

  • val_prompts (Union[str, List[str]], optional) – The prompts for validation. Defaults to None.

  • num_class_images (int, optional) – The number of images for class prior. Defaults to 3.

  • dtype (str, optional) – The dtype for the model. Defaults to ‘fp16’.

  • enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.

  • noise_offset_weight (float, optional) – The weight of noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').

  • init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.

  • image_cross_layers (List[int], optional) – The layers to use image cross attention. Defaults to None.

  • reg_loss_weight (float, optional) – The weight of regularization loss. Defaults to 0.

  • placeholder (str, optional) – The placeholder token. Defaults to None.

  • initialize_token (str, optional) – The token to initialize the placeholder. Defaults to None.

  • num_vectors_per_token (int, optional) – The number of vectors per token. Defaults to 1.
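The constructor arguments above can be gathered into a dict-style config. The following is a minimal sketch, not a recommended setup: only the top-level keys come from the signature above. Every nested value (the sub-config type names, the checkpoint identifier, the flag list, the tokens, and the loss weight) is a placeholder assumption chosen for illustration.

```python
# Hypothetical checkpoint identifier; substitute the one you actually use.
pretrained = 'path-or-hub-id-of-stable-diffusion'

model = dict(
    type='ViCo',
    # Nested sub-configs are illustrative placeholders (assumed type names).
    vae=dict(type='AutoencoderKL', from_pretrained=pretrained, subfolder='vae'),
    unet=dict(
        type='UNet2DConditionModel',
        from_pretrained=pretrained,
        subfolder='unet'),
    text_encoder=dict(
        type='ClipWrapper',
        clip_type='huggingface',
        pretrained_model_name_or_path=pretrained,
        subfolder='text_encoder'),
    tokenizer=pretrained,
    scheduler=dict(
        type='DDPMScheduler', from_pretrained=pretrained, subfolder='scheduler'),
    test_scheduler=dict(
        type='DDIMScheduler', from_pretrained=pretrained, subfolder='scheduler'),
    # Arguments documented in the parameter list above; values are assumptions.
    dtype='fp16',
    enable_xformers=True,
    noise_offset_weight=0,
    data_preprocessor=dict(type='DataPreprocessor'),
    image_cross_layers=[0, 1] * 8,   # assumed: one flag per transformer2D module
    reg_loss_weight=5e-4,            # assumed value
    placeholder='S*',                # assumed placeholder token
    initialize_token='dog',          # assumed initializer token
    num_vectors_per_token=1,
    val_prompts=['a photo of a S*'],
)
```

Keeping the placeholder string out of the ordinary vocabulary (as with 'S*' here) avoids clashing with a token the tokenizer already knows.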

prepare_models()[source]

Prepare model for training.

Move model to target dtype and disable gradient for some models.

set_vico(have_image_cross_attention: List[int])[source]

Set ViCo for model.

set_only_imca_trainable()[source]

Set only image cross attention trainable.

add_tokens(placeholder_token: str, initialize_token: str = None, num_vectors_per_token: int = 1)[source]

Add token for training.

# TODO: support add tokens as dict, then we can load pretrained tokens.

val_step(data: dict) → mmagic.utils.typing.SampleList[source]

Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results, which are passed to the evaluator or visualizer.

Parameters

data (dict or tuple or list) – Data sampled from dataset.

Returns

Generated image or image dict.

Return type

SampleList

test_step(data: dict) → mmagic.utils.typing.SampleList[source]

Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results, which are passed to the evaluator or visualizer.

Parameters

data (dict or tuple or list) – Data sampled from dataset.

Returns

Generated image or image dict.

Return type

SampleList

prepare_reference(image_ref: Union[PIL.Image.Image, torch.Tensor], height: Optional[int] = 512, width: Optional[int] = 512)[source]

train_step(data, optim_wrapper)[source]

Training step.

infer(prompt: Union[str, List[str]], image_reference: PIL.Image.Image = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress=True, seed=1, return_type='image')[source]

Function invoked when calling the pipeline for generation.

Parameters
  • prompt (str or List[str]) – The prompt or prompts to guide the image generation.

  • height (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The height in pixels of the generated image.

  • width (int, optional, defaults to self.unet_sample_size * self.vae_scale_factor) – The width in pixels of the generated image.

  • num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance] (https://arxiv.org/abs/2207.12598).

  • negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

  • num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.

  • eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.

  • generator (torch.Generator, optional) – A [torch generator] to make generation deterministic.

  • latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

  • return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.

Returns

A dict containing the generated images.

Return type

dict
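A minimal sketch of how infer might be called, using only keyword arguments from the documented signature. The wrapper name generate_with_reference and the _StubModel class are hypothetical and exist only so the snippet runs without MMagic installed. The key used in the stub's return dict is likewise an assumption, since the reference above does not name the keys of the returned dict.

```python
def generate_with_reference(model, prompt, image_reference, seed=1):
    """Call ``model.infer`` with keyword arguments from the documented signature."""
    return model.infer(
        prompt=prompt,
        image_reference=image_reference,
        num_inference_steps=50,   # documented default
        guidance_scale=7.5,       # documented default
        seed=seed,
        return_type='image',      # one of 'image', 'numpy', 'tensor'
    )


class _StubModel:
    """Stand-in that mimics only the call shape of ViCo.infer (not MMagic code)."""

    def infer(self, **kwargs):
        # Echo the arguments back; a real model would return generated images.
        return dict(received=kwargs)  # hypothetical key name


result = generate_with_reference(
    _StubModel(), 'a photo of a S*', image_reference=None)
```

With a real ViCo instance, image_reference would be a PIL image of the subject and the returned dict would hold the generated images.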

abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') → Union[Dict[str, torch.Tensor], list][source]

forward is not implemented now.

mmagic.models.editors.vico.set_vico_modules(unet, image_cross_layers)[source]

Set all modules for the ViCo method after the UNet has been initialized normally.

Parameters
  • unet (nn.Module) – UNet model.

  • image_cross_layers (List) – List of flags indicating which transformer2D modules have image_cross_attention modules.
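A small sketch of building such a flag list, assuming the flags are 0/1 integers given in the order the transformer2D modules appear in the UNet. The helper alternating_flags and the block count of 16 are illustrative assumptions, not values taken from this reference.

```python
def alternating_flags(num_blocks):
    """Return a 0/1 flag list that enables every second block (hypothetical helper)."""
    return [i % 2 for i in range(num_blocks)]


# Assumed block count; check how many transformer2D modules your UNet has.
image_cross_layers = alternating_flags(16)

# Indices of the blocks that would receive image cross attention.
enabled = [i for i, flag in enumerate(image_cross_layers) if flag]

# With MMagic installed and a UNet built, the call would then be:
# set_vico_modules(unet, image_cross_layers)
```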
