mmagic.models.editors.fastcomposer.fastcomposer

Module Contents

Classes

FastComposer

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion.

Attributes

ModelType

mmagic.models.editors.fastcomposer.fastcomposer.ModelType[source]
class mmagic.models.editors.fastcomposer.fastcomposer.FastComposer(pretrained_cfg: dict, vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None)[source]

Bases: mmagic.models.editors.stable_diffusion.StableDiffusion

Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.

Parameters
  • unet (Union[dict, nn.Module]) – The config or module for Unet model.

  • text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.

  • vae (Union[dict, nn.Module]) – The config or module for VAE model.

  • tokenizer (str) – The name for CLIP tokenizer.

  • scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.

  • test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.

  • dtype (str, optional) – The dtype of the model. This argument is ignored when dtype is defined for submodels. Defaults to 'fp32'.

  • enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.

  • noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.
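
A minimal construction sketch using the parameters above. The submodule configs follow mmagic's usual Stable Diffusion conventions, but the exact type names, the pretrained_cfg key layout, and the checkpoint paths are illustrative assumptions, not shipped defaults:

from mmagic.models.editors.fastcomposer.fastcomposer import FastComposer

# All values below are illustrative assumptions, not shipped configs.
sd_path = 'runwayml/stable-diffusion-v1-5'  # assumed base checkpoint

model = FastComposer(
    # Assumed key layout for the FastComposer-specific weights.
    pretrained_cfg=dict(finetuned_model_path='path/to/fastcomposer.pth'),
    vae=dict(type='AutoencoderKL', from_pretrained=sd_path, subfolder='vae'),
    text_encoder=dict(type='ClipWrapper', clip_type='huggingface',
                      pretrained_model_name_or_path=sd_path,
                      subfolder='text_encoder'),
    tokenizer=sd_path,
    unet=dict(type='UNet2DConditionModel', from_pretrained=sd_path,
              subfolder='unet'),
    scheduler=dict(type='DDPMScheduler', from_pretrained=sd_path,
                   subfolder='scheduler'),
    dtype='fp16')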

_tokenize_and_mask_noun_phrases_ends(caption)[source]

Tokenize the caption and mark the position where each noun phrase ends.
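
The real implementation works on the CLIP tokenizer and detected noun-phrase spans; the toy sketch below only illustrates the masking idea, with a whitespace tokenizer and a hand-written phrase list standing in for both (assumptions for illustration only):

def toy_tokenize_and_mask(caption, noun_phrases):
    """Toy stand-in: flag the last token of each noun phrase in a caption."""
    tokens = caption.lower().split()
    mask = [False] * len(tokens)
    for phrase in noun_phrases:
        words = phrase.lower().split()
        for i in range(len(tokens) - len(words) + 1):
            if tokens[i:i + len(words)] == words:
                mask[i + len(words) - 1] = True  # end of the noun phrase
    return tokens, mask

# Using 'img' as the augmentation marker is an assumption for illustration.
tokens, mask = toy_tokenize_and_mask('a man img riding a horse', ['man img'])
# mask is True only at 'img', the token that closes the augmented noun phrase.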

_encode_augmented_prompt(prompt: str, reference_images: List[PIL.Image.Image], device: torch.device, weight_dtype: torch.dtype)[source]

Encode reference images.

Parameters
  • prompt (str) – The prompt to be encoded.

  • reference_images (List[PIL.Image.Image]) – List of reference images.

  • device (torch.device) – torch device.

  • weight_dtype (torch.dtype) – The dtype used for the returned embeddings.

Returns

Text embeddings generated by the CLIP text encoder.

Return type

text_embeddings (torch.Tensor)
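
A hedged call sketch, assuming a constructed FastComposer instance named model and a local image file (both placeholders); since the method is private, it may change between mmagic versions:

import torch
from PIL import Image

reference = [Image.open('subject.jpg').convert('RGB')]  # assumed file
augmented_embeds = model._encode_augmented_prompt(
    prompt='a man img sitting in a park',  # 'img' marks the augmented token
    reference_images=reference,
    device=torch.device('cuda'),
    weight_dtype=torch.float16)
# The result can be fed to infer(augmented_prompt_embeds=...) to avoid
# re-encoding the same reference images across calls.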

infer(prompt: Union[str, List[str]] = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, latents: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.FloatTensor] = None, output_type: Optional[str] = 'pil', return_dict: bool = True, callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback_steps: int = 1, cross_attention_kwargs: Optional[Dict[str, Any]] = None, alpha_: float = 0.7, reference_subject_images: List[PIL.Image.Image] = None, augmented_prompt_embeds: Optional[torch.FloatTensor] = None, show_progress: bool = True)[source]

Function invoked when calling the pipeline for generation.

Parameters
  • prompt (str or List[str], optional) – The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.

  • height (int, optional) – The height in pixels of the generated image. Defaults to self.unet.config.sample_size * self.vae_scale_factor.

  • width (int, optional) – The width in pixels of the generated image. Defaults to self.unet.config.sample_size * self.vae_scale_factor.

  • num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

  • guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). guidance_scale is defined as w of equation 2 of the [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.

  • negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

  • num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.

  • eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.

  • generator (torch.Generator or List[torch.Generator], optional) – One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
  • latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

  • prompt_embeds (torch.FloatTensor, optional) – Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.

  • negative_prompt_embeds (torch.FloatTensor, optional) – Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt input argument.

  • output_type (str, optional, defaults to “pil”) – The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): PIL.Image.Image or np.array.

  • return_dict (bool, optional, defaults to True) – Whether or not to return a [~pipelines.stable_diffusion.StableDiffusionPipelineOutput] instead of a plain tuple.

  • callback (Callable, optional) – A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).

  • callback_steps (int, optional, defaults to 1) – The frequency at which the callback function will be called. If not specified, the callback will be called at every step.

  • cross_attention_kwargs (dict, optional) – A kwargs dictionary that, if specified, is passed along to the AttentionProcessor as defined under self.processor in [diffusers.cross_attention](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/cross_attention.py).

  • alpha_ (float, defaults to 0.7) – The ratio of subject conditioning. If alpha_ is 0.7, the first 30% of denoising steps use text prompts, while the last 70% use image-augmented prompts. Increase alpha_ for identity preservation, decrease it for prompt consistency.

  • reference_subject_images (List[PIL.Image.Image]) – A list of PIL images used as reference subjects. The number of images should equal the number of augmented tokens in the prompts.

  • augmented_prompt_embeds (torch.FloatTensor, optional) – Pre-generated image-augmented text embeddings. If not provided, embeddings will be generated from prompt and reference_subject_images.

  • show_progress (bool) – Whether to show a progress bar during inference.

Examples:
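
A minimal usage sketch, assuming a constructed FastComposer instance named model; the file names, the prompt, and the 'samples' output key are placeholders to verify against your mmagic version:

from PIL import Image

# One reference image per augmented token in the prompt.
subject = Image.open('person.jpg').convert('RGB')  # assumed file

result = model.infer(
    prompt='a man img wearing a spacesuit on the moon',
    negative_prompt='blurry, low quality',
    num_inference_steps=50,
    guidance_scale=7.5,
    alpha_=0.7,  # last 70% of steps use image-augmented prompts
    reference_subject_images=[subject])
result['samples'][0].save('output.png')  # 'samples' key is an assumption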

Returns

OrderedDict if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents “not-safe-for-work” (nsfw) content, according to the safety_checker.

Return type

OrderedDict or tuple
