mmagic.models.editors.disco_diffusion

Package Contents

Classes

ClipWrapper

Clip Models wrapper.

DiscoDiffusion

Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI

ImageTextGuider

Disco-Diffusion uses text and images to guide image generation. We will

SecondaryDiffusionImageNet2

A smaller secondary diffusion model trained by Katherine Crowson to

Functions

alpha_sigma_to_t(alpha, sigma)

Convert alpha and sigma to a timestep.

class mmagic.models.editors.disco_diffusion.ClipWrapper(clip_type, *args, **kwargs)[source]

Bases: torch.nn.Module

Clip Models wrapper.

We provide wrappers for the CLIP models of openai and mlfoundations. The user can specify clip_type as clip or open_clip and then initialize a CLIP model using the same arguments as in the original codebase. The following CLIP model settings are provided in the official repo of Disco Diffusion:

| Setting                       | Source    | Arguments                                                   |
|:-----------------------------:|-----------|-------------------------------------------------------------|
| ViTB32                        | clip      | name='ViT-B/32', jit=False                                  |
| ViTB16                        | clip      | name='ViT-B/16', jit=False                                  |
| ViTL14                        | clip      | name='ViT-L/14', jit=False                                  |
| ViTL14_336px                  | clip      | name='ViT-L/14@336px', jit=False                            |
| RN50                          | clip      | name='RN50', jit=False                                      |
| RN50x4                        | clip      | name='RN50x4', jit=False                                    |
| RN50x16                       | clip      | name='RN50x16', jit=False                                   |
| RN50x64                       | clip      | name='RN50x64', jit=False                                   |
| RN101                         | clip      | name='RN101', jit=False                                     |
| ViTB32_laion2b_e16            | open_clip | name='ViT-B-32', pretrained='laion2b_e16'                   |
| ViTB32_laion400m_e31          | open_clip | model_name='ViT-B-32', pretrained='laion400m_e31'           |
| ViTB32_laion400m_32           | open_clip | model_name='ViT-B-32', pretrained='laion400m_e32'           |
| ViTB32quickgelu_laion400m_e31 | open_clip | model_name='ViT-B-32-quickgelu', pretrained='laion400m_e31' |
| ViTB32quickgelu_laion400m_e32 | open_clip | model_name='ViT-B-32-quickgelu', pretrained='laion400m_e32' |
| ViTB16_laion400m_e31          | open_clip | model_name='ViT-B-16', pretrained='laion400m_e31'           |
| ViTB16_laion400m_e32          | open_clip | model_name='ViT-B-16', pretrained='laion400m_e32'           |
| RN50_yffcc15m                 | open_clip | model_name='RN50', pretrained='yfcc15m'                     |
| RN50_cc12m                    | open_clip | model_name='RN50', pretrained='cc12m'                       |
| RN50_quickgelu_yfcc15m        | open_clip | model_name='RN50-quickgelu', pretrained='yfcc15m'           |
| RN50_quickgelu_cc12m          | open_clip | model_name='RN50-quickgelu', pretrained='cc12m'             |
| RN101_yfcc15m                 | open_clip | model_name='RN101', pretrained='yfcc15m'                    |
| RN101_quickgelu_yfcc15m       | open_clip | model_name='RN101-quickgelu', pretrained='yfcc15m'          |

Example clip model configs are as follows:

Examples:

>>> # Use OpenAI's CLIP
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='clip',
>>>     name='ViT-B/32',
>>>     jit=False)
>>> # Use OpenCLIP
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='open_clip',
>>>     model_name='RN50',
>>>     pretrained='yfcc15m')
>>> # Use CLIP from Hugging Face Transformers
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='huggingface',
>>>     pretrained_model_name_or_path='runwayml/stable-diffusion-v1-5',
>>>     subfolder='text_encoder')
Parameters
  • clip_type (str) – The source of the CLIP model: clip, open_clip or huggingface.

  • *args – Positional arguments used to initialize the corresponding clip model.

  • **kwargs – Keyword arguments used to initialize the corresponding clip model.
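These config dicts are normally consumed through mmagic's model registry. A minimal usage sketch, assuming the openai clip package and its 'ViT-B/32' weights are available:

>>> from mmagic.registry import MODELS
>>> # Build the wrapper from the OpenAI CLIP config shown above.
>>> clip_model = MODELS.build(dict(
>>>     type='ClipWrapper',
>>>     clip_type='clip',
>>>     name='ViT-B/32',
>>>     jit=False))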

get_embedding_layer()[source]

Function to get embedding layer of the clip model.

Only CLIPTextModel is currently supported.

add_embedding(embeddings: Union[dict, List[dict]])[source]
set_only_embedding_trainable()[source]
set_embedding_layer()[source]
unset_embedding_layer()[source]
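The embedding-related helpers above target the Hugging Face text-encoder path (clip_type='huggingface'). A hedged sketch of how they might be combined, reusing the Hugging Face config from the examples above; the exact arguments expected by add_embedding are not shown here:

>>> from mmagic.registry import MODELS
>>> text_encoder = MODELS.build(dict(
>>>     type='ClipWrapper',
>>>     clip_type='huggingface',
>>>     pretrained_model_name_or_path='runwayml/stable-diffusion-v1-5',
>>>     subfolder='text_encoder'))
>>> # Token embedding layer of the underlying CLIPTextModel.
>>> embedding_layer = text_encoder.get_embedding_layer()
>>> # Make only the embedding trainable (per the method name),
>>> # e.g. for textual-inversion-style fine-tuning.
>>> text_encoder.set_only_embedding_trainable()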
forward(*args, **kwargs)[source]

Forward function.

class mmagic.models.editors.disco_diffusion.DiscoDiffusion(unet, diffusion_scheduler, secondary_model=None, clip_models=[], use_fp16=False, pretrained_cfgs=None)[source]

Bases: torch.nn.Module

Disco Diffusion (DD) is a Google Colab notebook which leverages an AI image-generation technique called CLIP-Guided Diffusion to allow you to create compelling and beautiful images from just text inputs. Created by Somnai, augmented by Gandamu, and building on the work of RiversHaveWings, nshepperd, and many others.

Ref:

Github repo: https://github.com/alembics/disco-diffusion
Colab: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb

Parameters
  • unet (ModelType) – Config of denoising Unet.

  • diffusion_scheduler (ModelType) – Config of the diffusion scheduler.

  • secondary_model (ModelType) – A smaller secondary diffusion model trained by Katherine Crowson to remove noise from intermediate timesteps to prepare them for CLIP. Ref: https://twitter.com/rivershavewings/status/1462859669454536711. Defaults to None.

  • clip_models (list) – Config of clip models. Defaults to [].

  • use_fp16 (bool) – Whether to use fp16 for unet model. Defaults to False.

  • pretrained_cfgs (dict) – Path config for pretrained weights. Usually this is a dict mapping module names to the corresponding checkpoint paths. Defaults to None.
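A hedged configuration sketch showing how these arguments fit together. The inner type names for the UNet and scheduler ('DenoisingUnet', 'DDIMScheduler') are illustrative assumptions, not the exact configs shipped with mmagic, and their detailed settings are omitted:

>>> from mmagic.registry import MODELS
>>> disco_cfg = dict(
>>>     type='DiscoDiffusion',
>>>     unet=dict(type='DenoisingUnet'),                 # hypothetical; further UNet settings omitted
>>>     diffusion_scheduler=dict(type='DDIMScheduler'),  # hypothetical; further scheduler settings omitted
>>>     secondary_model=dict(type='SecondaryDiffusionImageNet2'),
>>>     clip_models=[dict(type='ClipWrapper', clip_type='clip',
>>>                       name='ViT-B/32', jit=False)],
>>>     use_fp16=True)
>>> dd = MODELS.build(disco_cfg)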

property device

Get current device of the model.

Returns

The current device of the model.

Return type

torch.device

load_pretrained_models(pretrained_cfgs)[source]

Load pretrained weights into the model. pretrained_cfgs is a dict with module names as keys and checkpoint paths as values.

Parameters
  • pretrained_cfgs (dict) – Path config for pretrained weights. Usually this is a dict mapping module names to the corresponding checkpoint paths. Defaults to None.
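A hedged sketch of a pretrained_cfgs dict, assuming the simplest mapping of module name to checkpoint path described above; the paths are hypothetical placeholders and the real config may carry extra keys (e.g. a state-dict prefix):

>>> pretrained_cfgs = dict(
>>>     unet='path/to/unet_checkpoint.pth',              # hypothetical path
>>>     secondary_model='path/to/secondary_model.pth')   # hypothetical path
>>> dd.load_pretrained_models(pretrained_cfgs)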

infer(scheduler_kwargs=None, height=None, width=None, init_image=None, batch_size=1, num_inference_steps=100, skip_steps=0, show_progress=True, text_prompts=[], image_prompts=[], eta=0.8, clip_guidance_scale=5000, init_scale=1000, tv_scale=0.0, sat_scale=0.0, range_scale=150, cut_overview=[12] * 400 + [4] * 600, cut_innercut=[4] * 400 + [12] * 600, cut_ic_pow=[1] * 1000, cut_icgray_p=[0.2] * 400 + [0] * 600, cutn_batches=4, seed=None)[source]

Inference API for disco diffusion.

Parameters
  • scheduler_kwargs (dict) – Args for infer time diffusion scheduler. Defaults to None.

  • height (int) – Height of output image. Defaults to None.

  • width (int) – Width of output image. Defaults to None.

  • init_image (str) – Initial image at the start point of denoising. Defaults to None.

  • batch_size (int) – Batch size. Defaults to 1.

  • num_inference_steps (int) – Number of inference steps. Defaults to 100.

  • skip_steps (int) – Denoising steps to skip, usually set with init_image. Defaults to 0.

  • show_progress (bool) – Whether to show progress. Defaults to True.

  • text_prompts (list) – Text prompts. Defaults to [].

  • image_prompts (list) – Image prompts. These are not the same as init_image; they work in the same way as text_prompts. Defaults to [].

  • eta (float) – Eta for ddim sampling. Defaults to 0.8.

  • clip_guidance_scale (int) – The scale of the influence of prompts on the output image. Defaults to 5000.

  • seed (int) – Sampling seed. Defaults to None.
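A hedged end-to-end sketch of the inference API, assuming dd is a built and weight-loaded DiscoDiffusion instance as in the earlier sketches; the prompt text is purely illustrative and the 'samples' key is an assumption about the output format:

>>> text_prompts = [
>>>     'clouds surrounding mountains and a palace, sunshine, lake, '
>>>     'artstation, unreal engine']
>>> output = dd.infer(
>>>     height=512,
>>>     width=512,
>>>     text_prompts=text_prompts,
>>>     num_inference_steps=100,
>>>     eta=0.8,
>>>     show_progress=True)
>>> image = output['samples']  # assumed output key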

class mmagic.models.editors.disco_diffusion.ImageTextGuider(clip_models)[source]

Bases: torch.nn.Module

Disco-Diffusion uses text and images to guide image generation. The CLIP models extract text and image features from the prompts; during each iteration, features of the generated image patches are computed and a similarity loss between the prompt features and the generated features is evaluated. Other losses include an RGB range loss and a total variation loss. Together, these losses guide the image generation towards the desired target.

Parameters

clip_models (List[Dict]) – List of clip model settings.
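The CLIP similarity term is typically a spherical distance between normalized embeddings of image cutouts and the prompt, as in the original Disco Diffusion notebook. A standalone sketch of that loss, not necessarily this class's internal helper:

>>> import torch
>>> import torch.nn.functional as F
>>> def spherical_dist_loss(x, y):
>>>     # Squared great-circle distance between L2-normalized embeddings.
>>>     x = F.normalize(x, dim=-1)
>>>     y = F.normalize(y, dim=-1)
>>>     return (x - y).norm(dim=-1).div(2).arcsin().pow(2).mul(2)
>>> # Illustrative shapes: 16 cutout embeddings vs. one text prompt embedding.
>>> cutout_embeds = torch.randn(16, 512)
>>> prompt_embed = torch.randn(1, 512)
>>> loss = spherical_dist_loss(cutout_embeds, prompt_embed).mean()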

property device

Get current device of the model.

Returns

The current device of the model.

Return type

torch.device

frame_prompt_from_text(text_prompts, frame_num=0)[source]

Get current frame prompt.

compute_prompt_stats(text_prompts=[], image_prompt=None, fuzzy_prompt=False, rand_mag=0.05)[source]

Compute prompts statistics.

Parameters
  • text_prompts (list) – Text prompts. Defaults to [].

  • image_prompt (list) – Image prompts. Defaults to None.

  • fuzzy_prompt (bool, optional) – Controls whether to add multiple noisy prompts to the prompt losses. If True, can increase variability of image output. Defaults to False.

  • rand_mag (float, optional) – Controls the magnitude of the random noise added by fuzzy_prompt. Defaults to 0.05.

cond_fn(model, diffusion_scheduler, x, t, beta_prod_t, model_stats, secondary_model=None, init_image=None, clamp_grad=True, clamp_max=0.05, clip_guidance_scale=5000, init_scale=1000, tv_scale=0.0, sat_scale=0.0, range_scale=150, cut_overview=[12] * 400 + [4] * 600, cut_innercut=[4] * 400 + [12] * 600, cut_ic_pow=[1] * 1000, cut_icgray_p=[0.2] * 400 + [0] * 600, cutn_batches=4)[source]

Clip guidance function.

Parameters
  • model (nn.Module) – The denoising model (UNet) used during sampling.

  • diffusion_scheduler (object) – The diffusion scheduler driving the sampling loop.

  • x (torch.Tensor) – The current noisy sample.

  • t (int) – The current timestep.

  • beta_prod_t (torch.Tensor) – The beta product at timestep t provided by the scheduler.

  • model_stats (List[torch.Tensor]) – Prompt statistics, e.g. as returned by compute_prompt_stats.

  • secondary_model (nn.Module) – A smaller secondary diffusion model trained by Katherine Crowson to remove noise from intermediate timesteps to prepare them for CLIP. Ref: https://twitter.com/rivershavewings/status/1462859669454536711 # noqa Defaults to None.

  • init_image (torch.Tensor) – Initial image for denoising. Defaults to None.

  • clamp_grad (bool, optional) – Whether clamp gradient. Defaults to True.

  • clamp_max (float, optional) – Clamp max values. Defaults to 0.05.

  • clip_guidance_scale (int, optional) – The scale of influence of clip guidance on image generation. Defaults to 5000.

abstract forward(x)[source]

Forward function.

class mmagic.models.editors.disco_diffusion.SecondaryDiffusionImageNet2[source]

Bases: torch.nn.Module

A smaller secondary diffusion model trained by Katherine Crowson to remove noise from intermediate timesteps to prepare them for CLIP.

Ref: https://twitter.com/rivershavewings/status/1462859669454536711

forward(input, t)[source]

Forward function.

mmagic.models.editors.disco_diffusion.alpha_sigma_to_t(alpha, sigma)[source]

Convert alpha and sigma to a timestep.
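For reference, the conversion used in Katherine Crowson's secondary (v-diffusion) models maps the (alpha, sigma) pair to an angle on the unit circle rescaled to [0, 1]. A minimal sketch of that standard formula, assumed rather than verified against this implementation:

>>> import math
>>> import torch
>>> def alpha_sigma_to_t(alpha, sigma):
>>>     # Angle of the (alpha, sigma) point on the unit circle, rescaled to [0, 1].
>>>     return torch.atan2(sigma, alpha) / math.pi * 2
>>> t = alpha_sigma_to_t(torch.tensor(0.9), torch.tensor(0.4359))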
