mmagic.models.editors.animatediff.animatediff

Module Contents

Classes

AnimateDiff

Implementation of AnimateDiff.

Attributes

logger

ModelType

mmagic.models.editors.animatediff.animatediff.logger[source]
mmagic.models.editors.animatediff.animatediff.ModelType[source]
class mmagic.models.editors.animatediff.animatediff.AnimateDiff(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), motion_module_cfg: Optional[dict] = None, dream_booth_lora_cfg: Optional[dict] = None)[source]

Bases: mmengine.model.BaseModel

Implementation of `AnimateDiff <https://arxiv.org/abs/2307.04725>`_.

Parameters
  • vae (Union[dict, nn.Module]) – The config or module for the VAE model.

  • text_encoder (Union[dict, nn.Module]) – The config or module for the text encoder.

  • tokenizer (str) – The name of the CLIP tokenizer.

  • unet (Union[dict, nn.Module]) – The config or module for the UNet model.

  • scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.

  • test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler used in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.

  • dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.

  • enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.

  • noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.

  • tomesd_cfg (dict, optional) – The config for ToMeSD. Please refer to https://github.com/dbolya/tomesd and https://github.com/open-mmlab/mmagic/blob/main/mmagic/models/utils/tome_utils.py for details. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').

  • motion_module_cfg (dict, optional) – The config for the motion module. Defaults to None.

  • dream_booth_lora_cfg (dict, optional) – The config for DreamBooth LoRA. Defaults to None.
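A minimal construction sketch. The config path below is an assumption; mmagic ships its AnimateDiff configs under configs/animatediff/ in the repository:

    from mmengine import Config
    from mmagic.registry import MODELS

    # Hypothetical config path; substitute a real AnimateDiff config file.
    cfg = Config.fromfile('configs/animatediff/animatediff_ToonYou.py')
    model = MODELS.build(cfg.model)   # builds vae, unet, text_encoder, scheduler, ...
    model = model.cuda().eval()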

property device[source]

Return the device of the model.

set_xformers(module: Optional[torch.nn.Module] = None) → torch.nn.Module[source]

Set xformers for the model.

Returns

The model with xformers.

Return type

nn.Module

set_tomesd() → torch.nn.Module[source]

Set ToMe for the stable diffusion model.

Returns

The model with ToMe.

Return type

nn.Module
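Typical usage of the two toggles above, assuming model is a constructed AnimateDiff instance (see the construction sketch earlier):

    # set_tomesd is only meaningful when tomesd_cfg was passed at construction
    # time; set_xformers requires the xformers package to be installed.
    model.set_xformers()
    model.set_tomesd()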

init_motion_module(motion_module_cfg)[source]
init_dreambooth_lora(dream_booth_lora_cfg)[source]
_encode_prompt(prompt, device, num_videos_per_prompt, do_classifier_free_guidance, negative_prompt)[source]

Encodes the prompt into text encoder hidden states.
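The classifier-free-guidance convention behind this method can be sketched in plain PyTorch: the negative-prompt (unconditional) embeddings and the prompt embeddings are stacked along the batch dimension so a single UNet forward serves both branches. A schematic sketch, not mmagic's exact code:

    import torch

    def cfg_embeddings(cond: torch.Tensor, uncond: torch.Tensor) -> torch.Tensor:
        """Stack negative and positive prompt embeddings for one batched pass."""
        # cond / uncond: [batch, seq_len, hidden] -> [2 * batch, seq_len, hidden];
        # the unconditional half comes first, matching the usual CFG convention.
        return torch.cat([uncond, cond])

    cond = torch.randn(1, 77, 768)    # CLIP hidden states for the prompt
    uncond = torch.randn(1, 77, 768)  # hidden states for the negative/empty prompt
    assert cfg_embeddings(cond, uncond).shape == (2, 77, 768)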

decode_latents(latents)[source]

Decode the latents into video frames with the VAE decoder.
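Conceptually, video latents are decoded by folding the frame axis into the batch so a 2D VAE can process them. A sketch assuming a diffusers-style AutoencoderKL and the standard Stable Diffusion latent scale of 0.18215:

    import torch

    def decode_video_latents_sketch(vae, latents: torch.Tensor) -> torch.Tensor:
        """latents: [B, C, F, H, W] -> decoded frames, frames folded into batch."""
        b, c, f, h, w = latents.shape
        latents = latents.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        latents = latents / 0.18215  # standard SD scaling factor (assumed here)
        return vae.decode(latents).sample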

prepare_extra_step_kwargs(generator, eta)[source]

Prepare extra kwargs for the scheduler step, since not all schedulers have the same signature. eta (η) is only used with the DDIMScheduler and will be ignored by other schedulers.
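A common way to implement this check, as in diffusers-style pipelines (a sketch under that assumption, not verbatim mmagic code), is to inspect the scheduler's step signature:

    import inspect

    def extra_step_kwargs_sketch(scheduler, generator, eta: float) -> dict:
        """Forward eta/generator only to schedulers whose .step() accepts them."""
        accepted = set(inspect.signature(scheduler.step).parameters.keys())
        kwargs = {}
        if 'eta' in accepted:        # e.g. DDIMScheduler
            kwargs['eta'] = eta
        if 'generator' in accepted:
            kwargs['generator'] = generator
        return kwargs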

check_inputs(prompt, height, width)[source]

Check inputs.

Raise an error if they are not correct.
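The checks are typically of the following form, under the assumption that SD-family models require spatial sizes divisible by 8 (a sketch, not mmagic's exact checks):

    def check_inputs_sketch(prompt, height: int, width: int) -> None:
        """Validate prompt type and spatial sizes before generation."""
        if not isinstance(prompt, (str, list)):
            raise ValueError(f'`prompt` has to be str or list but is {type(prompt)}')
        if height % 8 != 0 or width % 8 != 0:
            raise ValueError(
                f'`height` and `width` have to be divisible by 8 '
                f'but are {height} and {width}.')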

convert_lora(state_dict, LORA_PREFIX_UNET='lora_unet', LORA_PREFIX_TEXT_ENCODER='lora_te', alpha=0.6)[source]
Convert LoRA weights for the unet and text_encoder.

TODO: use this function to convert LoRA.

Parameters
  • state_dict (dict) – The LoRA state dict to convert.

  • LORA_PREFIX_UNET (str, optional) – The key prefix of UNet weights in state_dict. Defaults to 'lora_unet'.

  • LORA_PREFIX_TEXT_ENCODER (str, optional) – The key prefix of text-encoder weights in state_dict. Defaults to 'lora_te'.

  • alpha (float, optional) – The weight for merging the LoRA weights. Defaults to 0.6.

Returns

The unet and text_encoder with the LoRA weights merged.

prepare_latents(batch_size, num_channels_latents, video_length, height, width, dtype, device, generator, latents=None)[source]

Prepare latent variables.
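The latent shape convention for video generation, assuming the standard Stable Diffusion VAE stride of 8:

    import torch

    batch_size, num_channels_latents, video_length = 1, 4, 16
    height = width = 512
    vae_scale_factor = 8  # standard SD value (an assumption about this model)
    latents = torch.randn(
        batch_size, num_channels_latents, video_length,
        height // vae_scale_factor, width // vae_scale_factor)
    print(latents.shape)  # torch.Size([1, 4, 16, 64, 64])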

prepare_model()[source]

Prepare model for training.

Move the model to the target dtype and disable gradients for some models.

set_lora()[source]

Set LoRA for the model.

val_step(data: dict) → mmagic.utils.typing.SampleList[source]

Get the generated results of the given data. Calls self.data_preprocessor and self.infer in order, and returns the generated results, which will be passed to the evaluator or visualizer.

Parameters

data (dict or tuple or list) – Data sampled from the dataset.

Returns

Generated image or image dict.

Return type

SampleList

test_step(data: dict) → mmagic.utils.typing.SampleList[source]

Get the generated results of the given data. Calls self.data_preprocessor and self.infer in order, and returns the generated results, which will be passed to the evaluator or visualizer.

Parameters

data (dict or tuple or list) – Data sampled from the dataset.

Returns

Generated image or image dict.

Return type

SampleList
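A hypothetical evaluation loop; dataloader and process are placeholders for a mmagic dataloader and a downstream evaluator or visualizer hook:

    for data in dataloader:
        outputs = model.test_step(data)  # runs data_preprocessor, then infer
        for sample in outputs:           # iterate over the returned SampleList
            process(sample)              # hand off to an evaluator/visualizer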

infer(prompt: Union[str, List[str]], video_length: Optional[int] = 16, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_videos_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, latents: Optional[torch.FloatTensor] = None, return_type: Optional[str] = 'tensor', show_progress: bool = True, seed: Optional[int] = 1007)[source]

Function invoked when calling the pipeline for generation.

Parameters
  • prompt (str or List[str]) – The prompt or prompts to guide the video generation.

  • video_length (int, optional) – The number of frames of the generated video. Defaults to 16.

  • height (int, optional) – The height in pixels of the generated video. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.

  • width (int, optional) – The width in pixels of the generated video. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.

  • num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher-quality video at the expense of slower inference. Defaults to 50.

  • guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.

  • negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the video generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.

  • num_videos_per_prompt (int) – The number of videos to generate per prompt. Defaults to 1.

  • eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler; ignored for other schedulers. Defaults to 0.0.

  • generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.

  • latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for video generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling with the supplied random generator. Defaults to None.

  • return_type (str) – The return type of the inference results. Supported types are 'video', 'numpy', and 'tensor'. If 'video' is passed, a list of PIL images will be returned. If 'numpy' is passed, a numpy array with shape [N, C, H, W] will be returned, with the same value range as the decoder's output. If 'tensor' is passed, the decoder's output will be returned. Defaults to 'tensor'.

  • show_progress (bool) – Whether to show a progress bar during denoising. Defaults to True.

  • seed (int, optional) – The random seed used for generation. Defaults to 1007.

Returns

A dict containing the generated video.

Return type

dict
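A usage sketch, assuming model is a prepared AnimateDiff instance on GPU; the prompt text is arbitrary:

    output = model.infer(
        prompt='a panda surfing, masterpiece, best quality',
        video_length=16,
        height=512,
        width=512,
        num_inference_steps=25,
        guidance_scale=7.5,
        negative_prompt='low quality, blurry',
        seed=1007,
        return_type='tensor')
    # The key layout of the returned dict is not specified above; inspect it:
    print(type(output), output.keys() if isinstance(output, dict) else None)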

abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') → Union[Dict[str, torch.Tensor], list][source]

forward is not implemented yet.
