mmagic.models.editors
Package Contents¶
Classes¶
- Implementation of AnimateDiff.
- Dilation backbone used in AOT-GAN model.
- Encoder-Decoder used in AOT-GAN model.
- Inpaintor for AOT-GAN method.
- Face id loss model.
- BasicVSR model for video super-resolution.
- BasicVSR network structure for video super-resolution.
- BasicVSR++ network structure.
- Implementation of Large Scale GAN Training for High Fidelity Natural Image Synthesis (BigGAN).
- CAIN model for video interpolation.
- CAIN network structure.
- Implementation of ControlNet with Stable Diffusion.
- CycleGAN model for unpaired image-to-image translation.
- Implementation of Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (DCGAN).
- Denoising Unet. This network receives a diffused image x_t and the current timestep t.
- Base class for all algorithmic models.
- Defines the discriminator for DeblurGanv2 with the specified arguments.
- Defines the generator for DeblurGanv2 with the specified arguments.
- Contextual attention module.
- Neck with contextual attention module.
- Decoder used in DeepFill model.
- Encoder used in DeepFill model.
- Refiner used in DeepFill model.
- Discriminators used in DeepFillv1 model.
- Inpaintor for DeepFillv1 method.
- Two-stage encoder-decoder structure used in DeepFill model.
- DIC model for face super-resolution.
- DIC network structure for face super-resolution.
- Feedback Block of DIC.
- Custom feedback block, used as the first feedback block.
- Feedback block with HeatmapAttention.
- LightCNN discriminator with input size 128 x 128.
- Conv2d or Linear layer with max feature selector.
- Deep Image Matting model.
- CLIP models wrapper.
- Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI image generating technique called CLIP-Guided Diffusion.
- Implementation of DreamBooth with Stable Diffusion.
- EDSR network structure.
- EDVR model for video super-resolution.
- EDVR network structure for video super-resolution.
- Implementation of Efficient Geometry-aware 3D Generative Adversarial Networks (EG3D).
- Enhanced SRGAN model for single image super-resolution.
- Networks consisting of Residual in Residual Dense Blocks, which are used in ESRGAN.
- Class for Stable Diffusion. Refers to https://github.com/Stability-…
- Decoder for FBA matting.
- ResNet-based encoder for FBA image matting.
- FLAVR model for video interpolation.
- PyTorch implementation of FLAVR for video frame interpolation.
- Guided Contextual Attention image matting model.
- Implementation of Geometric GAN.
- GLEAN (using StyleGANv2) architecture for super-resolution.
- Decoder used in Global&Local model.
- Dilation backbone used in Global&Local model.
- Encoder used in Global&Local model.
- Encoder-Decoder used in Global&Local model.
- Guided diffusion model.
- IconVSR network structure for video super-resolution.
- Depthwise index block.
- Holistic Index Block.
- Indexed upsample module.
- IndexNet matting model.
- Decoder for IndexNet.
- Encoder for IndexNet.
- InstColorization method for colorization.
- LIIF model for single image super-resolution.
- Multilayer perceptron (MLP) refiner used in LIIF.
- Implementation of Least Squares Generative Adversarial Networks.
- MS-PIE StyleGAN2.
- Positional Encoding in SinGAN.
- The original version of Baseline model in "Simple Baselines for Image Restoration".
- The original version of Baseline model in "Simple Baselines for Image Restoration".
- NAFNet.
- The original version of NAFNetLocal in "Simple Baselines for Image Restoration".
- Mask convolution module.
- Implementation of partial convolution.
- Decoder with partial conv.
- Encoder with partial conv.
- Encoder-Decoder with partial conv module.
- Inpaintor for Partial Convolution method.
- Progressive Growing Unconditional GAN.
- Pix2Pix model for paired image-to-image translation.
- Simple decoder from Deep Image Matting.
- Simple refiner from Deep Image Matting.
- RDN model for single image super-resolution.
- RealBasicVSR model for real-world video super-resolution.
- RealBasicVSR network structure for real-world video super-resolution.
- Real-ESRGAN model for single image super-resolution.
- A U-Net discriminator with spectral normalization.
- A PyTorch implementation of Restormer: Efficient Transformer for High-Resolution Image Restoration.
- Implementation of Self-Attention Generative Adversarial Networks.
- SinGAN.
- SRCNN network structure for image super-resolution.
- SRGAN model for single image super-resolution.
- A modified VGG discriminator with input size 128 x 128.
- Modified SRResNet.
- Class for Stable Diffusion. Refers to https://github.com/Stability-…
- Class for Stable Diffusion. Refers to https://github.com/Stability-…
- Class for Stable Diffusion XL. Refers to https://github.com/Stability-…
- Implementation of A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN).
- Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2).
- Implementation of Alias-Free Generative Adversarial Networks (StyleGAN3).
- StyleGAN3 Generator.
- SwinIR.
- TDAN model for video super-resolution.
- TDAN network structure for video super-resolution.
- Implementation of An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.
- PyTorch implementation of TOFlow for video frame interpolation.
- PyTorch implementation of TOFlow.
- ResNet architecture.
- Learnable Texture Extractor.
- TTSR model for reference-based image super-resolution.
- Search texture reference by transformer.
- A discriminator for TTSR.
- TTSR network structure (main-net) for reference-based super-resolution.
- Implementation of ViCo with Stable Diffusion.
- Implementation of Improved Training of Wasserstein GANs.
- class mmagic.models.editors.AnimateDiff(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), motion_module_cfg: Optional[dict] = None, dream_booth_lora_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModel
Implementation of `AnimateDiff <https://arxiv.org/abs/2307.04725>`_ (AnimateDiff).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name for CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
lora_config (dict, optional) – The config for LoRA finetuning. Defaults to None.
val_prompts (Union[str, List[str]], optional) – The prompts for validation. Defaults to None.
class_prior_prompt (str, optional) – The prompt for class prior loss.
num_class_images (int, optional) – The number of images for class prior. Defaults to 3.
prior_loss_weight (float, optional) – The weight for class prior loss. Defaults to 0.
fine_tune_text_encoder (bool, optional) – Whether to fine-tune text encoder. Defaults to False.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
tomesd_cfg (dict, optional) – The config for ToMe SD. Please refer to https://github.com/dbolya/tomesd and https://github.com/open-mmlab/mmagic/blob/main/mmagic/models/utils/tome_utils.py for details. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
- property device¶
Set device for the model.
- set_xformers(module: Optional[torch.nn.Module] = None) torch.nn.Module ¶
Set xformers for the model.
- Returns
The model with xformers.
- Return type
nn.Module
- set_tomesd() torch.nn.Module ¶
Set ToMe for the stable diffusion model.
- Returns
The model with ToMe.
- Return type
nn.Module
- init_motion_module(motion_module_cfg)¶
- init_dreambooth_lora(dream_booth_lora_cfg)¶
- _encode_prompt(prompt, device, num_videos_per_prompt, do_classifier_free_guidance, negative_prompt)¶
Encodes the prompt into text encoder hidden states.
- decode_latents(latents)¶
Decode latents into images.
- prepare_extra_step_kwargs(generator, eta)¶
Prepare extra kwargs for the scheduler step, since not all schedulers have the same signature. eta (η) is only used with the DDIMScheduler and will be ignored for other schedulers.
- check_inputs(prompt, height, width)¶
Check inputs.
Raises an error if they are not correct.
- convert_lora(state_dict, LORA_PREFIX_UNET='lora_unet', LORA_PREFIX_TEXT_ENCODER='lora_te', alpha=0.6)¶
- Convert lora for unet and text_encoder
TODO: use this function to convert lora
- Parameters
state_dict (_type_) – _description_
LORA_PREFIX_UNET (str, optional) – _description_. Defaults to 'lora_unet'.
LORA_PREFIX_TEXT_ENCODER (str, optional) – _description_. Defaults to 'lora_te'.
alpha (float, optional) – _description_. Defaults to 0.6.
- Returns
check each output type _type_: unet && text_encoder
- Return type
TODO
- prepare_latents(batch_size, num_channels_latents, video_length, height, width, dtype, device, generator, latents=None)¶
Prepare latent variables.
- prepare_model()¶
Prepare model for training.
Move model to target dtype and disable gradient for some models.
- set_lora()¶
Set LoRA for the model.
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results which will be passed to evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results which will be passed to evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- infer(prompt: Union[str, List[str]], video_length: Optional[int] = 16, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_videos_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, latents: Optional[torch.FloatTensor] = None, return_type: Optional[str] = 'tensor', show_progress: bool = True, seed: Optional[int] = 1007)¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the video generation.
video_length (int, optional) – The number of frames of the generated video. Defaults to 16.
height (int, optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality video at the expense of slower inference. Defaults to 50.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the video generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_videos_per_prompt (int) – The number of videos to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for video generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are 'video', 'numpy', 'tensor'. If 'video' is passed, a list of PIL images will be returned. If 'numpy' is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be the same as the decoder's output range. If 'tensor' is passed, the decoder's output will be returned. Defaults to 'tensor'.
- Returns
A dict containing the generated video.
- Return type
dict
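A minimal usage sketch of infer (hedged: the config path below is a hypothetical placeholder; substitute a real AnimateDiff config from the mmagic repository):
>>> from mmengine import Config
>>> from mmengine.registry import init_default_scope
>>> from mmagic.registry import MODELS
>>> init_default_scope('mmagic')
>>> cfg = Config.fromfile('configs/animatediff/animatediff_example.py')  # hypothetical path
>>> model = MODELS.build(cfg.model).eval()
>>> result = model.infer(
>>>     prompt='a corgi running on the beach',
>>>     video_length=16,          # frames to generate
>>>     num_inference_steps=50,   # denoising steps
>>>     guidance_scale=7.5,       # classifier-free guidance strength
>>>     seed=1007)
>>> # `result` is a dict containing the generated video (see above)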
- abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') Union[Dict[str, torch.Tensor], list] ¶
forward is not implemented now.
- class mmagic.models.editors.UNet3DConditionMotionModel(sample_size: Optional[int] = None, in_channels: int = 4, out_channels: int = 4, center_input_sample: bool = False, flip_sin_to_cos: bool = True, freq_shift: int = 0, down_block_types: Tuple[str] = ('CrossAttnDownBlock3D', 'CrossAttnDownBlock3D', 'CrossAttnDownBlock3D', 'DownBlock3D'), mid_block_type: str = 'UNetMidBlock3DCrossAttn', up_block_types: Tuple[str] = ('UpBlock3D', 'CrossAttnUpBlock3D', 'CrossAttnUpBlock3D', 'CrossAttnUpBlock3D'), only_cross_attention: Union[bool, Tuple[bool]] = False, block_out_channels: Tuple[int] = (320, 640, 1280, 1280), layers_per_block: int = 2, downsample_padding: int = 1, mid_block_scale_factor: float = 1, act_fn: str = 'silu', norm_num_groups: int = 32, norm_eps: float = 1e-05, cross_attention_dim: int = 768, attention_head_dim: Union[int, Tuple[int]] = 8, dual_cross_attention: bool = False, use_linear_projection: bool = False, class_embed_type: Optional[str] = None, num_class_embeds: Optional[int] = None, upcast_attention: bool = False, resnet_time_scale_shift: str = 'default', use_inflated_groupnorm=False, use_motion_module=False, motion_module_resolutions=(1, 2, 4, 8), motion_module_mid_block=False, motion_module_decoder_only=False, motion_module_type=None, motion_module_kwargs={}, unet_use_cross_frame_attention=None, unet_use_temporal_attention=None, subfolder=None, from_pretrained=None, unet_addtion_kwargs=None)¶
Bases:
diffusers.models.modeling_utils.ModelMixin, diffusers.configuration_utils.ConfigMixin
Implementation of UNet3DConditionMotionModel.
- _supports_gradient_checkpointing = True¶
- init_weights(subfolder=None, from_pretrained=None)¶
Init weights for models.
We just use the initialization method proposed in the original paper.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
- set_attention_slice(slice_size)¶
Enable sliced attention computation.
When this option is enabled, the attention module will split the input tensor in slices, to compute attention in several steps. This is useful to save some memory in exchange for a small speed decrease.
- Parameters
slice_size (str or int or list(int), defaults to "auto") – When "auto", halves the input to the attention heads, so attention will be computed in two steps. If "max", the maximum amount of memory will be saved by running only one slice at a time. If a number is provided, uses as many slices as attention_head_dim // slice_size. In this case, attention_head_dim must be a multiple of slice_size.
- _set_gradient_checkpointing(module, value=False)¶
Set gradient checkpointing.
- forward(sample: torch.FloatTensor, timestep: Union[torch.Tensor, float, int], encoder_hidden_states: torch.Tensor, class_labels: Optional[torch.Tensor] = None, attention_mask: Optional[torch.Tensor] = None, return_dict: bool = True) Union[UNet3DConditionOutput, Tuple] ¶
- Parameters
sample (torch.FloatTensor) – Noisy inputs tensor with shape (batch, channel, height, width).
timestep (torch.FloatTensor or float or int) – (batch) timesteps.
encoder_hidden_states (torch.FloatTensor) – Encoder hidden states with shape (batch, sequence_length, feature_dim).
return_dict (bool, optional, defaults to True) – Whether or not to return a [UNet3DConditionOutput] instead of a plain tuple.
- Returns
[UNet3DConditionOutput] if return_dict is True, otherwise a tuple. When returning a tuple, the first element is the sample tensor.
- Return type
[UNet3DConditionOutput] or tuple
- classmethod from_pretrained_2d(pretrained_model_path, subfolder=None, unet_additional_kwargs=None)¶
A class method for initialization.
- class mmagic.models.editors.AOTBlockNeck(in_channels=256, dilation_rates=(1, 2, 4, 8), num_aotblock=8, act_cfg=dict(type='ReLU'), **kwargs)¶
Bases:
mmengine.model.BaseModule
Dilation backbone used in AOT-GAN model.
This implementation follows: Aggregated Contextual Transformations for High-Resolution Image Inpainting
- Parameters
in_channels (int, optional) – Channel number of input feature. Default: 256.
dilation_rates (Tuple[int], optional) – The dilation rates used for AOT block. Default: (1, 2, 4, 8).
num_aotblock (int, optional) – Number of AOT blocks. Default: 8.
act_cfg (dict, optional) – Config dict for activation layer, “relu” by default.
kwargs (keyword arguments) –
- forward(x)¶
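A shape-level sketch of the neck on a random feature map (the 64 x 64 spatial size is an assumption for illustration):
>>> import torch
>>> from mmagic.models.editors import AOTBlockNeck
>>> neck = AOTBlockNeck(in_channels=256, dilation_rates=(1, 2, 4, 8), num_aotblock=8)
>>> feat = torch.rand(1, 256, 64, 64)  # downsampled feature map
>>> out = neck(feat)                   # aggregated contextual features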
- class mmagic.models.editors.AOTEncoderDecoder(encoder=dict(type='AOTEncoder'), decoder=dict(type='AOTDecoder'), dilation_neck=dict(type='AOTBlockNeck'))¶
Bases:
mmagic.models.editors.global_local.GLEncoderDecoder
Encoder-Decoder used in AOT-GAN model.
This implementation follows: Aggregated Contextual Transformations for High-Resolution Image Inpainting The architecture of the encoder-decoder is: (conv2d x 3) –> (dilated conv2d x 8) –> (conv2d or deconv2d x 3).
- Parameters
encoder (dict) – Config dict to build encoder.
decoder (dict) – Config dict to build decoder.
dilation_neck (dict) – Config dict to build dilation neck.
- class mmagic.models.editors.AOTInpaintor(data_preprocessor: Union[dict, mmengine.config.Config], encdec: dict, disc: Optional[dict] = None, loss_gan: Optional[dict] = None, loss_gp: Optional[dict] = None, loss_disc_shift: Optional[dict] = None, loss_composed_percep: Optional[dict] = None, loss_out_percep: bool = False, loss_l1_hole: Optional[dict] = None, loss_l1_valid: Optional[dict] = None, loss_tv: Optional[dict] = None, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.OneStageInpaintor
Inpaintor for AOT-GAN method.
This inpaintor is implemented according to the paper: Aggregated Contextual Transformations for High-Resolution Image Inpainting
- forward_train_d(data_batch, is_real, is_disc, mask)¶
Forward function in discriminator training step.
In this function, we compute the prediction for each data batch (real or fake). Meanwhile, the standard gan loss will be computed with several proposed losses for stable training.
- Parameters
data_batch (torch.Tensor) – Batch of real data or fake data.
is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.
is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.
mask (torch.Tensor) – Mask of data.
- Returns
Contains the loss items computed in this function.
- Return type
dict
- generator_loss(fake_res, fake_img, gt, mask, masked_img)¶
Forward function in generator training step.
In this function, we mainly compute the loss items for generator with the given (fake_res, fake_img). In general, the fake_res is the direct output of the generator and the fake_img is the composition of direct output and ground-truth image.
- Parameters
fake_res (torch.Tensor) – Direct output of the generator.
fake_img (torch.Tensor) – Composition of fake_res and ground-truth image.
gt (torch.Tensor) – Ground-truth image.
mask (torch.Tensor) – Mask image.
masked_img (torch.Tensor) – Composition of mask image and ground-truth image.
- Returns
Dict containing the results computed within this function for visualization and dict containing the loss items computed in this function.
- Return type
tuple(dict)
- forward_tensor(inputs, data_samples)¶
Forward function in tensor mode.
- Parameters
inputs (torch.Tensor) – Input tensor.
data_samples (List[dict]) – List of data sample dict.
- Returns
Direct output of the generator and composition of fake_res and ground-truth image.
- Return type
tuple
- train_step(data: List[dict], optim_wrapper)¶
Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
1. get fake res/image
2. compute reconstruction losses for generator
3. compute adversarial loss for discriminator
4. optimize generator
5. optimize discriminator
- Parameters
data (List[dict]) – Batch of data as input.
optim_wrapper (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
- class mmagic.models.editors.IDLossModel(ir_se50_weights=None)¶
Bases:
torch.nn.Module
Face id loss model.
- Parameters
ir_se50_weights (str, optional) – Url of ir-se50 weights. Defaults to None.
- _ir_se50_url = 'https://download.openxlab.org.cn/models/rangoliu/Arcface-IR-SE50/weight/Arcface-IR-SE50'¶
- extract_feats(x)¶
Extracting face features.
- Parameters
x (torch.Tensor) – Image tensor of faces.
- Returns
Face features.
- Return type
torch.Tensor
- forward(pred=None, gt=None)¶
Calculate face loss.
- Parameters
pred (torch.Tensor, optional) – Predictions of face images. Defaults to None.
gt (torch.Tensor, optional) – Ground truth of face images. Defaults to None.
- Returns
A tuple containing face similarity loss and improvement.
- Return type
Tuple(float, float)
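A brief usage sketch (tensor shapes are assumptions for illustration; the ir-se50 weights are downloaded from the default URL on first use):
>>> import torch
>>> from mmagic.models.editors import IDLossModel
>>> id_loss = IDLossModel()
>>> pred = torch.rand(2, 3, 256, 256)  # predicted face images
>>> gt = torch.rand(2, 3, 256, 256)    # ground-truth face images
>>> loss, improvement = id_loss(pred=pred, gt=gt)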
- class mmagic.models.editors.BasicVSR(generator, pixel_loss, ensemble=None, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.BaseEditModel
BasicVSR model for video super-resolution.
Note that this model is used for IconVSR.
- Paper:
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond, CVPR, 2021
- Parameters
generator (dict) – Config for the generator structure.
pixel_loss (dict) – Config for pixel-wise loss.
ensemble (dict) – Config for ensemble. Default: None.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
- check_if_mirror_extended(lrs)¶
Check whether the input is a mirror-extended sequence.
If mirror-extended, the i-th (i=0, …, t-1) frame is equal to the (t-1-i)-th frame.
- Parameters
lrs (tensor) – Input LR images with shape (n, t, c, h, w)
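For intuition, a mirror-extended sequence can be constructed by concatenating a clip with its temporal flip (a toy sketch):
>>> import torch
>>> lrs = torch.rand(1, 3, 3, 64, 64)                # (n, t, c, h, w)
>>> mirrored = torch.cat([lrs, lrs.flip(1)], dim=1)  # (n, 2t, c, h, w)
>>> # frame i of `mirrored` equals frame (2t - 1 - i), so the check passes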
- forward_train(inputs, data_samples=None, **kwargs)¶
Forward training. Returns dict of losses of training.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
Dict of losses.
- Return type
dict
- forward_inference(inputs, data_samples=None, **kwargs)¶
Forward inference. Returns predictions of validation, testing.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
predictions.
- Return type
List[DataSample]
- class mmagic.models.editors.BasicVSRNet(mid_channels=64, num_blocks=30, spynet_pretrained=None)¶
Bases:
mmengine.model.BaseModule
BasicVSR network structure for video super-resolution.
Support only x4 upsampling.
- Paper:
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond, CVPR, 2021
- Parameters
mid_channels (int) – Channel number of the intermediate features. Default: 64.
num_blocks (int) – Number of residual blocks in each propagation branch. Default: 30.
spynet_pretrained (str) – Pre-trained model path of SPyNet. Default: None.
- check_if_mirror_extended(lrs)¶
Check whether the input is a mirror-extended sequence.
If mirror-extended, the i-th (i=0, …, t-1) frame is equal to the (t-1-i)-th frame.
- Parameters
lrs (tensor) – Input LR images with shape (n, t, c, h, w)
- compute_flow(lrs)¶
Compute optical flow using SPyNet for feature warping.
Note that if the input is a mirror-extended sequence, 'flows_forward' is not needed, since it is equal to 'flows_backward.flip(1)'.
- Parameters
lrs (tensor) – Input LR images with shape (n, t, c, h, w)
- Returns
Optical flow. 'flows_forward' corresponds to the flows used for forward-time propagation (current to previous). 'flows_backward' corresponds to the flows used for backward-time propagation (current to next).
- Return type
tuple(Tensor)
- forward(lrs)¶
Forward function for BasicVSR.
- Parameters
lrs (Tensor) – Input LR sequence with shape (n, t, c, h, w).
- Returns
Output HR sequence with shape (n, t, c, 4h, 4w).
- Return type
Tensor
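A shape-level sketch of the x4 upsampling behavior (random inputs, no pretrained SPyNet weights loaded):
>>> import torch
>>> from mmagic.models.editors import BasicVSRNet
>>> net = BasicVSRNet(mid_channels=64, num_blocks=30, spynet_pretrained=None)
>>> lrs = torch.rand(1, 5, 3, 64, 64)  # (n, t, c, h, w) LR sequence
>>> out = net(lrs)                     # (1, 5, 3, 256, 256) HR sequence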
- class mmagic.models.editors.BasicVSRPlusPlusNet(mid_channels=64, num_blocks=7, max_residue_magnitude=10, is_low_res_input=True, spynet_pretrained=None, cpu_cache_length=100)¶
Bases:
mmengine.model.BaseModule
BasicVSR++ network structure.
Support either x4 upsampling or same size output.
- Paper:
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
- Parameters
mid_channels (int, optional) – Channel number of the intermediate features. Default: 64.
num_blocks (int, optional) – The number of residual blocks in each propagation branch. Default: 7.
max_residue_magnitude (int) – The maximum magnitude of the offset residue (Eq. 6 in paper). Default: 10.
is_low_res_input (bool, optional) – Whether the input is low-resolution or not. If False, the output resolution is equal to the input resolution. Default: True.
spynet_pretrained (str, optional) – Pre-trained model path of SPyNet. Default: None.
cpu_cache_length (int, optional) – When the length of sequence is larger than this value, the intermediate features are sent to CPU. This saves GPU memory, but slows down the inference speed. You can increase this number if you have a GPU with large memory. Default: 100.
- check_if_mirror_extended(lqs)¶
Check whether the input is a mirror-extended sequence.
If mirror-extended, the i-th (i=0, …, t-1) frame is equal to the (t-1-i)-th frame.
- Parameters
lqs (tensor) – Input low quality (LQ) sequence with shape (n, t, c, h, w).
- compute_flow(lqs)¶
Compute optical flow using SPyNet for feature alignment.
Note that if the input is a mirror-extended sequence, 'flows_forward' is not needed, since it is equal to 'flows_backward.flip(1)'.
- Parameters
lqs (tensor) – Input low quality (LQ) sequence with shape (n, t, c, h, w).
- Returns
Optical flow. 'flows_forward' corresponds to the flows used for forward-time propagation (current to previous). 'flows_backward' corresponds to the flows used for backward-time propagation (current to next).
- Return type
tuple(Tensor)
- propagate(feats, flows, module_name)¶
Propagate the latent features throughout the sequence.
- Parameters
feats (dict) – Features from previous branches. Each component is a list of tensors with shape (n, c, h, w).
flows (tensor) – Optical flows with shape (n, t - 1, 2, h, w).
module_name (str) – The name of the propagation branches. Can either be ‘backward_1’, ‘forward_1’, ‘backward_2’, ‘forward_2’.
- Returns
A dictionary containing all the propagated features. Each key in the dictionary corresponds to a propagation branch, which is represented by a list of tensors.
- Return type
dict(list[tensor])
- upsample(lqs, feats)¶
Compute the output image given the features.
- Parameters
lqs (tensor) – Input low quality (LQ) sequence with shape (n, t, c, h, w).
feats (dict) – The features from the propagation branches.
- Returns
Output HR sequence with shape (n, t, c, 4h, 4w).
- Return type
Tensor
- forward(lqs)¶
Forward function for BasicVSR++.
- Parameters
lqs (tensor) – Input low quality (LQ) sequence with shape (n, t, c, h, w).
- Returns
Output HR sequence with shape (n, t, c, 4h, 4w).
- Return type
Tensor
- class mmagic.models.editors.BigGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, num_classes: Optional[int] = None, ema_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseConditionalGAN
Implementation of Large Scale GAN Training for High Fidelity Natural Image Synthesis (BigGAN).
Detailed architecture can be found in BigGANGenerator and BigGANDiscriminator.
- Parameters
generator (ModelType) – The config or model of the generator.
discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.
data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor.
generator_steps (int) – The number of times the generator is completely updated before the discriminator is updated. Defaults to 1.
discriminator_steps (int) – The number of times the discriminator is completely updated before the generator is updated. Defaults to 1.
noise_size (Optional[int]) – Size of the input noise vector. Defaults to 128.
num_classes (Optional[int]) – The number of classes you would like to generate. Defaults to None.
ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) Tuple ¶
Get disc loss. BigGAN uses hinge loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
disc_pred_real (Tensor) – Discriminator’s prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
- gen_loss(disc_pred_fake)¶
Get gen loss. BigGAN uses hinge loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
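For reference, the hinge losses described in disc_loss and gen_loss have the following generic form (a sketch of the standard formulation, not necessarily the exact mmagic implementation):
>>> import torch.nn.functional as F
>>> def hinge_disc_loss(disc_pred_fake, disc_pred_real):
>>>     # max(0, 1 - D(x)) on real data, max(0, 1 + D(G(z))) on fake data
>>>     return F.relu(1 - disc_pred_real).mean() + F.relu(1 + disc_pred_fake).mean()
>>> def hinge_gen_loss(disc_pred_fake):
>>>     # the generator maximizes D(G(z))
>>>     return -disc_pred_fake.mean()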
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.CAIN(generator: dict, pixel_loss: dict, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, required_frames: int = 2, step_frames: int = 1, init_cfg: Optional[dict] = None, data_preprocessor: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.BasicInterpolator
CAIN model for Video Interpolation.
Paper: Channel Attention Is All You Need for Video Frame Interpolation Ref repo: https://github.com/myungsub/CAIN
- Parameters
generator (dict) – Config for the generator structure.
pixel_loss (dict) – Config for pixel-wise loss.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
required_frames (int) – Required frames in each process. Default: 2
step_frames (int) – Step size of video frame interpolation. Default: 1
init_cfg (dict, optional) – The weight initialization config for BaseModule.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
- init_cfg¶
Initialization config dict.
- Type
dict, optional
- data_preprocessor¶
Used for pre-processing data sampled by dataloader to the format accepted by forward().
- Type
BaseDataPreprocessor
- forward_inference(inputs, data_samples=None)¶
Forward inference. Returns predictions of validation, testing, and simple inference.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
predictions.
- Return type
List[DataSample]
- class mmagic.models.editors.CAINNet(in_channels=3, kernel_size=3, num_block_groups=5, num_block_layers=12, depth=3, reduction=16, norm=None, padding=7, act=nn.LeakyReLU(0.2, True), init_cfg=None)¶
Bases:
mmengine.model.BaseModule
CAIN network structure.
Paper: Channel Attention Is All You Need for Video Frame Interpolation. Ref repo: https://github.com/myungsub/CAIN
- Parameters
in_channels (int) – Channel number of inputs. Default: 3.
kernel_size (int) – Kernel size of CAINNet. Default: 3.
num_block_groups (int) – Number of block groups. Default: 5.
num_block_layers (int) – Number of blocks in a group. Default: 12.
depth (int) – Down scale depth, scale = 2**depth. Default: 3.
reduction (int) – Channel reduction of CA. Default: 16.
norm (str | None) – Normalization layer. If it is None, no normalization is performed. Default: None.
padding (int) – Padding of CAINNet. Default: 7.
act (function) – Activation function. Default: nn.LeakyReLU(0.2, True).
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(imgs, padding_flag=False)¶
Forward function.
- Parameters
imgs (Tensor) – Input tensor with shape (n, 2, c, h, w).
padding_flag (bool) – Padding or not. Default: False.
- Returns
Forward results.
- Return type
Tensor
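A shape-level sketch (random tensors; two neighboring frames are stacked along dim 1 and the network predicts the intermediate frame):
>>> import torch
>>> from mmagic.models.editors import CAINNet
>>> net = CAINNet()
>>> imgs = torch.rand(1, 2, 3, 128, 128)  # (n, 2, c, h, w): frame pair
>>> mid = net(imgs)                       # predicted intermediate frame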
- class mmagic.models.editors.ControlStableDiffusion(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, controlnet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None, attention_injection=False)¶
Bases:
mmagic.models.editors.stable_diffusion.StableDiffusion
Implementation of `ControlNet with Stable Diffusion <https://arxiv.org/abs/2302.05543>`_ (ControlNet).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name for CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
controlnet (Union[dict, nn.Module]) – The config or module for ControlNet.
scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. Defaults to 'fp32'.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
- init_weights()¶
Initialize the weights. Note that this function will only be called during training. If you want to run inference with a different unet model, you can call this function manually or use mmagic.models.editors.controlnet.controlnet_utils.change_base_model to convert the weights of ControlNet manually.
Example:
>>> # 1. init controlnet from unet
>>> init_cfg = dict(type='init_from_unet')
>>> # 2. switch controlnet weight from unet
>>> # base model is not defined, use `runwayml/stable-diffusion-v1-5`
>>> # as default
>>> init_cfg = dict(type='convert_from_unet')
>>> # base model is defined
>>> init_cfg = dict(
>>>     type='convert_from_unet',
>>>     base_model=dict(
>>>         type='UNet2DConditionModel',
>>>         from_pretrained='REPO_ID',
>>>         subfolder='unet'))
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step for ControlNet model.
- Parameters
data (dict) – Data sampled from dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance containing OptimWrapper of generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results which will be passed to evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results which will be passed to evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- static prepare_control(image: Tuple[PIL.Image.Image, List[PIL.Image.Image], torch.Tensor, List[torch.Tensor]], width: int, height: int, batch_size: int, num_images_per_prompt: int, device: str, dtype: str) torch.Tensor ¶
A helper function to prepare single control images.
- Parameters
image (Tuple[Image.Image, List[Image.Image], Tensor, List[Tensor]]) – The input image for control.
batch_size (int) – The number of the prompt. The control will be repeated for batch_size times.
num_images_per_prompt (int) – The number of images to generate per prompt.
device (str) – The device of the control.
dtype (str) – The dtype of the control.
- Returns
The control in torch.tensor.
- Return type
Tensor
- train(mode: bool = True)¶
Set train/eval mode.
- Parameters
mode (bool, optional) – Whether to set train mode. Defaults to True.
- infer(prompt: Union[str, List[str]], height: Optional[int] = None, width: Optional[int] = None, control: Optional[Union[str, numpy.ndarray, torch.Tensor]] = None, controlnet_conditioning_scale: float = 1.0, num_inference_steps: int = 20, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, return_type='image', show_progress=True)¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
height (int, optional) – The height in pixels of the generated image. If not passed, the height will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
width (int, optional) – The width in pixels of the generated image. If not passed, the width will be self.unet_sample_size * self.vae_scale_factor. Defaults to None.
num_inference_steps (int) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. Defaults to 20.
guidance_scale (float) – Guidance scale as defined in Classifier-Free Diffusion Guidance (https://arxiv.org/abs/2207.12598). Defaults to 7.5.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1). Defaults to None.
num_images_per_prompt (int) – The number of images to generate per prompt. Defaults to 1.
eta (float) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to DDIMScheduler, will be ignored for others. Defaults to 0.0.
generator (torch.Generator, optional) – A torch generator to make generation deterministic. Defaults to None.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator. Defaults to None.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images and control image.
- Return type
dict
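A minimal usage sketch (hedged: the config path and the control image file below are hypothetical placeholders):
>>> from mmengine import Config
>>> from mmengine.registry import init_default_scope
>>> from mmagic.registry import MODELS
>>> init_default_scope('mmagic')
>>> cfg = Config.fromfile('configs/controlnet/controlnet_canny.py')  # hypothetical path
>>> model = MODELS.build(cfg.model).eval()
>>> output = model.infer(
>>>     prompt='Room with blue walls and a yellow ceiling.',
>>>     control='control_edges.png',  # str, ndarray or tensor per the signature
>>>     num_inference_steps=20,
>>>     guidance_scale=7.5)
>>> # `output` is a dict with the generated images and the control image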
- abstract forward(*args, **kwargs)¶
forward is not implemented now.
- class mmagic.models.editors.CycleGAN(*args, buffer_size=50, loss_config=dict(cycle_loss_weight=10.0, id_loss_weight=0.5), **kwargs)¶
Bases:
mmagic.models.base_models.BaseTranslationModel
CycleGAN model for unpaired image-to-image translation.
Ref: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- forward_test(img, target_domain, **kwargs)¶
Forward function for testing.
- Parameters
img (tensor) – Input image tensor.
target_domain (str) – Target domain of output image.
kwargs (dict) – Other arguments.
- Returns
Forward results.
- Return type
dict
- _get_disc_loss(outputs)¶
Backward function for the discriminators.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Discriminators' loss and loss dict.
- Return type
dict
- _get_gen_loss(outputs)¶
Backward function for the generators.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Generators' loss and loss dict.
- Return type
dict
- _get_opposite_domain(domain)¶
Get the opposite domain with respect to the input domain.
- Parameters
domain (str) – The input domain.
- Returns
The opposite domain.
- Return type
str
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
Training step function.
- Parameters
data_batch (dict) – Dict of the input data batch.
optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generators and discriminators.
ddp_reducer (Reducer | None, optional) – Reducer from ddp. It is used to prepare for backward() in ddp. Defaults to None.
running_status (dict | None, optional) – Contains necessary basic information for training, e.g., iteration number. Defaults to None.
- Returns
Dict of loss, information for logger, the number of samples and results for visualization.
- Return type
dict
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Same as val_step().
- Parameters
data (dict) – Data sampled from metric specific sampler. More details in Metrics and Evaluator.
- Returns
A list of DataSample containing generated results.
- Return type
SampleList
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Same as val_step().
- Parameters
data (dict) – Data sampled from metric specific sampler. More details in Metrics and Evaluator.
- Returns
A list of DataSample containing generated results.
- Return type
SampleList
- class mmagic.models.editors.DCGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, ema_config: Optional[Dict] = None, loss_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseGAN
Implementation of Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
- Paper link:
https://arxiv.org/abs/1511.06434 (DCGAN).
Detailed architecture can be found in DCGANGenerator and DCGANDiscriminator.
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) Tuple ¶
Get disc loss. DCGAN uses the vanilla GAN loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
disc_pred_real (Tensor) – Discriminator’s prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
- gen_loss(disc_pred_fake: torch.Tensor) Tuple ¶
Get gen loss. DCGAN uses the vanilla GAN loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
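For reference, the vanilla GAN losses mentioned in disc_loss and gen_loss look like this in their generic (non-saturating) form (a sketch, not necessarily the exact mmagic implementation):
>>> import torch
>>> import torch.nn.functional as F
>>> def vanilla_disc_loss(disc_pred_fake, disc_pred_real):
>>>     real = F.binary_cross_entropy_with_logits(
>>>         disc_pred_real, torch.ones_like(disc_pred_real))
>>>     fake = F.binary_cross_entropy_with_logits(
>>>         disc_pred_fake, torch.zeros_like(disc_pred_fake))
>>>     return real + fake
>>> def vanilla_gen_loss(disc_pred_fake):
>>>     # non-saturating form: push D(G(z)) towards the 'real' label
>>>     return F.binary_cross_entropy_with_logits(
>>>         disc_pred_fake, torch.ones_like(disc_pred_fake))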
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.DenoisingUnet(image_size, in_channels=3, out_channels=None, base_channels=128, resblocks_per_downsample=3, num_timesteps=1000, use_rescale_timesteps=False, dropout=0, embedding_channels=- 1, num_classes=0, use_fp16=False, channels_cfg=None, output_cfg=dict(mean='eps', var='learned_range'), norm_cfg=dict(type='GN', num_groups=32), act_cfg=dict(type='SiLU', inplace=False), shortcut_kernel_size=1, use_scale_shift_norm=False, resblock_updown=False, num_heads=4, time_embedding_mode='sin', time_embedding_cfg=None, resblock_cfg=dict(type='DenoisingResBlock'), attention_cfg=dict(type='MultiHeadAttention'), encoder_channels=None, downsample_conv=True, upsample_conv=True, downsample_cfg=dict(type='DenoisingDownsample'), upsample_cfg=dict(type='DenoisingUpsample'), attention_res=[16, 8], pretrained=None, unet_type='', down_block_types: Tuple[str] = (), up_block_types: Tuple[str] = (), cross_attention_dim=768, layers_per_block: int = 2)¶
Bases:
mmengine.model.BaseModule
Denoising Unet. This network receives a diffused image x_t and the current timestep t, and returns an output_dict corresponding to the passed output_cfg.
output_cfg defines the number of channels and the meaning of the output. output_cfg mainly contains keys of mean and var, denoting how the network outputs the mean and variance required for the denoising process.
For mean:
1. dict(mean='EPS'): Model will predict the noise added in the diffusion process, and the output_dict will contain a key named eps_t_pred.
2. dict(mean='START_X'): Model will directly predict the mean of the original image x_0, and the output_dict will contain a key named x_0_pred.
3. dict(mean='X_TM1_PRED'): Model will predict the mean of the diffused image at timestep t-1, and the output_dict will contain a key named x_tm1_pred.
For var:
1. dict(var='FIXED_SMALL') or dict(var='FIXED_LARGE'): Variance in the denoising process is regarded as a fixed value. Therefore only 'mean' will be predicted, and the output channels will equal those of the input image (e.g., three channels for an RGB image).
2. dict(var='LEARNED'): Model will predict log_variance in the denoising process, and the output_dict will contain a key named log_var.
3. dict(var='LEARNED_RANGE'): Model will predict an interpolation factor, and the log_variance will be calculated as factor * upper_bound + (1-factor) * lower_bound. The output_dict will contain a key named factor.
If var is not FIXED_SMALL or FIXED_LARGE, the number of output channels will be double the input channels, where the first half contains the predicted mean values and the other half the predicted variance values. Otherwise, the number of output channels equals the input channels, containing only the predicted mean values.
- Parameters
image_size (int | list[int]) – The size of image to denoise.
in_channels (int, optional) – The input channels of the input image. Defaults to 3.
out_channels (int, optional) – The output channels of the output prediction. Defaults to None for automatic assignment by var_mode.
base_channels (int, optional) – The basic channel number of the generator. The other layers contain channels based on this number. Defaults to 128.
resblocks_per_downsample (int, optional) – Number of ResBlocks used between two downsample operations. The number of ResBlocks between upsample operations will be the same value to keep symmetry. Defaults to 3.
num_timesteps (int, optional) – The total timestep of the denoising process and the diffusion process. Defaults to 1000.
use_rescale_timesteps (bool, optional) – Whether to rescale the input timesteps in range of [0, 1000]. Defaults to False.
dropout (float, optional) – The probability of dropout operation of each ResBlock. Pass 0 to not use dropout. Defaults to 0.
embedding_channels (int, optional) – The output channels of time embedding layer and label embedding layer. If not passed (or passed -1), output channels of the embedding layers will be set as four times of base_channels. Defaults to -1.
num_classes (int, optional) – The number of conditional classes. If set to 0, this model will be degraded to an unconditional model. Defaults to 0.
channels_cfg (list | dict[list], optional) – Config for input channels of the intermediate blocks. If a list is passed, each element of the list indicates the scale factor for the input channels of the current block with regard to the base_channels. For block i, the input and output channels should be channels_cfg[i] * base_channels and channels_cfg[i+1] * base_channels. If a dict is provided, the key of the dict should be the output scale and the corresponding value should be a list to define channels. Default: please refer to _default_channels_cfg.
output_cfg (dict, optional) – Config for output variables. Defaults to dict(mean='eps', var='learned_range').
norm_cfg (dict, optional) – The config for normalization layers. Defaults to dict(type='GN', num_groups=32).
act_cfg (dict, optional) – The config for activation layers. Defaults to dict(type='SiLU', inplace=False).
shortcut_kernel_size (int, optional) – The kernel size for shortcut conv in ResBlocks. The value of this argument will overwrite the default value of resblock_cfg. Defaults to 1.
use_scale_shift_norm (bool, optional) – Whether to perform scale and shift after normalization operation. Defaults to False.
num_heads (int, optional) – The number of attention heads. Defaults to 4.
time_embedding_mode (str, optional) – Embedding method of time_embedding. Defaults to 'sin'.
time_embedding_cfg (dict, optional) – Config for time_embedding. Defaults to None.
resblock_cfg (dict, optional) – Config for ResBlock. Defaults to dict(type='DenoisingResBlock').
attention_cfg (dict, optional) – Config for attention operation. Defaults to dict(type='MultiHeadAttention').
upsample_conv (bool, optional) – Whether to use conv in upsample block. Defaults to True.
downsample_conv (bool, optional) – Whether to use conv operation in downsample block. Defaults to True.
upsample_cfg (dict, optional) – Config for upsample blocks. Defaults to dict(type='DenoisingUpsample').
downsample_cfg (dict, optional) – Config for downsample blocks. Defaults to dict(type='DenoisingDownsample').
attention_res (int | list[int], optional) – Resolution of feature maps to apply attention operation. Defaults to [16, 8].
pretrained (str | dict, optional) – Path for the pretrained model or dict containing information for pretrained models whose necessary key is 'ckpt_path'. Besides, you can also provide 'prefix' to load the generator part from the whole state dict. Defaults to None.
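To make the output_cfg bookkeeping concrete, a hedged sketch: with var='learned_range' the network predicts both mean and variance, so the raw output has twice the input channels; the expected return structure follows the description above.
>>> import torch
>>> from mmagic.models.editors import DenoisingUnet
>>> unet = DenoisingUnet(
>>>     image_size=32,
>>>     in_channels=3,
>>>     output_cfg=dict(mean='eps', var='learned_range'))
>>> x_t = torch.rand(1, 3, 32, 32)    # diffused image at timestep t
>>> t = torch.randint(0, 1000, (1,))  # current timestep
>>> out = unet(x_t, t)                # expected: dict with 'eps_t_pred' and 'factor'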
- _default_channels_cfg¶
- forward(x_t, t, encoder_hidden_states=None, label=None, return_noise=False)¶
Forward function.
- Parameters
x_t (torch.Tensor) – Diffused image at timestep t to denoise.
t (torch.Tensor) – Current timestep.
label (torch.Tensor | callable | None) – You can directly give a batch of labels through a torch.Tensor or offer a callable function to sample a batch of label data. Otherwise, None indicates to use the default label sampler.
return_noise (bool, optional) – If True, the input x_t and t will be returned in a dict together with the output desired by output_cfg. Defaults to False.
- Returns
If not return_noise, the output desired by output_cfg; otherwise a dict additionally containing x_t and t.
- Return type
torch.Tensor | dict
- init_weights(pretrained=None)¶
Init weights for models.
We just use the initialization method proposed in the original paper.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
- convert_to_fp16()¶
Convert the precision of the model to float16.
- convert_to_fp32()¶
Convert the precision of the model to float32.
- class mmagic.models.editors.DeblurGanV2(generator: ModelType, discriminator: Optional[ModelType] = None, pixel_loss: Optional[Union[dict, str]] = None, disc_loss: Optional[Union[dict, str]] = None, adv_lambda: float = 0.001, warmup_num: int = 3, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None, data_preprocessor: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModel
Base class for all algorithmic models.
BaseModel implements the basic functions of the algorithmic model, such as weight initialization, batch input preprocessing (see more information in BaseDataPreprocessor), loss parsing, and model parameter updating.
Subclasses inheriting from BaseModel only need to implement the forward method, which implements the logic to calculate loss and predictions; they can then be trained in the runner.
Examples
>>> @MODELS.register_module()
>>> class ToyModel(BaseModel):
>>>
>>>     def __init__(self):
>>>         super().__init__()
>>>         self.backbone = nn.Sequential()
>>>         self.backbone.add_module('conv1', nn.Conv2d(3, 6, 5))
>>>         self.backbone.add_module('pool', nn.MaxPool2d(2, 2))
>>>         self.backbone.add_module('conv2', nn.Conv2d(6, 16, 5))
>>>         self.backbone.add_module('fc1', nn.Linear(16 * 5 * 5, 120))
>>>         self.backbone.add_module('fc2', nn.Linear(120, 84))
>>>         self.backbone.add_module('fc3', nn.Linear(84, 10))
>>>
>>>         self.criterion = nn.CrossEntropyLoss()
>>>
>>>     def forward(self, batch_inputs, data_samples, mode='tensor'):
>>>         data_samples = torch.stack(data_samples)
>>>         if mode == 'tensor':
>>>             return self.backbone(batch_inputs)
>>>         elif mode == 'predict':
>>>             feats = self.backbone(batch_inputs)
>>>             predictions = torch.argmax(feats, 1)
>>>             return predictions
>>>         elif mode == 'loss':
>>>             feats = self.backbone(batch_inputs)
>>>             loss = self.criterion(feats, data_samples)
>>>             return dict(loss=loss)
- Parameters
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- data_preprocessor¶
Used for pre-processing data sampled by dataloader to the format accepted by forward().
- Type
BaseDataPreprocessor
- init_cfg¶
Initialization config dict.
- Type
dict, optional
- forward(inputs: torch.Tensor, data_samples: Optional[List[mmagic.structures.DataSample]] = None, mode: str = 'tensor', **kwargs) Union[torch.Tensor, List[mmagic.structures.DataSample], dict] ¶
Returns losses or predictions of training, validation, testing, and simple inference process.
The forward method of BaseModel is an abstract method; its subclasses must implement it. It accepts inputs and data_samples processed by data_preprocessor and returns results according to the mode argument. During non-distributed training, validation, and testing, forward is called directly by BaseModel.train_step, BaseModel.val_step and BaseModel.test_step. During distributed data parallel training, MMSeparateDistributedDataParallel.train_step first calls DistributedDataParallel.forward to enable automatic gradient synchronization, and then calls forward to get the training loss.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
mode (str) – mode should be one of loss, predict and tensor. Default: ‘tensor’.
loss: called by train_step and returns a dict of losses used for logging.
predict: called by val_step and test_step and returns a list of BaseDataElement results used for computing metrics.
tensor: called by custom code to get Tensor-type results.
- Returns
If mode == loss, return a dict of loss tensors used for backward and logging.
If mode == predict, return a list of BaseDataElement for computing metrics and getting inference results.
If mode == tensor, return a tensor, tuple of tensors or dict of tensors for custom use.
- Return type
ForwardResults
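A minimal usage sketch of the three modes (names are illustrative; assumes a built subclass instance and a batch collated by data_preprocessor):
>>> losses = model(batch_inputs, data_samples, mode='loss')    # dict for logging
>>> preds = model(batch_inputs, data_samples, mode='predict')  # list of data elements
>>> feats = model(batch_inputs, data_samples, mode='tensor')   # raw tensor output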
- convert_to_datasample(predictions: mmagic.structures.DataSample, data_samples: mmagic.structures.DataSample, inputs: Optional[torch.Tensor]) List[mmagic.structures.DataSample] ¶
Add predictions and destructed inputs (if passed) to data samples.
- Parameters
predictions (DataSample) – The predictions of the model.
data_samples (DataSample) – The data samples loaded from the dataloader.
inputs (Optional[torch.Tensor]) – The input of the model. Defaults to None.
- Returns
Modified data samples.
- Return type
List[DataSample]
- forward_tensor(inputs: torch.Tensor, data_samples: Optional[List[mmagic.structures.DataSample]] = None, **kwargs) torch.Tensor ¶
Forward tensor. Returns result of simple forward.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
result of simple forward.
- Return type
Tensor
- forward_inference(inputs: torch.Tensor, data_samples: Optional[List[mmagic.structures.DataSample]] = None, **kwargs) List[mmagic.structures.DataSample] ¶
Forward inference. Returns predictions of validation, testing, and simple inference.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
predictions.
- Return type
List[DataSample]
- forward_train(inputs, data_samples=None, **kwargs)¶
Forward training. Losses of training are calculated in train_step.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
Result of forward_tensor with training=True.
- Return type
Tensor
- val_step(data: Union[tuple, dict, list]) list ¶
Gets the predictions of given data.
Calls self.data_preprocessor(data, False) and self(inputs, data_sample, mode='predict') in order. Returns the predictions, which will be passed to the evaluator.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
The predictions of given data.
- Return type
list
- test_step(data: Union[dict, tuple, list]) list ¶
BaseModel implements test_step the same as val_step.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
The predictions of given data.
- Return type
list
- _run_forward(data: Union[dict, tuple, list], mode: str) Union[Dict[str, torch.Tensor], list] ¶
Unpacks data for forward().
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
mode (str) – Mode of forward.
- Returns
Results of training or testing mode.
- Return type
dict or list
- train_step(data: List[dict], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step of GAN-based method.
- Parameters
data (List[dict]) – Data sampled from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- g_step(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor)¶
G step of DoubleGAN: Calculate losses of the generator.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- d_step(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor)¶
D step of DoubleGAN: Calculate losses of the discriminator.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- g_step_with_optim(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
G step with optim of GAN: Calculate losses of generator and run optim.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
optim_wrapper (OptimWrapperDict) – Optim wrapper dict.
- Returns
Dict of parsed losses.
- Return type
dict
- d_step_with_optim(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
D step with optim of GAN: Calculate losses of discriminator and run optim.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
optim_wrapper (OptimWrapperDict) – Optim wrapper dict.
- Returns
Dict of parsed losses.
- Return type
dict
- extract_gt_data(data_samples)¶
Extract GT data from data samples.
- Parameters
data_samples (list) – List of DataSample.
- Returns
Extracted GT data.
- Return type
Tensor
- class mmagic.models.editors.DeblurGanV2Discriminator¶
Defines the discriminator for DeblurGanV2 with the specified arguments.
- Parameters
model (str) – Type of the discriminator model.
- class mmagic.models.editors.DeblurGanV2Generator¶
Defines the generator for DeblurGanV2 with the specified arguments.
- Parameters
model (str) – Type of the generator model.
- class mmagic.models.editors.ContextualAttentionModule(unfold_raw_kernel_size=4, unfold_raw_stride=2, unfold_raw_padding=1, unfold_corr_kernel_size=3, unfold_corr_stride=1, unfold_corr_dilation=1, unfold_corr_padding=1, scale=0.5, fuse_kernel_size=3, softmax_scale=10, return_attention_score=True)¶
Bases:
mmengine.model.BaseModule
Contextual attention module.
The details of this module can be found in: Generative Image Inpainting with Contextual Attention
- Parameters
unfold_raw_kernel_size (int) – Kernel size used in unfolding raw feature. Default: 4.
unfold_raw_stride (int) – Stride used in unfolding raw feature. Default: 2.
unfold_raw_padding (int) – Padding used in unfolding raw feature. Default: 1.
unfold_corr_kernel_size (int) – Kernel size used in unfolding context for computing correlation maps. Default: 3.
unfold_corr_stride (int) – Stride used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_dilation (int) – Dilation used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_padding (int) – Padding used in unfolding context for computing correlation maps. Default: 1.
scale (float) – The rescale factor used to resize input features. Default: 0.5.
fuse_kernel_size (int) – The kernel size used in fusion module. Default: 3.
softmax_scale (float) – The scale factor for softmax function. Default: 10.
return_attention_score (bool) – If True, the attention score will be returned. Default: True.
- forward(x, context, mask=None)¶
Forward Function.
- Parameters
x (torch.Tensor) – Tensor with shape (n, c, h, w).
context (torch.Tensor) – Tensor with shape (n, c, h, w).
mask (torch.Tensor) – Tensor with shape (n, 1, h, w). Default: None.
- Returns
Features after contextual attention.
- Return type
tuple(torch.Tensor)
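A hedged usage sketch (default arguments, illustrative shapes; with the default return_attention_score=True the module returns the attended features together with the attention score):
>>> import torch
>>> from mmagic.models.editors import ContextualAttentionModule
>>> attn = ContextualAttentionModule()
>>> x = torch.rand(1, 64, 64, 64)        # foreground features
>>> context = torch.rand(1, 64, 64, 64)  # background/context features
>>> mask = torch.zeros(1, 1, 64, 64)     # 1 marks holes, 0 marks valid pixels
>>> out, attention_score = attn(x, context, mask)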
- patch_correlation(x, kernel)¶
Calculate patch correlation.
- Parameters
x (torch.Tensor) – Input tensor.
kernel (torch.Tensor) – Kernel tensor.
- Returns
Tensor with shape of (n, l, h, w).
- Return type
torch.Tensor
- patch_copy_deconv(attention_score, context_filter)¶
Copy patches using deconv.
- Parameters
attention_score (torch.Tensor) – Tensor with shape of (n, l, h, w).
context_filter (torch.Tensor) – Filter kernel.
- Returns
Tensor with shape of (n, c, h, w).
- Return type
torch.Tensor
- fuse_correlation_map(correlation_map, h_unfold, w_unfold)¶
Fuse correlation map.
This operation is to fuse correlation map for increasing large consistent correlation regions.
The mechanism behind this op is simple and easy to understand. A standard ‘Eye’ matrix will be applied as a filter on the correlation map in horizontal and vertical direction.
The shape of the input correlation map is (n, h_unfold*w_unfold, h, w). When fusing, we apply the convolutional filter to the reshaped feature map with shape of (n, 1, h_unfold*w_unfold, h*w).
A simple specification for the horizontal direction is shown below:

        (h, 0) (h, 1) (h, 2) (h, 3) ...
(h, 0)    1
(h, 1)           1
(h, 2)                  1
(h, 3)                         1
...
- calculate_unfold_hw(input_size, kernel_size=3, stride=1, dilation=1, padding=0)¶
Calculate (h, w) after unfolding.
The official implementation of unfold in PyTorch puts the spatial dimensions (h, w) into L. Thus, this function simply calculates (h, w) according to the equation in: https://pytorch.org/docs/stable/nn.html#torch.nn.Unfold
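The per-dimension arithmetic follows the standard convolution formula from the torch.nn.Unfold documentation; a small illustrative helper (a sketch, not the module's actual code):
>>> def calc_unfold_1d(size, kernel_size=3, stride=1, dilation=1, padding=0):
...     # output length along one spatial dimension after unfolding
...     return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
>>> calc_unfold_1d(64, kernel_size=3, stride=1, padding=1)
64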
- calculate_overlap_factor(attention_score)¶
Calculate the overlap factor after applying deconv.
- Parameters
attention_score (torch.Tensor) – The attention score with shape of (n, c, h, w).
- Returns
The overlap factor.
- Return type
torch.Tensor
- mask_correlation_map(correlation_map, mask)¶
Add mask weight for correlation map.
Adds negative infinity to the masked regions so that the softmax function results in zero weight for those regions.
- Parameters
correlation_map (torch.Tensor) – Correlation map with shape of (n, h_unfold*w_unfold, h_map, w_map).
mask (torch.Tensor) – Mask tensor with shape of (n, c, h, w). ‘1’ in the mask indicates masked region while ‘0’ indicates valid region.
- Returns
Updated correlation map with mask.
- Return type
torch.Tensor
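A minimal sketch of the masking idea (illustrative shapes): filling masked positions with negative infinity drives their softmax weights to exactly zero:
>>> import torch
>>> scores = torch.rand(1, 4, 2, 2)                       # (n, h_unfold*w_unfold, h, w)
>>> mask = torch.tensor([1., 0., 0., 0.]).view(1, 4, 1, 1)
>>> masked = scores.masked_fill(mask.bool().expand_as(scores), float('-inf'))
>>> weights = masked.softmax(dim=1)  # the masked patch gets weight 0 everywhere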
- im2col(img, kernel_size, stride=1, padding=0, dilation=1, normalize=False, return_cols=False)¶
Reshape image-style feature to columns.
This function is used for unfold feature maps to columns. The details of this function can be found in: https://pytorch.org/docs/1.1.0/nn.html?highlight=unfold#torch.nn.Unfold
- Parameters
img (torch.Tensor) – Features to be unfolded. The shape of this feature should be (n, c, h, w).
kernel_size (int) – In this function, we only support square kernel with same height and width.
stride (int) – Stride number in unfolding. Default: 1.
padding (int) – Padding number in unfolding. Default: 0.
dilation (int) – Dilation number in unfolding. Default: 1.
normalize (bool) – If True, the unfolded feature will be normalized. Default: False.
return_cols (bool) – The official implementation of unfolding in PyTorch returns features with shape of (n, c * kernel_size**2, L). If True, the features will be reshaped to (n, L, c, kernel_size, kernel_size). Otherwise, the results will keep the shape of the official implementation.
- Returns
Unfolded columns. If return_cols is True, the shape of the output tensor is (n, L, c, kernel_size, kernel_size). Otherwise, the shape will be (n, c * kernel_size**2, L).
- Return type
torch.Tensor
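The shape arithmetic can be checked directly against torch.nn.functional.unfold; the final reshape mirrors what return_cols=True is documented to produce (an illustrative sketch):
>>> import torch
>>> import torch.nn.functional as F
>>> img = torch.rand(2, 16, 32, 32)                 # (n, c, h, w)
>>> cols = F.unfold(img, kernel_size=3, padding=1)  # official layout
>>> cols.shape                                      # L = 32 * 32 positions
torch.Size([2, 144, 1024])
>>> cols.transpose(1, 2).reshape(2, 1024, 16, 3, 3).shape
torch.Size([2, 1024, 16, 3, 3])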
- class mmagic.models.editors.ContextualAttentionNeck(in_channels, conv_type='conv', conv_cfg=None, norm_cfg=None, act_cfg=dict(type='ELU'), contextual_attention_args=dict(softmax_scale=10.0), **kwargs)¶
Bases:
mmengine.model.BaseModule
Neck with contextual attention module.
- Parameters
in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
conv_cfg (dict | None) – Config of conv module. Default: None.
norm_cfg (dict | None) – Config of norm module. Default: None.
act_cfg (dict | None) – Config of activation layer. Default: dict(type=’ELU’).
contextual_attention_args (dict) – Config of contextual attention module. Default: dict(softmax_scale=10.).
kwargs (keyword arguments) –
- _conv_type¶
- forward(x, mask)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).
- Returns
Output tensor with shape of (n, c, h’, w’).
- Return type
torch.Tensor
- class mmagic.models.editors.DeepFillDecoder(in_channels, conv_type='conv', norm_cfg=None, act_cfg=dict(type='ELU'), out_act_cfg=dict(type='clip', min=- 1.0, max=1.0), channel_factor=1.0, **kwargs)¶
Bases:
mmengine.model.BaseModule
Decoder used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention
- Parameters
in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
out_act_cfg (dict) – Config dict for output activation layer. Here, we provide commonly used clamp or clip operation.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –
- _conv_type¶
- forward(input_dict)¶
Forward Function.
- Parameters
input_dict (dict | torch.Tensor) – Input dict with middle features or torch.Tensor.
- Returns
Output tensor with shape of (n, c, h, w).
- Return type
torch.Tensor
- class mmagic.models.editors.DeepFillEncoder(in_channels=5, conv_type='conv', norm_cfg=None, act_cfg=dict(type='ELU'), encoder_type='stage1', channel_factor=1.0, **kwargs)¶
Bases:
mmengine.model.BaseModule
Encoder used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention
- Parameters
in_channels (int) – The number of input channels. Default: 5.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
encoder_type (str) – Type of the encoder. Should be one of [‘stage1’, ‘stage2_conv’, ‘stage2_attention’]. Default: ‘stage1’.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –
- _conv_type¶
- forward(x)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h’, w’).
- Return type
torch.Tensor
- class mmagic.models.editors.DeepFillRefiner(encoder_attention=dict(type='DeepFillEncoder', encoder_type='stage2_attention'), encoder_conv=dict(type='DeepFillEncoder', encoder_type='stage2_conv'), dilation_neck=dict(type='GLDilationNeck', in_channels=128, act_cfg=dict(type='ELU')), contextual_attention=dict(type='ContextualAttentionNeck', in_channels=128), decoder=dict(type='DeepFillDecoder', in_channels=256))¶
Bases:
mmengine.model.BaseModule
Refiner used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention.
- Parameters
encoder_attention (dict) – Config dict for encoder used in branch with contextual attention module.
encoder_conv (dict) – Config dict for encoder used in branch with just convolutional operation.
dilation_neck (dict) – Config dict for dilation neck in branch with just convolutional operation.
contextual_attention (dict) – Config dict for contextual attention neck.
decoder (dict) – Config dict for decoder used to fuse and decode features.
- forward(x, mask)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).
- Returns
Output tensor with shape of (n, c, h’, w’).
- Return type
torch.Tensor
- class mmagic.models.editors.DeepFillv1Discriminators(global_disc_cfg, local_disc_cfg)¶
Bases:
mmengine.model.BaseModule
Discriminators used in DeepFillv1 model.
In the DeepFillv1 model, the discriminators are independent, without the concatenation used in the Global&Local model. Thus, we call this model DeepFillv1Discriminators. There is a global discriminator and a local discriminator, taking the global and local input respectively.
The details can be found in: Generative Image Inpainting with Contextual Attention.
- Parameters
global_disc_cfg (dict) – Config dict for global discriminator.
local_disc_cfg (dict) – Config dict for local discriminator.
- forward(x)¶
Forward function.
- Parameters
x (tuple[torch.Tensor]) – Contains the global image and the local image patch.
- Returns
Contains the predictions from the discriminators on the global image and the local image patch.
- Return type
tuple[torch.Tensor]
- init_weights()¶
Init weights for models.
- class mmagic.models.editors.DeepFillv1Inpaintor(data_preprocessor: dict, encdec: dict, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, stage1_loss_type=None, stage2_loss_type=None, train_cfg=None, test_cfg=None, init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.TwoStageInpaintor
Inpaintor for deepfillv1 method.
This inpaintor is implemented according to the paper: Generative image inpainting with contextual attention
Importantly, this inpaintor is an example of using a custom training schedule based on TwoStageInpaintor.
The training pipeline of deepfillv1 is as follows:

if cur_iter < iter_tc:
    update generator with only l1 loss
else:
    update discriminator
    if cur_iter > iter_td:
        update generator with l1 loss and adversarial loss

The new attribute cur_iter is added to record the current number of iterations. The train_cfg contains the settings of the training schedule:

train_cfg = dict(
    start_iter=0,
    disc_step=1,
    iter_tc=90000,
    iter_td=100000
)
iter_tc and iter_td correspond to the notation \(T_C\) and \(T_D\) of the original paper.
- Parameters
generator (dict) – Config for encoder-decoder style generator.
disc (dict) – Config for discriminator.
loss_gan (dict) – Config for adversarial loss.
loss_gp (dict) – Config for gradient penalty loss.
loss_disc_shift (dict) – Config for discriminator shift loss.
loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.
loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.
loss_l1_hole (dict) – Config for l1 loss in the hole.
loss_l1_valid (dict) – Config for l1 loss in the valid region.
loss_tv (dict) – Config for total variation loss.
train_cfg (dict) – Configs for training scheduler. disc_step must be contained to indicate the number of discriminator update steps in each training step.
test_cfg (dict) – Configs for testing scheduler.
init_cfg (dict, optional) – Initialization config dict.
- forward_train_d(data_batch, is_real, is_disc)¶
Forward function in discriminator training step.
In this function, we modify the default implementation, which uses only one discriminator. In the DeepFillv1 model, two separate discriminators are used for global and local consistency.
- Parameters
data_batch (torch.Tensor) – Batch of real data or fake data.
is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.
is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.
- Returns
Contains the loss items computed in this function.
- Return type
dict
- two_stage_loss(stage1_data, stage2_data, gt, mask, masked_img)¶
Calculate two-stage loss.
- Parameters
stage1_data (dict) – Contains stage1 results.
stage2_data (dict) – Contains stage2 results.
gt (torch.Tensor) – Ground-truth image.
mask (torch.Tensor) – Mask image.
masked_img (torch.Tensor) – Composition of mask image and ground-truth image.
- Returns
A dict containing the results computed within this function for visualization and a dict containing the loss items computed in this function.
- Return type
tuple(dict)
- calculate_loss_with_type(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_', fake_local=None)¶
Calculate multiple types of losses.
- Parameters
loss_type (str) – Type of the loss.
fake_res (torch.Tensor) – Direct results from model.
fake_img (torch.Tensor) – Composited results from model.
gt (torch.Tensor) – Ground-truth tensor.
mask (torch.Tensor) – Mask tensor.
prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.
fake_local (torch.Tensor, optional) – Local results from model. Defaults to None.
- Returns
Contains loss values with their names.
- Return type
dict
- train_step(data: List[dict], optim_wrapper)¶
Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if present)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- Parameters
data (List[dict]) – Batch of data as input.
optim_wrapper (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if present).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
- class mmagic.models.editors.DeepFillEncoderDecoder(stage1=dict(type='GLEncoderDecoder', encoder=dict(type='DeepFillEncoder'), decoder=dict(type='DeepFillDecoder', in_channels=128), dilation_neck=dict(type='GLDilationNeck', in_channels=128, act_cfg=dict(type='ELU'))), stage2=dict(type='DeepFillRefiner'), return_offset=False)¶
Bases:
mmengine.model.BaseModule
Two-stage encoder-decoder structure used in DeepFill model.
The details are in: Generative Image Inpainting with Contextual Attention
- Parameters
stage1 (dict) – Config dict for building stage1 model. As DeepFill model uses Global&Local model as baseline in first stage, the stage1 model can be easily built with GLEncoderDecoder.
stage2 (dict) – Config dict for building stage2 model.
return_offset (bool) – Whether to return offset feature in contextual attention module. Default: False.
- forward(x)¶
Forward function.
- Parameters
x (torch.Tensor) – This input tensor has the shape of (n, 5, h, w). In the channel dimension, we concatenate [masked_img, ones, mask] as DeepFillv1 models do (see the sketch below).
- Returns
The first two items are the results from the first and second stage. If return_offset is True, the offset will be returned as the third item.
- Return type
tuple[torch.Tensor]
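A minimal sketch of assembling the 5-channel input described above (illustrative sizes):
>>> import torch
>>> img = torch.rand(1, 3, 256, 256)
>>> mask = torch.zeros(1, 1, 256, 256)
>>> mask[..., 96:160, 96:160] = 1.                  # 1 marks the hole
>>> masked_img = img * (1. - mask)
>>> ones = torch.ones_like(mask)
>>> x = torch.cat([masked_img, ones, mask], dim=1)  # (1, 5, 256, 256)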
- init_weights()¶
Init weights for models.
- class mmagic.models.editors.DIC(generator, pixel_loss, align_loss, discriminator=None, gan_loss=None, feature_loss=None, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.editors.srgan.SRGAN
DIC model for Face Super-Resolution.
Paper: Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation.
- Parameters
generator (dict) – Config for the generator.
pixel_loss (dict) – Config for the pixel loss.
align_loss (dict) – Config for the align loss.
discriminator (dict) – Config for the discriminator. Default: None.
gan_loss (dict) – Config for the gan loss. Default: None.
feature_loss (dict) – Config for the feature loss. Default: None.
train_cfg (dict) – Config for train. Default: None.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Default: None.
- forward_tensor(inputs, data_samples=None, training=False)¶
Forward tensor. Returns result of simple forward.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
training (bool) – Whether is training. Default: False.
- Returns
Results of forward inference and forward train.
- Return type
(Tensor | Tuple[List[Tensor]])
- if_run_g()¶
Calculates whether the generator step needs to run.
- if_run_d()¶
Calculates whether the discriminator step needs to run.
- g_step(batch_outputs, batch_gt_data)¶
G step of GAN: Calculate losses of generator.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- train_step(data: List[dict], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step of GAN-based method.
- Parameters
data (List[dict]) – Data sampled from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- static extract_gt_data(data_samples)¶
Extract GT data from data samples.
- Parameters
data_samples (list) – List of DataSample.
- Returns
Extracted GT data.
- Return type
Tensor
- class mmagic.models.editors.DICNet(in_channels, out_channels, mid_channels, num_blocks=6, hg_mid_channels=256, hg_num_keypoints=68, num_steps=4, upscale_factor=8, detach_attention=False, prelu_init=0.2, num_heatmaps=5, num_fusion_blocks=7, init_cfg=None)¶
Bases:
mmengine.model.BaseModule
DIC network structure for face super-resolution.
Paper: Deep Face Super-Resolution with Iterative Collaboration between Attentive Recovery and Landmark Estimation.
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels in the output image
mid_channels (int) – Channel number of intermediate features. Default: 64
num_blocks (int) – Block number in the trunk network. Default: 6
hg_mid_channels (int) – Channel number of intermediate features of HourGlass. Default: 256
hg_num_keypoints (int) – Keypoint number of HourGlass. Default: 68
num_steps (int) – Number of iterative steps. Default: 4
upscale_factor (int) – Upsampling factor. Default: 8
detach_attention (bool) – Whether to detach the heatmap from the current tensor. Default: False.
prelu_init (float) – init of PReLU. Default: 0.2
num_heatmaps (int) – Number of heatmaps. Default: 5
num_fusion_blocks (int) – Number of fusion blocks. Default: 7
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor.
- Returns
Forward results: sr_outputs (list[Tensor]), the forward SR results, and heatmap_outputs (list[Tensor]), the forward heatmap results.
- Return type
Tensor
- class mmagic.models.editors.FeedbackBlock(mid_channels, num_blocks, upscale_factor, padding=2, prelu_init=0.2)¶
Bases:
torch.nn.Module
Feedback Block of DIC.
It has a style of:
----- Module ----->
  ^            |
  |____________|
- Parameters
mid_channels (int) – Number of channels in the intermediate features.
num_blocks (int) – Number of blocks.
upscale_factor (int) – upscale factor.
padding (int) – Padding size. Default: 2.
prelu_init (float) – Initial value of PReLU. Default: 0.2
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.FeedbackBlockCustom(in_channels, mid_channels, num_blocks, upscale_factor)¶
Bases:
FeedbackBlock
Custom feedback block, will be used as the first feedback block.
- Parameters
in_channels (int) – Number of channels in the input features.
mid_channels (int) – Number of channels in the intermediate features.
num_blocks (int) – Number of blocks.
upscale_factor (int) – upscale factor.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.FeedbackBlockHeatmapAttention(mid_channels, num_blocks, upscale_factor, num_heatmaps, num_fusion_blocks, padding=2, prelu_init=0.2)¶
Bases:
FeedbackBlock
Feedback block with HeatmapAttention.
- Parameters
mid_channels (int) – Number of channels in the intermediate features.
num_blocks (int) – Number of blocks.
upscale_factor (int) – upscale factor.
num_heatmaps (int) – Number of heatmaps.
num_fusion_blocks (int) – Number of fusion blocks.
padding (int) – Padding size. Default: 2.
prelu_init (float) – Initial value of PReLU. Default: 0.2
- forward(x, heatmap)¶
Forward function.
- Parameters
x (Tensor) – Input feature tensor.
heatmap (Tensor) – Input heatmap tensor.
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.LightCNN(in_channels)¶
Bases:
mmengine.model.BaseModule
LightCNN discriminator with input size 128 x 128.
It is used to train DICGAN.
- Parameters
in_channels (int) – Channel number of inputs.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor.
- Returns
Forward results.
- Return type
Tensor
- init_weights(pretrained=None, strict=True)¶
Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
- class mmagic.models.editors.MaxFeature(in_channels, out_channels, kernel_size=3, stride=1, padding=1, filter_type='conv2d')¶
Bases:
torch.nn.Module
Conv2d or Linear layer with max feature selector.
Generate feature maps with doubled channels, split them, and select the max feature.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
kernel_size (int or tuple) – Size of the convolving kernel.
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 1
filter_type (str) – Type of filter. Options are ‘conv2d’ and ‘linear’. Default: ‘conv2d’.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor.
- Returns
Forward results.
- Return type
Tensor
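The max-feature selection itself is simple; a standalone sketch of the idea (not the module's actual code):
>>> import torch
>>> feats = torch.rand(1, 2 * 32, 16, 16)  # layer output with doubled channels
>>> a, b = torch.split(feats, 32, dim=1)
>>> out = torch.max(a, b)                  # element-wise max, (1, 32, 16, 16)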
- class mmagic.models.editors.DIM(data_preprocessor, backbone, refiner=None, train_cfg=None, test_cfg=None, loss_alpha=None, loss_comp=None, loss_refine=None, init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.BaseMattor
Deep Image Matting model.
https://arxiv.org/abs/1703.03872
Note
For (self.train_cfg.train_backbone, self.train_cfg.train_refiner):
(True, False) corresponds to the encoder-decoder stage in the paper.
(False, True) corresponds to the refinement stage in the paper.
(True, True) corresponds to the fine-tune stage in the paper.
- Parameters
data_preprocessor (dict, optional) – Config of data pre-processor.
backbone (dict) – Config of backbone.
refiner (dict) – Config of refiner.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
loss_comp (dict) – Config of the composition loss. Default: None.
loss_refine (dict) – Config of the loss of the refiner. Default: None.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
- property with_refiner¶
Whether the matting model has a refiner.
- init_weights()¶
Initialize the model network weights.
- train(mode=True)¶
Mode switcher.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- freeze_backbone()¶
Freeze the backbone and only train the refiner.
- _forward(x: torch.Tensor, *, refine: bool = True) Tuple[torch.Tensor, torch.Tensor] ¶
Raw forward function.
- Parameters
x (torch.Tensor) – Concatenation of merged image and trimap with shape (N, 4, H, W).
refine (bool) – Whether to forward through the refiner.
- Returns
pred_alpha with shape (N, 1, H, W), and pred_refine with shape (N, 4, H, W).
- Return type
torch.Tensor
- _forward_test(inputs)¶
Forward to get alpha prediction.
- _forward_train(inputs, data_samples)¶
Defines the computation performed at every training call.
- Parameters
inputs (torch.Tensor) – Concatenation of normalized image and trimap with shape (N, 4, H, W).
data_samples (list[DataSample]) – Data samples containing:
gt_alpha (Tensor): Ground-truth of alpha with shape (N, 1, H, W), normalized to 0 to 1.
gt_fg (Tensor): Ground-truth of foreground with shape (N, C, H, W), normalized to 0 to 1.
gt_bg (Tensor): Ground-truth of background with shape (N, C, H, W), normalized to 0 to 1.
- Returns
Contains the loss items and batch information.
- Return type
dict
- class mmagic.models.editors.ClipWrapper(clip_type, *args, **kwargs)¶
Bases:
torch.nn.Module
Clip Models wrapper.
We provide wrappers for the clip models of openai and mlfoundations, where the user can specify clip_type as clip or open_clip, and then initialize a clip model using the same arguments as in the original codebase. The following clip model settings are provided in the official repo of disco diffusion:

| Setting                       | Source    | Arguments                                                   |
|-------------------------------|-----------|-------------------------------------------------------------|
| ViTB32                        | clip      | name='ViT-B/32', jit=False                                  |
| ViTB16                        | clip      | name='ViT-B/16', jit=False                                  |
| ViTL14                        | clip      | name='ViT-L/14', jit=False                                  |
| ViTL14_336px                  | clip      | name='ViT-L/14@336px', jit=False                            |
| RN50                          | clip      | name='RN50', jit=False                                      |
| RN50x4                        | clip      | name='RN50x4', jit=False                                    |
| RN50x16                       | clip      | name='RN50x16', jit=False                                   |
| RN50x64                       | clip      | name='RN50x64', jit=False                                   |
| RN101                         | clip      | name='RN101', jit=False                                     |
| ViTB32_laion2b_e16            | open_clip | name='ViT-B-32', pretrained='laion2b_e16'                   |
| ViTB32_laion400m_e31          | open_clip | model_name='ViT-B-32', pretrained='laion400m_e31'           |
| ViTB32_laion400m_32           | open_clip | model_name='ViT-B-32', pretrained='laion400m_e32'           |
| ViTB32quickgelu_laion400m_e31 | open_clip | model_name='ViT-B-32-quickgelu', pretrained='laion400m_e31' |
| ViTB32quickgelu_laion400m_e32 | open_clip | model_name='ViT-B-32-quickgelu', pretrained='laion400m_e32' |
| ViTB16_laion400m_e31          | open_clip | model_name='ViT-B-16', pretrained='laion400m_e31'           |
| ViTB16_laion400m_e32          | open_clip | model_name='ViT-B-16', pretrained='laion400m_e32'           |
| RN50_yfcc15m                  | open_clip | model_name='RN50', pretrained='yfcc15m'                     |
| RN50_cc12m                    | open_clip | model_name='RN50', pretrained='cc12m'                       |
| RN50_quickgelu_yfcc15m        | open_clip | model_name='RN50-quickgelu', pretrained='yfcc15m'           |
| RN50_quickgelu_cc12m          | open_clip | model_name='RN50-quickgelu', pretrained='cc12m'             |
| RN101_yfcc15m                 | open_clip | model_name='RN101', pretrained='yfcc15m'                    |
| RN101_quickgelu_yfcc15m       | open_clip | model_name='RN101-quickgelu', pretrained='yfcc15m'          |

An example of a clip_modes_cfg is as follows:
Examples:
>>> # Use OpenAI's CLIP
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='clip',
>>>     name='ViT-B/32',
>>>     jit=False)

>>> # Use OpenCLIP
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='open_clip',
>>>     model_name='RN50',
>>>     pretrained='yfcc15m')

>>> # Use CLIP from Hugging Face Transformers
>>> config = dict(
>>>     type='ClipWrapper',
>>>     clip_type='huggingface',
>>>     pretrained_model_name_or_path='runwayml/stable-diffusion-v1-5',
>>>     subfolder='text_encoder')
- Parameters
clip_type (str) – The original source of the clip model. Should be one of clip, open_clip or huggingface.
*args – Arguments to initialize the corresponding clip model.
**kwargs – Arguments to initialize the corresponding clip model.
- get_embedding_layer()¶
Function to get embedding layer of the clip model.
Only CLIPTextModel is supported currently.
- add_embedding(embeddings: Union[dict, List[dict]])¶
- set_only_embedding_trainable()¶
- set_embedding_layer()¶
- unset_embedding_layer()¶
- forward(*args, **kwargs)¶
Forward function.
- class mmagic.models.editors.DiscoDiffusion(unet, diffusion_scheduler, secondary_model=None, clip_models=[], use_fp16=False, pretrained_cfgs=None)¶
Bases:
torch.nn.Module
Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI Image generating technique called CLIP-Guided Diffusion to allow you to create compelling and beautiful images from just text inputs. Created by Somnai, augmented by Gandamu, and building on the work of RiversHaveWings, nshepperd, and many others.
- Ref:
Github Repo: https://github.com/alembics/disco-diffusion Colab: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
- Parameters
unet (ModelType) – Config of denoising Unet.
diffusion_scheduler (ModelType) – Config of diffusion_scheduler scheduler.
secondary_model (ModelType) – A smaller secondary diffusion model trained by Katherine Crowson to remove noise from intermediate timesteps to prepare them for CLIP. Ref: https://twitter.com/rivershavewings/status/1462859669454536711. Defaults to None.
clip_models (list) – Config of clip models. Defaults to [].
use_fp16 (bool) – Whether to use fp16 for unet model. Defaults to False.
pretrained_cfgs (dict) – Path Config for pretrained weights. Usually this is a dict contains module name and the corresponding ckpt path. Defaults to None.
- property device¶
Get current device of the model.
- Returns
The current device of the model.
- Return type
torch.device
- load_pretrained_models(pretrained_cfgs)¶
Loading pretrained weights to model.
pretrained_cfgs is a dict with module names as keys and checkpoint paths as values.
- Parameters
pretrained_cfgs (dict) – Path config for pretrained weights. Usually this is a dict containing module names and the corresponding checkpoint paths. Defaults to None.
- infer(scheduler_kwargs=None, height=None, width=None, init_image=None, batch_size=1, num_inference_steps=100, skip_steps=0, show_progress=True, text_prompts=[], image_prompts=[], eta=0.8, clip_guidance_scale=5000, init_scale=1000, tv_scale=0.0, sat_scale=0.0, range_scale=150, cut_overview=[12] * 400 + [4] * 600, cut_innercut=[4] * 400 + [12] * 600, cut_ic_pow=[1] * 1000, cut_icgray_p=[0.2] * 400 + [0] * 600, cutn_batches=4, seed=None)¶
Inference API for disco diffusion. A usage sketch is shown after the parameter list.
- Parameters
scheduler_kwargs (dict) – Args for infer time diffusion scheduler. Defaults to None.
height (int) – Height of output image. Defaults to None.
width (int) – Width of output image. Defaults to None.
init_image (str) – Initial image at the start point of denoising. Defaults to None.
batch_size (int) – Batch size. Defaults to 1.
num_inference_steps (int) – Number of inference steps. Defaults to 100.
skip_steps (int) – Denoising steps to skip, usually set together with init_image. Defaults to 0.
show_progress (bool) – Whether to show progress. Defaults to True.
text_prompts (list) – Text prompts. Defaults to [].
image_prompts (list) – Image prompts. This is not the same as init_image; they work the same way as text_prompts. Defaults to [].
eta (float) – Eta for ddim sampling. Defaults to 0.8.
clip_guidance_scale (int) – The scale of influence of prompts on the output image. Defaults to 5000.
seed (int) – Sampling seed. Defaults to None.
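A hedged usage sketch (assuming model is a built DiscoDiffusion instance; the frame-indexed prompt dict follows the Disco Diffusion notebook convention, and the 'samples' output key is an assumption):
>>> text_prompts = {
...     0: ['a beautiful painting of a singular lighthouse, trending on artstation']
... }
>>> result = model.infer(height=768, width=1280,
...                      text_prompts=text_prompts,
...                      num_inference_steps=250, eta=0.8,
...                      show_progress=True)
>>> image = result['samples']  # assumed output key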
- class mmagic.models.editors.DreamBooth(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, lora_config: Optional[dict] = None, val_prompts: Union[str, List[str]] = None, class_prior_prompt: Optional[str] = None, num_class_images: Optional[int] = 3, prior_loss_weight: float = 0, finetune_text_encoder: bool = False, dtype: str = 'fp16', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor: Optional[ModelType] = dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.editors.stable_diffusion.stable_diffusion.StableDiffusion
Implementation of `DreamBooth with Stable Diffusion <https://arxiv.org/abs/2208.12242>`_ (DreamBooth).
- Parameters
vae (Union[dict, nn.Module]) – The config or module for VAE model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer (str) – The name for CLIP tokenizer.
unet (Union[dict, nn.Module]) – The config or module for Unet model.
scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, will use the same scheduler as schedule. Defaults to None.
lora_config (dict, optional) – The config for LoRA finetuning. Defaults to None.
val_prompts (Union[str, List[str]], optional) – The prompts for validation. Defaults to None.
class_prior_prompt (str, optional) – The prompt for class prior loss.
num_class_images (int, optional) – The number of images for class prior. Defaults to 3.
prior_loss_weight (float, optional) – The weight for class prior loss. Defaults to 0.
finetune_text_encoder (bool, optional) – Whether to fine-tune text encoder. Defaults to False.
dtype (str, optional) – The dtype for the model. Defaults to ‘fp16’.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
tomesd_cfg (dict, optional) – The config for ToMeSD. Please refer to https://github.com/dbolya/tomesd and https://github.com/open-mmlab/mmagic/blob/main/mmagic/models/utils/tome_utils.py for details. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Defaults to dict(type='DataPreprocessor').
init_cfg (dict, optional) – The weight initialization config for BaseModule. Defaults to None.
- generate_class_prior_images(num_batches=None)¶
Generate images for class prior loss.
- Parameters
num_batches (int) – Number of batches to generate images. If not passed, all images will be generated in one forward. Defaults to None.
- prepare_model()¶
Prepare model for training.
Move model to target dtype and disable gradient for some models.
- set_lora()¶
Set LORA for model.
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Calls self.data_preprocessor and self.infer in order. Returns the generated results, which will be passed to the evaluator or visualizer.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
- Returns
Generated image or image dict.
- Return type
SampleList
- train_step(data, optim_wrapper)¶
Implements the default model training process including preprocessing, model forward propagation, loss calculation, optimization, and back-propagation.
During non-distributed training, if subclasses do not override train_step(), EpochBasedTrainLoop or IterBasedTrainLoop will call this method to update model parameters. The default parameter update process is as follows:
Calls self.data_preprocessor(data, training=False) to collect batch_inputs and corresponding data_samples (labels).
Calls self(batch_inputs, data_samples, mode='loss') to get the raw loss.
Calls self.parse_losses to get the parsed_losses tensor used for backward and a dict of loss tensors used to log messages.
Calls optim_wrapper.update_params(loss) to update the model.
- Parameters
data (dict or tuple or list) – Data sampled from dataset.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') Union[Dict[str, torch.Tensor], list] ¶
forward is not implemented yet.
- class mmagic.models.editors.EDSRNet(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4, res_scale=1, rgb_mean=[0.4488, 0.4371, 0.404], rgb_std=[1.0, 1.0, 1.0])¶
Bases:
mmengine.model.BaseModule
EDSR network structure.
Paper: Enhanced Deep Residual Networks for Single Image Super-Resolution. Ref repo: https://github.com/thstkdgus35/EDSR-PyTorch
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 16.
upscale_factor (int) – Upsampling factor. Support 2^n and 3. Default: 4.
res_scale (float) – Used to scale the residual in residual block. Default: 1.
rgb_mean (list[float]) – Image mean in RGB orders. Default: [0.4488, 0.4371, 0.4040], calculated from DIV2K dataset.
rgb_std (list[float]) – Image std in RGB orders. In EDSR, it uses [1.0, 1.0, 1.0]. Default: [1.0, 1.0, 1.0].
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
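A minimal shape sketch for the default x4 setting (illustrative sizes):
>>> import torch
>>> from mmagic.models.editors import EDSRNet
>>> net = EDSRNet(in_channels=3, out_channels=3, upscale_factor=4)
>>> lr = torch.rand(1, 3, 64, 64)
>>> sr = net(lr)   # (1, 3, 256, 256)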
- class mmagic.models.editors.EDVR(generator, pixel_loss, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.BaseEditModel
EDVR model for video super-resolution.
EDVR: Video Restoration with Enhanced Deformable Convolutional Networks.
- Parameters
generator (dict) – Config for the generator structure.
pixel_loss (dict) – Config for pixel-wise loss.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
- forward_train(inputs, data_samples=None)¶
Forward training. Returns dict of losses of training.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
Dict of losses.
- Return type
dict
- class mmagic.models.editors.EDVRNet(in_channels, out_channels, mid_channels=64, num_frames=5, deform_groups=8, num_blocks_extraction=5, num_blocks_reconstruction=10, center_frame_idx=2, with_tsa=True, init_cfg=None)¶
Bases:
mmengine.model.BaseModule
EDVR network structure for video super-resolution.
Currently only the x4 upsampling factor is supported. Paper: EDVR: Video Restoration with Enhanced Deformable Convolutional Networks.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_frames (int) – Number of input frames. Default: 5.
deform_groups (int) – Deformable groups. Defaults: 8.
num_blocks_extraction (int) – Number of blocks for feature extraction. Default: 5.
num_blocks_reconstruction (int) – Number of blocks for reconstruction. Default: 10.
center_frame_idx (int) – The index of center frame. Frame counting from 0. Default: 2.
with_tsa (bool) – Whether to use TSA module. Default: True.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(x)¶
Forward function for EDVRNet.
- Parameters
x (Tensor) – Input tensor with shape (n, t, c, h, w).
- Returns
SR center frame with shape (n, c, h, w).
- Return type
Tensor
- init_weights()¶
Init weights for models.
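A minimal shape sketch for the supported x4 setting (5 input frames, center frame restored; illustrative sizes):
>>> import torch
>>> from mmagic.models.editors import EDVRNet
>>> net = EDVRNet(in_channels=3, out_channels=3, num_frames=5)
>>> frames = torch.rand(1, 5, 3, 64, 64)  # (n, t, c, h, w)
>>> sr = net(frames)                      # (1, 3, 256, 256)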
- class mmagic.models.editors.EG3D(generator: ModelType, discriminator: Optional[ModelType] = None, camera: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, ema_config: Optional[Dict] = None, loss_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseConditionalGAN
Implementation of `Efficient Geometry-aware 3D Generative Adversarial Networks <https://openaccess.thecvf.com/content/CVPR2022/papers/Chan_Efficient_Geometry-Aware_3D_Generative_Adversarial_Networks_CVPR_2022_paper.pdf>`_ (EG3D).
Detailed architecture can be found in TriplaneGenerator and DualDiscriminator.
- Parameters
generator (ModelType) – The config or model of the generator.
discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.
camera (Optional[ModelType]) – The pre-defined camera to sample random camera position. If you want to generate images or videos via high-level API, you must set this argument. Defaults to None.
data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor.
generator_steps (int) – The number of times the generator is completely updated before the discriminator is updated. Defaults to 1.
discriminator_steps (int) – The number of times the discriminator is completely updated before the generator is updated. Defaults to 1.
noise_size (Optional[int]) – Size of the input noise vector. Defaults to 128.
num_classes (Optional[int]) – The number of classes you would like to generate. Defaults to None.
ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
loss_config (Optional[Dict]) – The config for training losses. Defaults to None.
- label_fn(label: Optional[torch.Tensor] = None, num_batches: int = 1) torch.Tensor ¶
Label sampling function for EG3D model.
- Parameters
label (Optional[Tensor]) – Conditional input for the EG3D model. If not passed, self.camera will be used to sample a random camera-to-world and intrinsics matrix. Defaults to None.
- Returns
Conditional input for the EG3D model.
- Return type
torch.Tensor
- data_sample_to_label(data_sample: mmagic.utils.typing.SampleList) Optional[torch.Tensor] ¶
Get labels from the input data_sample and pack them into a torch.Tensor. If no label is found in the passed data_sample, None will be returned.
- Parameters
data_sample (List[DataSample]) – Input data samples.
- Returns
Packed label tensor.
- Return type
Optional[torch.Tensor]
- pack_to_data_sample(output: Dict[str, torch.Tensor], data_sample: Optional[mmagic.structures.DataSample] = None) mmagic.structures.DataSample ¶
Pack output to a data sample. If data_sample is not passed, a new DataSample will be instantiated. Otherwise, outputs will be added to the passed data sample.
- Parameters
output (Dict[Tensor]) – Output of the model.
data_sample (DataSample, optional) – Data sample to save outputs. Defaults to None.
- Returns
Data sample with packed outputs.
- Return type
DataSample
- forward(inputs: mmagic.utils.typing.ForwardInputs, data_samples: Optional[list] = None, mode: Optional[str] = None) List[mmagic.structures.DataSample] ¶
Sample images with the given inputs. If forward mode is ‘ema’ or ‘orig’, the image generated by corresponding generator will be returned. If forward mode is ‘ema/orig’, images generated by original generator and EMA generator will both be returned in a dict.
- Parameters
inputs (ForwardInputs) – Dict containing the necessary information (e.g. noise, num_batches, mode) to generate images.
data_samples (Optional[list]) – Data samples collated by data_preprocessor. Defaults to None.
mode (Optional[str]) – mode is not used in BaseConditionalGAN. Defaults to None.
- Returns
Generated images or image dict.
- Return type
List[DataSample]
- interpolation(num_images: int, num_batches: int = 4, mode: str = 'both', sample_model: str = 'orig', show_pbar: bool = True) List[dict] ¶
Interpolate inputs and return a list of output results. We support three kinds of interpolation modes:
‘camera’: First generate style codes with random noise and a forward camera. Then synthesize images with interpolated camera positions and fixed style codes.
‘conditioning’: First generate style codes with fixed noise and interpolated cameras. Then synthesize images with the style codes and a forward camera.
‘both’: Generate images with interpolated camera positions.
- Parameters
num_images (int) – The number of images want to generate.
num_batches (int, optional) – The number of batches to generate at one time. Defaults to 4.
mode (str, optional) – The interpolation mode. Supported choices are ‘both’, ‘camera’, and ‘conditioning’. Defaults to ‘both’.
sample_model (str, optional) – The model used to generate images, support ‘orig’ and ‘ema’. Defaults to ‘orig’.
show_pbar (bool, optional) – Whether display a progress bar during interpolation. Defaults to True.
- Returns
The list of output dicts for each frame.
- Return type
List[dict]
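A hedged usage sketch (assuming eg3d is a built EG3D model with the camera argument configured):
>>> frames = eg3d.interpolation(num_images=120, num_batches=4,
...                             mode='camera', sample_model='orig')
>>> len(frames)  # one output dict per generated frame
120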
- class mmagic.models.editors.ESRGAN(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.editors.srgan.SRGAN
Enhanced SRGAN model for single image super-resolution.
Ref: ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. It uses RaGAN for GAN updates: The relativistic discriminator: a key element missing from standard GAN.
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator. Default: None.
gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict) – Config for the pixel loss. Default: None.
perceptual_loss (dict) – Config for the perceptual loss. Default: None.
train_cfg (dict) – Config for training. Default: None. You may change the training of the gan by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
- g_step(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor)¶
G step of GAN: Calculate losses of generator.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- d_step_real(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor)¶
D step of real data.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- d_step_fake(batch_outputs: torch.Tensor, batch_gt_data)¶
D step of fake data.
- Parameters
batch_outputs (Tensor) – Batch output of generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- class mmagic.models.editors.RRDBNet(in_channels, out_channels, mid_channels=64, num_blocks=23, growth_channels=32, upscale_factor=4, init_cfg=None)¶
Bases:
mmengine.model.BaseModule
Networks consisting of Residual in Residual Dense Block, which is used in ESRGAN and Real-ESRGAN.
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data. Currently, it supports [x1/x2/x4] upsampling scale factors.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64
num_blocks (int) – Block number in the trunk network. Default: 23
growth_channels (int) – Channels for each growth. Default: 32.
upscale_factor (int) – Upsampling factor. Support x1, x2 and x4. Default: 4.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- _supported_upscale_factors = [1, 2, 4]¶
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- init_weights()¶
Init weights for models.
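A minimal shape sketch for the default x4 setting (illustrative sizes):
>>> import torch
>>> from mmagic.models.editors import RRDBNet
>>> net = RRDBNet(in_channels=3, out_channels=3, upscale_factor=4)
>>> lr = torch.rand(1, 3, 48, 48)
>>> out = net(lr)  # (1, 3, 192, 192)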
- class mmagic.models.editors.FastComposer(pretrained_cfg: dict, vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: str = 'fp32', enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor=dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.editors.stable_diffusion.StableDiffusion
Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.
- Parameters
unet (Union[dict, nn.Module]) – The config or module for Unet model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
vae (Union[dict, nn.Module]) – The config or module for VAE model.
tokenizer (str) – The name for CLIP tokenizer.
scheduler (Union[dict, nn.Module]) – The config or module for diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for diffusion scheduler in test stage (self.infer). If not passed, will use the same scheduler as schedule. Defaults to None.
dtype (str, optional) – The dtype for the model. This argument will not work when dtype is defined for submodels. Defaults to ‘fp32’.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (bool, optional) – The weight of noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- _tokenize_and_mask_noun_phrases_ends(caption)¶
Augment the text embedding.
- _encode_augmented_prompt(prompt: str, reference_images: List[PIL.Image.Image], device: torch.device, weight_dtype: torch.dtype)¶
Encode reference images.
- Parameters
prompt (str or list(int)) – prompt to be encoded.
reference_images (List[Image.Image]) – List of reference images.
device (torch.device) – torch device.
weight_dtype (torch.dtype) – torch.dtype.
- Returns
Text embeddings generated by the clip text encoder.
- Return type
text_embeddings (torch.Tensor)
- infer(prompt: Union[str, List[str]] = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None, latents: Optional[torch.FloatTensor] = None, prompt_embeds: Optional[torch.FloatTensor] = None, negative_prompt_embeds: Optional[torch.FloatTensor] = None, output_type: Optional[str] = 'pil', return_dict: bool = True, callback: Optional[Callable[[int, int, torch.FloatTensor], None]] = None, callback_steps: int = 1, cross_attention_kwargs: Optional[Dict[str, Any]] = None, alpha_: float = 0.7, reference_subject_images: List[PIL.Image.Image] = None, augmented_prompt_embeds: Optional[torch.FloatTensor] = None, show_progress: bool = True)¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str], optional) – The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.
height (int, optional) – The height in pixels of the generated image. Defaults to self.unet.config.sample_size * self.vae_scale_factor.
width (int, optional) – The width in pixels of the generated image. Defaults to self.unet.config.sample_size * self.vae_scale_factor.
num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598). guidance_scale is defined as w of equation 2 of the [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting guidance_scale > 1. A higher guidance scale encourages generating images that are closely linked to the text prompt, usually at the expense of lower image quality.
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.
generator (torch.Generator or List[torch.Generator], optional) – One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html) to make generation deterministic.
- Parameters
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
prompt_embeds (torch.FloatTensor, optional) – Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.
negative_prompt_embeds (torch.FloatTensor, optional) – Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt input argument.
output_type (str, optional, defaults to “pil”) – The output format of the generated image. Choose between [PIL](https://pillow.readthedocs.io/en/stable/): PIL.Image.Image or np.array.
return_dict (bool, optional, defaults to True) – Whether or not to return a [~pipelines.stable_diffusion.StableDiffusionPipelineOutput] instead of a plain tuple.
callback (Callable, optional) – A function that will be called every callback_steps steps during inference. The function will be called with the following arguments: callback(step: int, timestep: int, latents: torch.FloatTensor).
callback_steps (int, optional, defaults to 1) – The frequency at which the callback function will be called. If not specified, the callback will be called at every step.
cross_attention_kwargs (dict, optional) – A kwargs dictionary that if specified is passed along to the AttentionProcessor as defined under self.processor in [diffusers.cross_attention](https://github.com/huggingface/ diffusers/blob/main/src/diffusers/models/cross_attention.py).
alpha_ (float, defaults to 0.7) – The ratio of subject conditioning. If alpha_ is 0.7, the first 30% of denoising steps use text prompts, while the last 70% use image-augmented prompts. Increase alpha_ for identity preservation, decrease it for prompt consistency.
reference_subject_images (List[PIL.Image.Image]) – a list of PIL images that are used as reference subjects. The number of images should be equal to the number of augmented tokens in the prompts.
augmented_prompt_embeds (torch.FloatTensor, optional) – Pre-generated image-augmented text embeddings. If not provided, embeddings will be generated from prompt and reference_subject_images.
show_progress (bool) – Whether to show the progress bar during inference.
Examples:
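A minimal usage sketch (illustrative only: model is assumed to be an already-built FastComposer instance, the image path is hypothetical, and the subject-token convention in the prompt is illustrative):
import torch
from PIL import Image

reference = [Image.open('subject.jpg').convert('RGB')]  # hypothetical path
with torch.no_grad():
    output = model.infer(
        prompt='a man img sitting on a bench',  # 'img' marks the subject token (illustrative)
        reference_subject_images=reference,
        num_inference_steps=50,
        guidance_scale=7.5,
        alpha_=0.7)  # first 30% of steps use the plain text prompt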
- Returns
OrderedDict if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents “not-safe-for-work” (nsfw) content, according to the safety_checker.
- Return type
OrderedDict or tuple
- class mmagic.models.editors.FBADecoder(pool_scales, in_channels, channels, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU'), align_corners=False)¶
Bases:
torch.nn.Module
Decoder for FBA matting.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in the Pooling Pyramid Module.
in_channels (int) – Input channels.
channels (int) – Channels after modules, before conv_seg.
conv_cfg (dict|None) – Config of conv layers.
norm_cfg (dict|None) – Config of norm layers.
act_cfg (dict) – Config of activation layers.
align_corners (bool) – align_corners argument of F.interpolate.
- init_weights(pretrained=None)¶
Init weights for the model.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
- forward(inputs)¶
Forward function.
- Parameters
inputs (dict) – Output dict of FbaEncoder.
- Returns
Predicted alpha, fg and bg of the current batch.
- Return type
tuple(Tensor)
- class mmagic.models.editors.FBAResnetDilated(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 2, 4), deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, act_cfg: dict = dict(type='ReLU'), conv_cfg: Optional[dict] = None, norm_cfg: dict = dict(type='BN'), with_cp: bool = False, multi_grid: Optional[Sequence[int]] = None, contract_dilation: bool = False, zero_init_residual: bool = True)¶
Bases:
mmagic.models.archs.ResNet
ResNet-based encoder for FBA image matting.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (N, C, H, W).
- Returns
Output tensor.
- Return type
Tensor
- class mmagic.models.editors.FLAVR(generator: dict, pixel_loss: dict, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, required_frames: int = 2, step_frames: int = 1, init_cfg: Optional[dict] = None, data_preprocessor: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.BasicInterpolator
FLAVR model for video interpolation.
- Paper:
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Ref repo: https://github.com/tarun005/FLAVR
- Parameters
generator (dict) – Config for the generator structure.
pixel_loss (dict) – Config for pixel-wise loss.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
required_frames (int) – Required frames in each process. Default: 2
step_frames (int) – Step size of video frame interpolation. Default: 1
init_cfg (dict, optional) – The weight initialization config for BaseModule.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
- init_cfg¶
Initialization config dict.
- Type
dict, optional
- data_preprocessor¶
Used for pre-processing data sampled by dataloader to the format accepted by forward().
- Type
BaseDataPreprocessor
- static merge_frames(input_tensors, output_tensors)¶
Merge input frames and output frames. An interpolated frame is inserted between each pair of adjacent input frames.
- Merged from
[[in1, in2, in3, in4], [in2, in3, in4, in5], …] [[out1], [out2], [out3], …]
- to
[in1, in2, out1, in3, out2, …, in(-3), out(-1), in(-2), in(-1)]
- Parameters
input_tensors (Tensor) – The input frames with shape [n, 4, c, h, w].
output_tensors (Tensor) – The output frames with shape [n, 1, c, h, w].
- Returns
The final frames.
- Return type
list[np.array]
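The interleaving performed by merge_frames can be sketched in plain Python as follows (an illustration of the pattern above, not the actual batched tensor implementation):
def merge_frames_sketch(input_windows, outputs):
    # input_windows: [[in1, in2, in3, in4], [in2, in3, in4, in5], ...]
    # outputs:       [[out1], [out2], ...]
    merged = [input_windows[0][0], input_windows[0][1]]  # in1, in2
    for window, out in zip(input_windows, outputs):
        merged.append(out[0])     # interpolated frame
        merged.append(window[2])  # next original frame
    merged.append(input_windows[-1][3])  # final frame
    return merged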
- class mmagic.models.editors.FLAVRNet(num_input_frames, num_output_frames, mid_channels_list=[512, 256, 128, 64], encoder_layers_list=[2, 2, 2, 2], bias=False, norm_cfg=None, join_type='concat', up_mode='transpose', init_cfg=None)¶
Bases:
mmengine.model.BaseModule
PyTorch implementation of FLAVR for video frame interpolation.
- Paper:
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Ref repo: https://github.com/tarun005/FLAVR
- Parameters
num_input_frames (int) – Number of input frames.
num_output_frames (int) – Number of output frames.
mid_channels_list (list[int]) – List of number of mid channels. Default: [512, 256, 128, 64]
encoder_layers_list (list[int]) – List of number of layers in encoder. Default: [2, 2, 2, 2]
bias (bool) – If True, adds a learnable bias to the conv layers. Default: True.
norm_cfg (dict | None) – Config dict for normalization layer. Default: None.
join_type (str) – Join type of tensors from decoder and encoder. Candidates are concat and add. Default: concat.
up_mode (str) – Upsampling mode of UpConv3d. Candidates are transpose and trilinear. Default: transpose.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(images: torch.Tensor)¶
Forward function.
- Parameters
images (Tensor) – Input frames tensor with shape (N, T, C, H, W).
- Returns
Output tensor.
- Return type
out (Tensor)
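A minimal shape sketch with randomly initialized weights (the frame size is illustrative):
import torch
from mmagic.models.editors import FLAVRNet

net = FLAVRNet(num_input_frames=4, num_output_frames=1)
frames = torch.rand(1, 4, 3, 64, 64)  # (N, T, C, H, W)
out = net(frames)  # frame(s) interpolated between the middle input frames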
- class mmagic.models.editors.GCA(data_preprocessor, backbone, loss_alpha=None, init_cfg: Optional[dict] = None, train_cfg=None, test_cfg=None)¶
Bases:
mmagic.models.base_models.BaseMattor
Guided Contextual Attention image matting model.
https://arxiv.org/abs/2001.04069
- Parameters
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
backbone (dict) – Config of backbone.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
init_cfg (dict, optional) – Initialization config dict. Default: None.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
- _forward(inputs)¶
Forward function.
- Parameters
inputs (torch.Tensor) – Input tensor.
- Returns
Output tensor.
- Return type
Tensor
- _forward_test(inputs)¶
Forward function for testing GCA model.
- Parameters
inputs (torch.Tensor) – batch input tensor.
- Returns
Output tensor of model.
- Return type
Tensor
- _forward_train(inputs, data_samples)¶
Forward function for training GCA model.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement]) – data samples collated by data_preprocessor.
- Returns
Contains the loss items and batch information.
- Return type
dict
- class mmagic.models.editors.GGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, ema_config: Optional[Dict] = None, loss_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseGAN
Implementation of Geometric GAN (GGAN).
https://arxiv.org/abs/1705.02894
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) Tuple ¶
Get disc loss. GGAN uses hinge loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
disc_pred_real (Tensor) – Discriminator's prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
- gen_loss(disc_pred_fake)¶
Get gen loss. GGAN uses hinge loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
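For reference, a minimal sketch of the hinge losses named above (the standard formulation; the in-repo implementation may differ in reduction and logging details):
import torch
import torch.nn.functional as F

def hinge_disc_loss(disc_pred_fake, disc_pred_real):
    # max(0, 1 - D(x)) on real samples, max(0, 1 + D(G(z))) on fake ones
    return F.relu(1. - disc_pred_real).mean() + F.relu(1. + disc_pred_fake).mean()

def hinge_gen_loss(disc_pred_fake):
    # the generator simply maximizes D(G(z))
    return -disc_pred_fake.mean()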
- train_discriminator(inputs: dict, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (List[DataSample]) – Data samples from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (List[DataSample]) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.GLEANStyleGANv2(in_size, out_size, img_channels=3, rrdb_channels=64, num_rrdbs=23, style_channels=512, num_mlps=8, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], lr_mlp=0.01, default_style_mode='mix', eval_style_mode='single', mix_prob=0.9, init_cfg=None, fp16_enabled=False, bgr2rgb=False)¶
Bases:
mmengine.model.BaseModule
GLEAN (using StyleGANv2) architecture for super-resolution.
- Paper:
GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution, CVPR, 2021
This method makes use of StyleGAN2 and hence the arguments mostly follow those in StyleGANv2Generator.
In StyleGAN2, we use a static architecture composed of a style mapping module and a number of convolutional style blocks. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN, CVPR 2020.
You can load a pretrained model by passing information into the pretrained argument. We have already offered official weights as follows:
stylegan2-ffhq-config-f: http://download.openmmlab.com/mmediting/stylegan2/official_weights/stylegan2-ffhq-config-f-official_20210327_171224-bce9310c.pth # noqa
stylegan2-horse-config-f: http://download.openmmlab.com/mmediting/stylegan2/official_weights/stylegan2-horse-config-f-official_20210327_173203-ef3e69ca.pth # noqa
stylegan2-car-config-f: http://download.openmmlab.com/mmediting/stylegan2/official_weights/stylegan2-car-config-f-official_20210327_172340-8cfe053c.pth # noqa
stylegan2-cat-config-f: http://download.openmmlab.com/mmediting/stylegan2/official_weights/stylegan2-cat-config-f-official_20210327_172444-15bc485b.pth # noqa
stylegan2-church-config-f: http://download.openmmlab.com/mmediting/stylegan2/official_weights/stylegan2-church-config-f-official_20210327_172657-1d42b7d1.pth # noqa
If you want to load the ema model, you can just use the following code:
# ckpt_http is one of the valid paths from the http source
generator = StyleGANv2Generator(
    1024, 512,
    pretrained=dict(ckpt_path=ckpt_http, prefix='generator_ema'))
Of course, you can also download the checkpoint in advance and set ckpt_path with a local path. If you just want to load the original generator (not the ema model), please set the prefix to 'generator'.
Note that our implementation allows generating BGR images, while the original StyleGAN2 outputs RGB images by default. Thus, we provide the bgr2rgb argument to convert the image space.
- Parameters
in_size (int) – The size of the input image.
out_size (int) – The output size of the StyleGAN2 generator.
img_channels (int) – Number of channels of the input images. 3 for RGB image and 1 for grayscale image. Default: 3.
rrdb_channels (int) – Number of channels of the RRDB features. Default: 64.
num_rrdbs (int) – Number of RRDB blocks in the encoder. Default: 23.
style_channels (int) – The number of channels for style code. Default: 512.
num_mlps (int, optional) – The number of MLP layers. Defaults to 8.
channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.
blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].
lr_mlp (float, optional) – The learning rate for the style mapping layer. Defaults to 0.01.
default_style_mode (str, optional) – The default mode of style mixing. In training, we adopt mixing style mode in default. However, in the evaluation, we use ‘single’ style mode. [‘mix’, ‘single’] are currently supported. Defaults to ‘mix’.
eval_style_mode (str, optional) – The evaluation mode of style mixing. Defaults to ‘single’.
mix_prob (float, optional) – Mixing probability. The value should be in range of [0, 1]. Defaults to 0.9.
init_cfg (dict, optional) – Initialization config dict. Default: None.
fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.
bgr2rgb (bool, optional) – Whether to flip the image channel dimension. Defaults to False.
- forward(lq)¶
Forward function.
- Parameters
lq (Tensor) – Input LR image with shape (n, c, h, w).
- Returns
Output HR image.
- Return type
Tensor
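A minimal shape sketch with random weights (a pretrained generator would normally be loaded as described above):
import torch
from mmagic.models.editors import GLEANStyleGANv2

glean = GLEANStyleGANv2(in_size=64, out_size=1024)  # 16x super-resolution
lq = torch.rand(1, 3, 64, 64)
hr = glean(lq)  # expected shape (1, 3, 1024, 1024)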
- class mmagic.models.editors.GLDecoder(in_channels=256, norm_cfg=None, act_cfg=dict(type='ReLU'), out_act='clip')¶
Bases:
mmengine.model.BaseModule
Decoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
in_channels (int) – Channel number of input feature.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
out_act (str) – Output activation type, “clip” by default. Note that in our implementation, we clip the output to the range [-1, 1].
- forward(x)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h', w').
- Return type
torch.Tensor
- class mmagic.models.editors.GLDilationNeck(in_channels=256, conv_type='conv', norm_cfg=None, act_cfg=dict(type='ReLU'), **kwargs)¶
Bases:
mmengine.model.BaseModule
Dilation Backbone used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
in_channels (int) – Channel number of input feature.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
kwargs (keyword arguments) – Other keyword arguments.
- _conv_type¶
- forward(x)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h', w').
- Return type
torch.Tensor
- class mmagic.models.editors.GLEncoder(norm_cfg=None, act_cfg=dict(type='ReLU'))¶
Bases:
mmengine.model.BaseModule
Encoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
- forward(x)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h', w').
- Return type
torch.Tensor
- class mmagic.models.editors.GLEncoderDecoder(encoder=dict(type='GLEncoder'), decoder=dict(type='GLDecoder'), dilation_neck=dict(type='GLDilationNeck'))¶
Bases:
mmengine.model.BaseModule
Encoder-Decoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
The architecture of the encoder-decoder is: (conv2d x 6) -> (dilated conv2d x 4) -> (conv2d or deconv2d x 7)
- Parameters
encoder (dict) – Config dict to build encoder.
decoder (dict) – Config dict to build decoder.
dilation_neck (dict) – Config dict to build dilation neck.
- forward(x)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h', w').
- Return type
torch.Tensor
- class mmagic.models.editors.AblatedDiffusionModel(data_preprocessor, unet, diffusion_scheduler, use_fp16=False, classifier=None, classifier_scale=1.0, rgb2bgr=False, pretrained_cfgs=None)¶
Bases:
mmengine.model.BaseModel
Guided diffusion Model.
- Parameters
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
unet (ModelType) – Config of denoising Unet.
diffusion_scheduler (ModelType) – Config of diffusion_scheduler scheduler.
use_fp16 (bool) – Whether to use fp16 for unet model. Defaults to False.
classifier (ModelType) – Config of classifier. Defaults to None.
pretrained_cfgs (dict) – Path config for pretrained weights. Usually this is a dict containing the module name and the corresponding ckpt path. Defaults to None.
- property device¶
Get current device of the model.
- Returns
The current device of the model.
- Return type
torch.device
- load_pretrained_models(pretrained_cfgs)¶
Load pretrained weights for the submodules.
- Parameters
pretrained_cfgs (dict) – A dict containing module names and the corresponding checkpoint paths.
- infer(scheduler_kwargs=None, init_image=None, batch_size=1, num_inference_steps=1000, labels=None, classifier_scale=0.0, show_progress=False)¶
Sample images with the guided diffusion model.
- Parameters
scheduler_kwargs (dict, optional) – Keyword arguments passed to the diffusion scheduler. Defaults to None.
init_image (Tensor, optional) – Initial image to start the denoising from. Defaults to None.
batch_size (int, optional) – Number of images to generate. Defaults to 1.
num_inference_steps (int, optional) – Number of denoising steps. Defaults to 1000.
labels (Tensor, optional) – Class labels for classifier guidance. Defaults to None.
classifier_scale (float, optional) – Scale of the classifier guidance. Defaults to 0.0.
show_progress (bool, optional) – Whether to show the progress bar. Defaults to False.
- Returns
The sampled results.
- Return type
dict
- forward(inputs: mmagic.utils.typing.ForwardInputs, data_samples: Optional[list] = None, mode: Optional[str] = None) List[mmagic.structures.DataSample] ¶
Forward function.
- Parameters
inputs (ForwardInputs) – Dict containing the necessary information (e.g. noise, num_batches) to generate images.
data_samples (Optional[list], optional) – Data samples collated by data_preprocessor. Defaults to None.
mode (Optional[str], optional) – Forward mode. Defaults to None.
- Returns
Generated results.
- Return type
List[DataSample]
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data.
Calls self.data_preprocessor(data) and self(inputs, data_sample, mode=None) in order. Returns the generated results, which will be passed to the evaluator.
- Parameters
data (dict) – Data sampled from metric specific sampler. More details in Metrics and Evaluator.
- Returns
Generated image or image dict.
- Return type
SampleList
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of given data. Same as val_step().
- Parameters
data (dict) – Data sampled from metric specific sampler. More details in Metrics and Evaluator.
- Returns
Generated image or image dict.
- Return type
List[DataSample]
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
Train step function.
- Parameters
data (dict) – Data sampled from dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance containing the optimizers.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- get_module(model: torch.nn.Module, module_name: str) torch.nn.Module ¶
Get an inner module from model.
Since we may wrap some models with DDP, we have to judge whether the module can be indexed directly.
- Parameters
model (nn.Module) – The model, which may or may not be wrapped with DDP.
module_name (str) – The name of the specific module.
- Returns
The requested sub-module.
- Return type
nn.Module
- class mmagic.models.editors.IconVSRNet(mid_channels=64, num_blocks=30, keyframe_stride=5, padding=2, spynet_pretrained=None, edvr_pretrained=None)¶
Bases:
mmengine.model.BaseModule
IconVSR network structure for video super-resolution.
Support only x4 upsampling.
- Paper:
BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond, CVPR, 2021
- Parameters
mid_channels (int) – Channel number of the intermediate features. Default: 64.
num_blocks (int) – Number of residual blocks in each propagation branch. Default: 30.
keyframe_stride (int) – Number determining the keyframes. If stride=5, then the (0, 5, 10, 15, …)-th frame will be the keyframes. Default: 5.
padding (int) – Number of frames to be padded at two ends of the sequence. 2 for REDS and 3 for Vimeo-90K. Default: 2.
spynet_pretrained (str) – Pre-trained model path of SPyNet. Default: None.
edvr_pretrained (str) – Pre-trained model path of EDVR (for refill). Default: None.
- spatial_padding(lrs)¶
Apply padding spatially.
Since the PCD module in EDVR requires that the resolution is a multiple of 4, we apply padding to the input LR images if their resolution is not divisible by 4.
- Parameters
lrs (Tensor) – Input LR sequence with shape (n, t, c, h, w).
- Returns
Padded LR sequence with shape (n, t, c, h_pad, w_pad).
- Return type
Tensor
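A sketch of the padding logic described above, assuming reflective padding to the next multiple of 4 (the actual implementation may differ in padding mode):
import torch.nn.functional as F

def spatial_padding_sketch(lrs):
    n, t, c, h, w = lrs.size()
    pad_h = (4 - h % 4) % 4
    pad_w = (4 - w % 4) % 4
    lrs = lrs.view(-1, c, h, w)
    lrs = F.pad(lrs, [0, pad_w, 0, pad_h], mode='reflect')
    return lrs.view(n, t, c, h + pad_h, w + pad_w)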
- check_if_mirror_extended(lrs)¶
Check whether the input is a mirror-extended sequence.
If mirror-extended, the i-th (i=0, …, t-1) frame is equal to the (t-1-i)-th frame.
- Parameters
lrs (tensor) – Input LR images with shape (n, t, c, h, w)
- compute_refill_features(lrs, keyframe_idx)¶
Compute keyframe features for information-refill.
Since EDVR-M is used, padding is performed before feature computation.
- Parameters
lrs (Tensor) – Input LR images with shape (n, t, c, h, w).
keyframe_idx (list(int)) – The indices specifying the keyframes.
- Returns
The keyframe features. Each key corresponds to the indices in keyframe_idx.
- Return type
dict(Tensor)
- compute_flow(lrs)¶
Compute optical flow using SPyNet for feature warping.
Note that if the input is a mirror-extended sequence, 'flows_forward' is not needed, since it is equal to 'flows_backward.flip(1)'.
- Parameters
lrs (tensor) – Input LR images with shape (n, t, c, h, w).
- Returns
Optical flow. 'flows_forward' corresponds to the flows used for forward-time propagation (current to previous). 'flows_backward' corresponds to the flows used for backward-time propagation (current to next).
- Return type
tuple(Tensor)
- forward(lrs)¶
Forward function for IconVSR.
- Parameters
lrs (Tensor) – Input LR tensor with shape (n, t, c, h, w).
- Returns
Output HR tensor with shape (n, t, c, 4h, 4w).
- Return type
Tensor
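A minimal shape sketch with random weights (pretrained SPyNet/EDVR paths are omitted, so the flow and refill features are untrained):
import torch
from mmagic.models.editors import IconVSRNet

model = IconVSRNet(mid_channels=64, num_blocks=30, keyframe_stride=5, padding=2)
lrs = torch.rand(1, 7, 3, 64, 64)  # (n, t, c, h, w)
out = model(lrs)  # expected shape (1, 7, 3, 256, 256)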
- class mmagic.models.editors.DepthwiseIndexBlock(in_channels, norm_cfg=dict(type='BN'), use_context=False, use_nonlinear=False, mode='o2o')¶
Bases:
mmengine.model.BaseModule
Depthwise index block.
From https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int) – Input channels of the holistic index block.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
use_context (bool, optional) – Whether to use a larger kernel size in the index block. Refer to the paper for more information. Defaults to False.
use_nonlinear (bool) – Whether to add a non-linear conv layer in the index blocks. Default: False.
mode (str) – Mode of the index block. Should be 'o2o' or 'm2o'. In 'o2o' mode, the group of the conv layers is 1; in 'm2o' mode, the group of the conv layers is in_channels.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input feature map with shape (N, C, H, W).
- Returns
Encoder index feature and decoder index feature.
- Return type
tuple(Tensor)
- class mmagic.models.editors.HolisticIndexBlock(in_channels, norm_cfg=dict(type='BN'), use_context=False, use_nonlinear=False)¶
Bases:
mmengine.model.BaseModule
Holistic Index Block.
From https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int) – Input channels of the holistic index block.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
use_context (bool, optional) – Whether to use a larger kernel size in the index block. Refer to the paper for more information. Defaults to False.
use_nonlinear (bool) – Whether to add a non-linear conv layer in the index block. Default: False.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input feature map with shape (N, C, H, W).
- Returns
Encoder index feature and decoder index feature.
- Return type
tuple(Tensor)
- class mmagic.models.editors.IndexedUpsample(in_channels, out_channels, kernel_size=5, norm_cfg=dict(type='BN'), conv_module=ConvModule, init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModule
Indexed upsample module.
- Parameters
in_channels (int) – Input channels.
out_channels (int) – Output channels.
kernel_size (int, optional) – Kernel size of the convolution layer. Defaults to 5.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).
conv_module (ConvModule | DepthwiseSeparableConvModule, optional) – Conv module. Defaults to ConvModule.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- init_weights()¶
Init weights for the module.
- forward(x, shortcut, dec_idx_feat=None)¶
Forward function.
- Parameters
x (Tensor) – Input feature map with shape (N, C, H, W).
shortcut (Tensor) – The shortcut connection with shape (N, C, H', W').
dec_idx_feat (Tensor, optional) – The decode index feature map with shape (N, C, H', W'). Defaults to None.
- Returns
Output tensor with shape (N, C, H', W').
- Return type
Tensor
- class mmagic.models.editors.IndexNet(data_preprocessor, backbone, loss_alpha=None, loss_comp=None, init_cfg=None, train_cfg=None, test_cfg=None)¶
Bases:
mmagic.models.base_models.BaseMattor
IndexNet matting model.
This implementation follows: Indices Matter: Learning to Index for Deep Image Matting
- Parameters
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
backbone (dict) – Config of backbone.
train_cfg (dict) – Config of training. In 'train_cfg', 'train_backbone' should be specified.
test_cfg (dict) – Config of testing.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
loss_comp (dict) – Config of the composition loss. Default: None.
- _forward(inputs)¶
Forward function.
- Parameters
inputs (torch.Tensor) – Input tensor.
- Returns
Output tensor.
- Return type
Tensor
- _forward_test(inputs)¶
Forward function for testing IndexNet model.
- Parameters
inputs (torch.Tensor) – batch input tensor.
- Returns
Output tensor of model.
- Return type
Tensor
- _forward_train(inputs, data_samples)¶
Forward function for training IndexNet model.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement]) – data samples collated by data_preprocessor.
- Returns
Contains the loss items and batch information.
- Return type
dict
- class mmagic.models.editors.IndexNetDecoder(in_channels, kernel_size=5, norm_cfg=dict(type='BN'), separable_conv=False, init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModule
Decoder for IndexNet.
Please refer to https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int) – Input channels of the decoder.
kernel_size (int, optional) – Kernel size of the convolution layer. Defaults to 5.
norm_cfg (None | dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).
separable_conv (bool) – Whether to use separable conv. Default: False.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- init_weights()¶
Init weights for the module.
- forward(inputs)¶
Forward function.
- Parameters
inputs (dict) – Output dict of IndexNetEncoder.
- Returns
Predicted alpha matte of the current batch.
- Return type
Tensor
- class mmagic.models.editors.IndexNetEncoder(in_channels, out_stride=32, width_mult=1, index_mode='m2o', aspp=True, norm_cfg=dict(type='BN'), freeze_bn=False, use_nonlinear=True, use_context=True, init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModule
Encoder for IndexNet.
Please refer to https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int, optional) – Input channels of the encoder.
out_stride (int, optional) – Output stride of the encoder. For example, if out_stride is 32, the input feature map or image will be downsample to the 1/32 of original size. Defaults to 32.
width_mult (int, optional) – Width multiplication factor of channel dimension in MobileNetV2. Defaults to 1.
index_mode (str, optional) – Index mode of the index network. It must be one of {'holistic', 'o2o', 'm2o'}. If it is set to 'holistic', the Holistic index network will be used as the index network. If it is set to 'o2o' (or 'm2o'), the O2O (or M2O) Depthwise index network will be used as the index network. Defaults to 'm2o'.
aspp (bool, optional) – Whether to use the ASPP module to augment output features. Defaults to True.
norm_cfg (None | dict, optional) – Config dict for normalization layer. Defaults to dict(type='BN').
freeze_bn (bool, optional) – Whether to freeze batch norm layers. Defaults to False.
use_nonlinear (bool, optional) – Whether to use nonlinearity in the index network. Refer to the paper for more information. Defaults to True.
use_context (bool, optional) – Whether to use a larger kernel size in the index network. Refer to the paper for more information. Defaults to True.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- Raises
ValueError – out_stride must be 16 or 32.
NameError – Supported index_mode are {'holistic', 'o2o', 'm2o'}.
- _make_layer(layer_setting, norm_cfg)¶
- train(mode=True)¶
Set BatchNorm modules in the model to evaluation mode.
- init_weights()¶
Init weights for the model.
Initialization is based on self._init_cfg
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input feature map with shape (N, C, H, W).
- Returns
Output tensor, shortcut feature and decoder index feature.
- Return type
dict
- class mmagic.models.editors.InstColorization(data_preprocessor: Union[dict, mmengine.config.Config], image_model, instance_model, fusion_model, color_data_opt, which_direction='AtoB', loss=None, init_cfg=None, train_cfg=None, test_cfg=None)¶
Bases:
mmengine.model.BaseModel
Colorization InstColorization method.
- This Colorization is implemented according to the paper:
Instance-aware Image Colorization, CVPR 2020
Adapted from ‘https://github.com/ericsujw/InstColorization.git’ ‘InstColorization/models/train_model’ Copyright (c) 2020, Su, under MIT License.
- Parameters
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
image_model (dict) – Config for the single image model.
instance_model (dict) – Config for instance model
fusion_model (dict) – Config for fusion model
color_data_opt (dict) – Option for colorspace conversion
which_direction (str) – AtoB or BtoA
loss (dict) – Config for loss.
init_cfg (str) – Initialization config dict. Default: None.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
- forward(inputs: torch.Tensor, data_samples: Optional[List[mmagic.structures.DataSample]] = None, mode: str = 'tensor', **kwargs)¶
Returns losses or predictions of training, validation, testing, and simple inference process.
The forward method of BaseModel is an abstract method; its subclasses must implement this method.
Accepts inputs and data_samples processed by data_preprocessor, and returns results according to the mode argument.
During non-distributed training, validation, and testing, forward is called by BaseModel.train_step, BaseModel.val_step and BaseModel.test_step directly.
During distributed data parallel training, MMSeparateDistributedDataParallel.train_step will first call DistributedDataParallel.forward to enable automatic gradient synchronization, and then call forward to get the training loss.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
mode (str) – mode should be one of loss, predict and tensor. Default: 'tensor'.
loss: Called by train_step and returns a loss dict used for logging.
predict: Called by val_step and test_step and returns a list of BaseDataElement results used for computing metrics.
tensor: Called by custom use to get Tensor type results.
- Returns
If mode == loss, return a dict of loss tensors used for backward and logging.
If mode == predict, return a list of BaseDataElement for computing metrics and getting inference results.
If mode == tensor, return a tensor or tuple of tensors or dict of tensors for custom use.
- Return type
ForwardResults
- convert_to_datasample(inputs, data_samples)¶
Add predictions and destructed inputs (if passed) to data samples.
- Parameters
inputs (Optional[torch.Tensor]) – The input of model. Defaults to None.
data_samples (List[DataSample]) – The data samples loaded from dataloader.
- Returns
Modified data samples.
- Return type
List[DataSample]
- abstract forward_train(inputs, data_samples=None, **kwargs)¶
Forward function for training.
- abstract train_step(data: List[dict], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step function.
- Parameters
data (List[dict]) – Batch of data as input.
optim_wrapper (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
- forward_inference(inputs, data_samples=None, **kwargs)¶
Forward inference. Returns predictions of validation, testing.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
predictions.
- Return type
List[DataSample]
- forward_tensor(inputs, data_samples)¶
Forward function in tensor mode.
- Parameters
inputs (torch.Tensor) – Input tensor.
data_samples (dict) – Dict containing data samples.
- Returns
Dict containing output results.
- Return type
dict
- class mmagic.models.editors.LIIF(generator: dict, pixel_loss: dict, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None, data_preprocessor: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.BaseEditModel
LIIF model for single image super-resolution.
- Paper: Learning Continuous Image Representation with
Local Implicit Image Function
- Parameters
generator (dict) – Config for the generator.
pixel_loss (dict) – Config for the pixel loss.
pretrained (str) – Path for pretrained model. Default: None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
- forward_tensor(inputs, data_samples=None, **kwargs)¶
Forward tensor. Returns result of simple forward.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – data samples collated by data_preprocessor.
- Returns
result of simple forward.
- Return type
Tensor
- forward_inference(inputs, data_samples=None, **kwargs)¶
Forward inference. Returns predictions of validation, testing, and simple inference.
- Parameters
inputs (torch.Tensor) – batch input tensor collated by data_preprocessor.
data_samples (BaseDataElement, optional) – data samples collated by data_preprocessor.
- Returns
predictions.
- Return type
List[DataSample]
- class mmagic.models.editors.MLPRefiner(in_dim, out_dim, hidden_list)¶
Bases:
mmengine.model.BaseModule
Multilayer perceptrons (MLPs), refiner used in LIIF.
- Parameters
in_dim (int) – Input dimension.
out_dim (int) – Output dimension.
hidden_list (list[int]) – List of hidden dimensions.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – The input of MLP.
- Returns
The output of MLP.
- Return type
Tensor
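A minimal usage sketch (the feature dimension is illustrative; in LIIF it depends on the encoder and positional encoding):
import torch
from mmagic.models.editors import MLPRefiner

mlp = MLPRefiner(in_dim=580, out_dim=3, hidden_list=[256, 256, 256, 256])
feat = torch.rand(4096, 580)  # per-query features
rgb = mlp(feat)               # -> (4096, 3)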
- class mmagic.models.editors.LSGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = None, ema_config: Optional[Dict] = None, loss_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseGAN
Implementation of Least Squares Generative Adversarial Networks.
Paper link: https://arxiv.org/pdf/1611.04076.pdf
Detailed architectures can be found in LSGANGenerator and LSGANDiscriminator.
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) Tuple ¶
Get disc loss. LSGAN uses the least squares loss to train the discriminator:
\[L_{D}=\left(D\left(X_{\text{data}}\right)-1\right)^{2}+(D(G(z)))^{2}\]
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
disc_pred_real (Tensor) – Discriminator's prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
- gen_loss(disc_pred_fake: torch.Tensor) Tuple ¶
Get gen loss. LSGAN uses the least squares loss to train the generator:
\[L_{G}=(D(G(z))-1)^{2}\]
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
tuple[Tensor, dict]
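For reference, a minimal sketch of the two least-squares losses above (standard formulation; the repo's log-variable handling is omitted):
import torch
import torch.nn.functional as F

def lsgan_disc_loss(disc_pred_fake, disc_pred_real):
    # (D(x) - 1)^2 + (D(G(z)))^2, averaged over the batch
    return (F.mse_loss(disc_pred_real, torch.ones_like(disc_pred_real)) +
            F.mse_loss(disc_pred_fake, torch.zeros_like(disc_pred_fake)))

def lsgan_gen_loss(disc_pred_fake):
    # (D(G(z)) - 1)^2
    return F.mse_loss(disc_pred_fake, torch.ones_like(disc_pred_fake))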
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (List[DataSample]) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.MSPIEStyleGAN2(*args, train_settings=dict(), **kwargs)¶
Bases:
mmagic.models.editors.stylegan2.StyleGAN2
MS-PIE StyleGAN2.
In this GAN, we adopt the MS-PIE training schedule so that multi-scale images can be generated with a single generator. Details can be found in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.
- Parameters
train_settings (dict) – Config for training settings. Defaults to dict().
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train GAN model. In the training of GAN models, the generator and discriminator are updated alternately. In MMagic's design, self.train_step is called with a data input. Therefore we always update the discriminator, whose update relies on real data, and then determine whether the generator needs to be updated based on the current number of iterations. More details about whether to update the generator can be found in should_gen_update().
- Parameters
data (dict) – Data sampled from dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance contains OptimWrapper of generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (TrainInput) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (TrainInput) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.PESinGAN(generator: ModelType, discriminator: Optional[ModelType], data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, num_scales: Optional[int] = None, fixed_noise_with_pad: bool = False, first_fixed_noises_ch: int = 1, iters_per_scale: int = 200, noise_weight_init: int = 0.1, lr_scheduler_args: Optional[dict] = None, test_pkl_data: Optional[str] = None, ema_confg: Optional[dict] = None)¶
Bases:
mmagic.models.editors.singan.SinGAN
Positional Encoding in SinGAN.
This modified SinGAN is used to reimplement the experiments in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.
- construct_fixed_noises()¶
Construct the fixed noises list used in SinGAN.
- class mmagic.models.editors.NAFBaseline(img_channel=3, mid_channels=16, middle_blk_num=1, enc_blk_nums=[1, 1, 1, 28], dec_blk_nums=[1, 1, 1, 1], dw_expand=1, ffn_expand=2)¶
Bases:
mmengine.model.BaseModule
The original version of Baseline model in “Simple Baseline for Image Restoration”.
- Parameters
img_channels (int) – Channel number of inputs.
mid_channels (int) – Channel number of intermediate features.
middle_blk_num (int) – Number of middle blocks.
enc_blk_nums (List of int) – Number of blocks for each encoder.
dec_blk_nums (List of int) – Number of blocks for each decoder.
- forward(inp)¶
Forward function.
- Parameters
inp – input tensor image with (B, C, H, W) shape.
- check_image_size(x)¶
Check the image size and pad the image so that its dimensions can be fully downsampled.
- Parameters
x – input tensor image with (B, C, H, W) shape.
- class mmagic.models.editors.NAFBaselineLocal(*args, train_size=(1, 3, 256, 256), fast_imp=False, **kwargs)¶
Bases:
mmagic.models.editors.nafnet.naf_avgpool2d.Local_Base
,NAFBaseline
The original version of Baseline model in “Simple Baseline for Image Restoration”.
- Parameters
img_channels (int) – Channel number of inputs.
mid_channels (int) – Channel number of intermediate features.
middle_blk_num (int) – Number of middle blocks.
enc_blk_nums (List of int) – Number of blocks for each encoder.
dec_blk_nums (List of int) – Number of blocks for each decoder.
- class mmagic.models.editors.NAFNet(img_channels=3, mid_channels=16, middle_blk_num=1, enc_blk_nums=[], dec_blk_nums=[])¶
Bases:
mmengine.model.BaseModule
NAFNet.
The original version of NAFNet in “Simple Baseline for Image Restoration”.
- Parameters
img_channels (int) – Channel number of inputs.
mid_channels (int) – Channel number of intermediate features.
middle_blk_num (int) – Number of middle blocks.
enc_blk_nums (List of int) – Number of blocks for each encoder.
dec_blk_nums (List of int) – Number of blocks for each decoder.
- forward(inp)¶
Forward function.
- Parameters
inp – input tensor image with (B, C, H, W) shape.
- check_image_size(x)¶
Check the image size and pad the image so that its dimensions can be fully downsampled.
- Parameters
x – input tensor image with (B, C, H, W) shape.
- class mmagic.models.editors.NAFNetLocal(*args, train_size=(1, 3, 256, 256), fast_imp=False, **kwargs)¶
Bases:
mmagic.models.editors.nafnet.naf_avgpool2d.Local_Base
,NAFNet
The original version of NAFNetLocal in “Simple Baseline for Image Restoration”.
NAFNetLocal replaces the global average pooling modules in NAFNet with local average pooling modules.
- Parameters
img_channels (int) – Channel number of inputs.
mid_channels (int) – Channel number of intermediate features.
middle_blk_num (int) – Number of middle blocks.
enc_blk_nums (List of int) – Number of blocks for each encoder.
dec_blk_nums (List of int) – Number of blocks for each decoder.
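A minimal shape sketch (block counts are illustrative, not the paper's exact configuration):
import torch
from mmagic.models.editors import NAFNet

net = NAFNet(img_channels=3, mid_channels=32, middle_blk_num=1,
             enc_blk_nums=[1, 1, 1, 28], dec_blk_nums=[1, 1, 1, 1])
restored = net(torch.rand(1, 3, 256, 256))  # same spatial size as the input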
- class mmagic.models.editors.MaskConvModule(*args, **kwargs)¶
Bases:
mmcv.cnn.ConvModule
Mask convolution module.
This is a simple wrapper for mask convolution like: ‘partial conv’. Convolutions in this module always need a mask as extra input.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d.
padding (int or tuple[int]) – Same as nn.Conv2d.
dilation (int or tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
conv_cfg (dict) – Config dict for convolution layer.
norm_cfg (dict) – Config dict for normalization layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
inplace (bool) – Whether to use inplace mode for activation.
with_spectral_norm (bool) – Whether use spectral norm in conv module.
padding_mode (str) – If the padding_mode has not been supported by current Conv2d in Pytorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”).
- supported_conv_list = ['PConv']¶
- forward(x, mask=None, activate=True, norm=True, return_mask=True)¶
Forward function for partial conv2d.
- Parameters
x (torch.Tensor) – Tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.
activate (bool) – Whether to use the activation layer.
norm (bool) – Whether to use the norm layer.
return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.
- Returns
Result Tensor or 2-tuple of Tensor:
Tensor: Results after partial conv.
Tensor: Updated mask, returned if mask is given and return_mask is True.
- Return type
Tensor or tuple
- class mmagic.models.editors.PartialConv2d(*args, multi_channel=False, eps=1e-08, **kwargs)¶
Bases:
torch.nn.Conv2d
Implementation for partial convolution.
Image Inpainting for Irregular Holes Using Partial Convolutions [https://arxiv.org/abs/1804.07723]
- Parameters
multi_channel (bool) – If True, the mask is multi-channel. Otherwise, the mask is single-channel.
eps (float) – Needs to be changed for mixed precision training. For mixed precision training, you need to change 1e-8 to 1e-6.
- forward(input, mask=None, return_mask=True)¶
Forward function for partial conv2d.
- Parameters
input (torch.Tensor) – Tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.
return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.
- Returns
Results after partial conv. torch.Tensor: Updated mask, returned if mask is given and return_mask is True.
- Return type
torch.Tensor
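A minimal usage sketch of partial convolution on a masked image (sizes and the hole location are illustrative):
import torch
from mmagic.models.editors import PartialConv2d

pconv = PartialConv2d(3, 16, kernel_size=3, padding=1, multi_channel=False)
img = torch.rand(1, 3, 64, 64)
mask = torch.ones(1, 1, 64, 64)
mask[:, :, 16:32, 16:32] = 0  # simulate a rectangular hole
feat, updated_mask = pconv(img * mask, mask=mask, return_mask=True)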
- class mmagic.models.editors.PConvDecoder(num_layers=7, interpolation='nearest', conv_cfg=dict(type='PConv', multi_channel=True), norm_cfg=dict(type='BN'))¶
Bases:
mmengine.model.BaseModule
Decoder with partial conv.
About the details of this architecture, please see: Image Inpainting for Irregular Holes Using Partial Convolutions.
- Parameters
num_layers (int) – The number of convolutional layers. Default: 7.
interpolation (str) – The upsample mode. Default: ‘nearest’.
conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.
norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’}.
- forward(input_dict)¶
Forward Function.
- Parameters
input_dict (dict | torch.Tensor) – Input dict with middle features or torch.Tensor.
- Returns
Output tensor with shape of (n, c, h, w).
- Return type
torch.Tensor
- class mmagic.models.editors.PConvEncoder(in_channels=3, num_layers=7, conv_cfg=dict(type='PConv', multi_channel=True), norm_cfg=dict(type='BN', requires_grad=True), norm_eval=False)¶
Bases:
mmengine.model.BaseModule
Encoder with partial conv.
About the details of this architecture, please see: Image Inpainting for Irregular Holes Using Partial Convolutions.
- Parameters
in_channels (int) – The number of input channels. Default: 3.
num_layers (int) – The number of convolutional layers. Default: 7.
conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.
norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’}.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effective on Batch Norm and its variants only. Default: False.
- train(mode=True)¶
Set BatchNorm modules in the model to evaluation mode.
- forward(x, mask)¶
Forward function for partial conv encoder.
- Parameters
x (torch.Tensor) – Masked image with shape (n, c, h, w).
mask (torch.Tensor) – Mask tensor with shape (n, c, h, w).
- Returns
Contains the results and middle-level features of this module. hidden_feats contains the middle feature maps and hidden_masks stores the updated masks.
- Return type
dict
- class mmagic.models.editors.PConvEncoderDecoder(encoder, decoder)¶
Bases:
mmengine.model.BaseModule
Encoder-Decoder with partial conv module.
- Parameters
encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
- forward(x, mask_in)¶
Forward Function.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
mask_in (torch.Tensor) – Input mask tensor with shape of (n, c, h, w).
- Returns
Output tensor with shape of (n, c, h', w').
- Return type
torch.Tensor
- class mmagic.models.editors.PConvInpaintor(data_preprocessor: Union[dict, mmengine.config.Config], encdec: dict, disc: Optional[dict] = None, loss_gan: Optional[dict] = None, loss_gp: Optional[dict] = None, loss_disc_shift: Optional[dict] = None, loss_composed_percep: Optional[dict] = None, loss_out_percep: bool = False, loss_l1_hole: Optional[dict] = None, loss_l1_valid: Optional[dict] = None, loss_tv: Optional[dict] = None, train_cfg: Optional[dict] = None, test_cfg: Optional[dict] = None, init_cfg: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.OneStageInpaintor
Inpaintor for Partial Convolution method.
This inpaintor is implemented according to the paper: Image inpainting for irregular holes using partial convolutions
- forward_tensor(inputs, data_samples)¶
Forward function in tensor mode.
- Parameters
inputs (torch.Tensor) – Input tensor.
data_samples (dict) – Dict containing data samples.
- Returns
Dict containing output results.
- Return type
dict
- train_step(data: List[dict], optim_wrapper)¶
Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if have)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data and only one iteration for optimizing the generator after disc_step iterations for the discriminator.
- Parameters
data (List[dict]) – Batch of data as input.
optim_wrapper (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
- class mmagic.models.editors.ProgressiveGrowingGAN(generator, discriminator, data_preprocessor, nkimgs_per_scale, noise_size=None, interp_real=None, transition_kimgs: int = 600, prev_stage: int = 0, ema_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseGAN
Progressive Growing Unconditional GAN.
In this GAN model, we implement progressive growing training schedule, which is proposed in Progressive Growing of GANs for improved Quality, Stability and Variation, ICLR 2018.
We highly recommend using GrowScaleImgDataset to save computational load in data pre-processing.
Notes for using PGGAN:
In the official implementation, Tero uses gradient penalty with norm_mode="HWC".
We do not implement minibatch_repeats, which has been used in the official TensorFlow implementation.
Notes for resuming progressive growing GANs: Users should specify the prev_stage in train_cfg. Otherwise, the model may reset the optimizer status, which will bring inferior performance. For example, if your model is resumed from the 256 stage, you should set train_cfg=dict(prev_stage=256).
- Parameters
generator (dict) – Config for generator.
discriminator (dict) – Config for discriminator.
- forward(inputs: mmagic.utils.typing.ForwardInputs, data_samples: Optional[list] = None, mode: Optional[str] = None) mmagic.utils.typing.SampleList ¶
Sample images from noises by using the generator.
- Parameters
batch_inputs (ForwardInputs) – Dict containing the necessary information (e.g. noise, num_batches, mode) to generate images.
data_samples (Optional[list]) – Data samples collated by data_preprocessor. Defaults to None.
mode (Optional[str]) – mode is not used in ProgressiveGrowingGAN. Defaults to None.
- Returns
A list of DataSample containing the generated results.
- Return type
SampleList
- train_discriminator(inputs: torch.Tensor, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (Tensor) – Inputs from current resolution training.
data_samples (List[DataSample]) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor, fake_data: torch.Tensor, real_data: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Get disc loss. PGGAN uses WGAN-GP's loss and a discriminator shift loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
disc_pred_real (Tensor) – Discriminator’s prediction of the real images.
fake_data (Tensor) – Generated images, used to calculate gradient penalty.
real_data (Tensor) – Real images, used to calculate gradient penalty.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
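For reference, a sketch of the standard WGAN-GP critic loss named above (the in-repo version additionally uses norm_mode="HWC" and a discriminator shift term, which are omitted here):
import torch
from torch.autograd import grad

def wgan_gp_disc_loss(disc, disc_pred_fake, disc_pred_real,
                      fake_data, real_data, gp_weight=10.0):
    # Wasserstein critic loss plus a gradient penalty on interpolated samples.
    alpha = torch.rand(real_data.size(0), 1, 1, 1, device=real_data.device)
    interp = (alpha * real_data + (1 - alpha) * fake_data).requires_grad_(True)
    grads = grad(disc(interp).sum(), interp, create_graph=True)[0]
    gp = ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return disc_pred_fake.mean() - disc_pred_real.mean() + gp_weight * gp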
- train_generator(inputs: torch.Tensor, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (Tensor) – Inputs from current resolution training.
data_samples (List[DataSample]) – Data samples from dataloader. Not used in generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- gen_loss(disc_pred_fake: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Generator loss for PGGAN. PGGAN uses WGAN loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
Train step function.
This function implements the standard training iteration for asynchronous adversarial training: in each iteration, we first update the discriminator and then compute the loss for the generator with the newly updated discriminator.
For distributed training, we use the reducer from DDP to synchronize the necessary parameters in the current computational graph.
- Parameters
data_batch (dict) – Input data from the dataloader.
optimizer (dict) – Dict containing the optimizers for the generator and discriminator.
ddp_reducer (Reducer | None, optional) – Reducer from DDP. It is used to prepare for backward() in DDP. Defaults to None.
running_status (dict | None, optional) – Contains necessary basic information for training, e.g., the iteration number. Defaults to None.
- Returns
Contains 'log_vars', 'num_samples', and 'results'.
- Return type
dict
- class mmagic.models.editors.Pix2Pix(*args, **kwargs)¶
Bases:
mmagic.models.base_models.BaseTranslationModel
Pix2Pix model for paired image-to-image translation.
- Ref:
Image-to-Image Translation with Conditional Adversarial Networks
- forward_test(img, target_domain, **kwargs)¶
Forward function for testing.
- Parameters
img (tensor) – Input image tensor.
target_domain (str) – Target domain of the output image.
kwargs (dict) – Other arguments.
- Returns
Forward results.
- Return type
dict
- _get_disc_loss(outputs)¶
Get the loss of discriminator.
- Parameters
outputs (dict) – A dict of outputs.
- Returns
Loss and a dict logging the loss terms.
- Return type
Tuple
- _get_gen_loss(outputs)¶
Get the loss of generator.
- Parameters
outputs (dict) – A dict of outputs.
- Returns
Loss and a dict logging the loss terms.
- Return type
Tuple
- train_step(data, optim_wrapper=None)¶
Training step function.
- Parameters
data_batch (dict) – Dict of the input data batch.
optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generator and discriminator.
ddp_reducer (Reducer | None, optional) – Reducer from DDP. It is used to prepare for backward() in DDP. Defaults to None.
running_status (dict | None, optional) – Contains necessary basic information for training, e.g., the iteration number. Defaults to None.
- Returns
Dict of loss, information for the logger, the number of samples and results for visualization.
- Return type
dict
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of the given data. Same as val_step().
- Parameters
data (dict) – Data sampled from a metric-specific sampler. More details in Metrics and Evaluator.
- Returns
Generated image or image dict.
- Return type
List[DataSample]
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the generated image of the given data. Same as test_step().
- Parameters
data (dict) – Data sampled from a metric-specific sampler. More details in Metrics and Evaluator.
- Returns
Generated image or image dict.
- Return type
List[DataSample]
- class mmagic.models.editors.PlainDecoder(in_channels, init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModule
Simple decoder from Deep Image Matting.
- Parameters
in_channels (int) – Channel num of input features.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- init_weights()¶
Init weights for the module.
- forward(inputs)¶
Forward function of PlainDecoder.
- Parameters
inputs (dict) –
Output dictionary of the VGG encoder containing:
out (Tensor): Output of the VGG encoder.
max_idx_1 (Tensor): Index of the first maxpooling layer in the VGG encoder.
max_idx_2 (Tensor): Index of the second maxpooling layer in the VGG encoder.
max_idx_3 (Tensor): Index of the third maxpooling layer in the VGG encoder.
max_idx_4 (Tensor): Index of the fourth maxpooling layer in the VGG encoder.
max_idx_5 (Tensor): Index of the fifth maxpooling layer in the VGG encoder.
- Returns
Output tensor.
- Return type
Tensor
- class mmagic.models.editors.PlainRefiner(conv_channels=64, init_cfg=None)¶
Bases:
mmengine.model.BaseModule
Simple refiner from Deep Image Matting.
- Parameters
conv_channels (int) – Number of channels produced by the three main convolutional layers. Default: 64.
pretrained (str) – Name of the pretrained model. Default: None.
- init_weights()¶
Init weights for the module.
- forward(x, raw_alpha)¶
Forward function.
- Parameters
x (Tensor) – The input feature map of the refiner.
raw_alpha (Tensor) – The raw predicted alpha matte.
- Returns
The refined alpha matte.
- Return type
Tensor
- class mmagic.models.editors.RDNNet(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4, num_layers=8, channel_growth=64)¶
Bases:
mmengine.model.BaseModule
RDN model for single image super-resolution.
Paper: Residual Dense Network for Image Super-Resolution
Adapted from ‘https://github.com/yjn870/RDN-pytorch.git’ ‘RDN-pytorch/blob/master/models.py’ Copyright (c) 2021, JaeYun Yeo, under MIT License.
Most of the implementation follows the implementation in: ‘https://github.com/sanghyun-son/EDSR-PyTorch.git’ ‘EDSR-PyTorch/blob/master/src/model/rdn.py’ Copyright (c) 2017, sanghyun-son, under MIT license.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 16.
upscale_factor (int) – Upsampling factor. Support 2^n and 3. Default: 4.
num_layers (int) – Layer number in the Residual Dense Block. Default: 8.
channel_growth (int) – Channels growth in each layer of RDB. Default: 64.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
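A minimal forward sketch under the default arguments above; the x4 output shape follows from upscale_factor=4, and size-preserving convolutions are an assumption of this sketch:

    import torch
    from mmagic.models.editors import RDNNet

    net = RDNNet(in_channels=3, out_channels=3, upscale_factor=4)
    lr = torch.rand(1, 3, 24, 24)        # (n, c, h, w) low-resolution input
    sr = net(lr)                          # residual dense blocks + x4 upsample
    assert sr.shape == (1, 3, 96, 96)     # 24 * 4 = 96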
- class mmagic.models.editors.RealBasicVSR(generator, discriminator=None, gan_loss=None, pixel_loss=None, cleaning_loss=None, perceptual_loss=None, is_use_sharpened_gt_in_pixel=False, is_use_sharpened_gt_in_percep=False, is_use_sharpened_gt_in_gan=False, is_use_ema=False, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.editors.real_esrgan.RealESRGAN
RealBasicVSR model for real-world video super-resolution.
Ref: Investigating Tradeoffs in Real-World Video Super-Resolution, arXiv
- Parameters
generator (dict) – Config for the generator.
discriminator (dict, optional) – Config for the discriminator. Default: None.
gan_loss (dict, optional) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict, optional) – Config for the pixel loss. Default: None.
cleaning_loss (dict, optional) – Config for the image cleaning loss. Default: None.
perceptual_loss (dict, optional) – Config for the perceptual loss. Default: None.
is_use_sharpened_gt_in_pixel (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for pixel loss. Default: False.
is_use_sharpened_gt_in_percep (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for perceptual loss. Default: False.
is_use_sharpened_gt_in_gan (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for adversarial loss. Default: False.
train_cfg (dict) – Config for training. Default: None. You may change the training of the GAN by setting disc_steps (how many discriminator updates after one generator update) and disc_init_steps (how many discriminator updates at the start of training). These two keys are useful when training with WGAN.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Default: None.
- extract_gt_data(data_samples)¶
Extract GT data from data samples.
- Parameters
data_samples (list) – List of DataSample.
- Returns
Extracted GT data.
- Return type
Tensor
- g_step(batch_outputs, batch_gt_data)¶
G step of GAN: Calculate losses of generator.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tuple[Tensor]) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- train_step(data: List[dict], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step of GAN-based method.
- Parameters
data (List[dict]) – Data sampled from the dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- forward_train(batch_inputs, data_samples=None)¶
Forward Train.
Run forward of the generator with return_lqs=True.
- Parameters
batch_inputs (Tensor) – Batch inputs.
data_samples (List[DataSample]) – Data samples of Editing. Default: None.
- Returns
Result of the generator: (outputs, lqs).
- Return type
Tuple[Tensor]
- class mmagic.models.editors.RealBasicVSRNet(mid_channels=64, num_propagation_blocks=20, num_cleaning_blocks=20, dynamic_refine_thres=255, spynet_pretrained=None, is_fix_cleaning=False, is_sequential_cleaning=False)¶
Bases:
mmengine.model.BaseModule
RealBasicVSR network structure for real-world video super-resolution.
Support only x4 upsampling.
- Paper:
Investigating Tradeoffs in Real-World Video Super-Resolution, arXiv
- Parameters
mid_channels (int, optional) – Channel number of the intermediate features. Default: 64.
num_propagation_blocks (int, optional) – Number of residual blocks in each propagation branch. Default: 20.
num_cleaning_blocks (int, optional) – Number of residual blocks in the image cleaning module. Default: 20.
dynamic_refine_thres (int, optional) – Stop cleaning the images when the residue is smaller than this value. Default: 255.
spynet_pretrained (str, optional) – Pre-trained model path of SPyNet. Default: None.
is_fix_cleaning (bool, optional) – Whether to fix the weights of the image cleaning module during training. Default: False.
is_sequential_cleaning (bool, optional) – Whether to clean the images sequentially. This is used to save GPU memory, but the speed is slightly slower. Default: False.
- forward(lqs, return_lqs=False)¶
Forward function for RealBasicVSRNet.
- Parameters
lqs (tensor) – Input low-quality (LQ) sequence with shape (n, t, c, h, w).
return_lqs (bool) – Whether to return the LQ sequence. Default: False.
- Returns
Output HR sequence.
- Return type
Tensor
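A minimal forward sketch of the x4-only upsampling described above (shapes are illustrative; with spynet_pretrained=None the flow network is randomly initialized, so the output is only shape-correct, not meaningful):

    import torch
    from mmagic.models.editors import RealBasicVSRNet

    net = RealBasicVSRNet(mid_channels=64, spynet_pretrained=None)
    lqs = torch.rand(1, 5, 3, 64, 64)          # (n, t, c, h, w) LQ sequence
    out = net(lqs)                              # clean, propagate, x4 upsample
    assert out.shape == (1, 5, 3, 256, 256)     # only x4 is supported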
- class mmagic.models.editors.RealESRGAN(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, is_use_sharpened_gt_in_pixel=False, is_use_sharpened_gt_in_percep=False, is_use_sharpened_gt_in_gan=False, is_use_ema=True, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.editors.srgan.SRGAN
Real-ESRGAN model for single image super-resolution.
Ref: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data, 2021.
Note: generator_ema is realized in EMA_HOOK
- Parameters
generator (dict) – Config for the generator.
discriminator (dict, optional) – Config for the discriminator. Default: None.
gan_loss (dict, optional) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict, optional) – Config for the pixel loss. Default: None.
perceptual_loss (dict, optional) – Config for the perceptual loss. Default: None.
is_use_sharpened_gt_in_pixel (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for pixel loss. Default: False.
is_use_sharpened_gt_in_percep (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for perceptual loss. Default: False.
is_use_sharpened_gt_in_gan (bool, optional) – Whether to use the image sharpened by unsharp masking as the GT for adversarial loss. Default: False.
is_use_ema (bool, optional) – Whether to apply exponential moving average (EMA) to the network weights. Default: True.
train_cfg (dict) – Config for training. Default: None. You may change the training of the GAN by setting disc_steps (how many discriminator updates after one generator update) and disc_init_steps (how many discriminator updates at the start of training). These two keys are useful when training with WGAN.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Default: None.
- forward_tensor(inputs, data_samples=None, training=False)¶
Forward tensor. Returns result of simple forward.
- Parameters
inputs (torch.Tensor) – Batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – Data samples collated by data_preprocessor.
training (bool) – Whether this is a training forward. Default: False.
- Returns
Result of the simple forward.
- Return type
Tensor
- g_step(batch_outputs, batch_gt_data)¶
G step of GAN: Calculate losses of generator.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tuple[Tensor]) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- d_step_real(batch_outputs, batch_gt_data: torch.Tensor)¶
Real part of D step.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tuple[Tensor]) – Batch GT data.
- Returns
Real part of gan_loss for the discriminator.
- Return type
Tensor
- d_step_fake(batch_outputs, batch_gt_data)¶
Fake part of D step.
- Parameters
batch_outputs (Tensor) – Output of the generator.
batch_gt_data (Tuple[Tensor]) – Batch GT data.
- Returns
Fake part of gan_loss for the discriminator.
- Return type
Tensor
- extract_gt_data(data_samples)¶
Extract GT data from data samples.
- Parameters
data_samples (list) – List of DataSample.
- Returns
Extracted GT data.
- Return type
Tensor
- class mmagic.models.editors.UNetDiscriminatorWithSpectralNorm(in_channels, mid_channels=64, skip_connection=True)¶
Bases:
mmengine.model.BaseModule
A U-Net discriminator with spectral normalization.
- Parameters
in_channels (int) – Channel number of the input.
mid_channels (int, optional) – Channel number of the intermediate features. Default: 64.
skip_connection (bool, optional) – Whether to use skip connection. Default: True.
- forward(img)¶
Forward function.
- Parameters
img (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
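A minimal usage sketch; the per-pixel output shape in the comment is an assumption based on the U-Net design, not a documented guarantee:

    import torch
    from mmagic.models.editors import UNetDiscriminatorWithSpectralNorm

    disc = UNetDiscriminatorWithSpectralNorm(in_channels=3, mid_channels=64)
    img = torch.rand(1, 3, 128, 128)
    pred = disc(img)   # spectral-normed U-Net; assumed (1, 1, 128, 128) realness map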
- class mmagic.models.editors.Restormer(inp_channels=3, out_channels=3, dim=48, num_blocks=[4, 6, 6, 8], num_refinement_blocks=4, heads=[1, 2, 4, 8], ffn_expansion_factor=2.66, bias=False, LayerNorm_type='WithBias', dual_pixel_task=False, dual_keys=['imgL', 'imgR'])¶
Bases:
mmengine.model.BaseModule
A PyTorch implementation of Restormer: Efficient Transformer for High-Resolution Image Restoration. Ref repo: https://github.com/swz30/Restormer.
- Parameters
inp_channels (int) – Number of input image channels. Default: 3.
out_channels (int) – Number of output image channels. Default: 3.
dim (int) – Number of feature dimensions. Default: 48.
num_blocks (List(int)) – Depth of each Transformer layer. Default: [4, 6, 6, 8].
num_refinement_blocks (int) – Number of refinement blocks. Default: 4.
heads (List(int)) – Number of attention heads in different layers. Default: [1, 2, 4, 8].
ffn_expansion_factor (float) – Ratio of feed-forward network expansion. Default: 2.66.
bias (bool) – The bias of convolution. Default: False.
LayerNorm_type (str, optional) – Layer normalization type. Options: 'WithBias', 'BiasFree'. Default: 'WithBias'.
dual_pixel_task (bool) – True for dual-pixel defocus deblurring only. Also set inp_channels=6. Default: False.
dual_keys (List) – Keys of dual images in inputs. Default: ['imgL', 'imgR'].
- forward(inp_img)¶
Forward function.
- Parameters
inp_img (Tensor) – Input tensor with shape (B, C, H, W).
- Returns
Forward results.
- Return type
Tensor
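A minimal forward sketch with the defaults above; since restoration keeps the input resolution and the encoder downsamples several times, H and W divisible by 8 is assumed here:

    import torch
    from mmagic.models.editors import Restormer

    net = Restormer(inp_channels=3, out_channels=3, dim=48)
    img = torch.rand(1, 3, 64, 64)     # (B, C, H, W), divisible by 8
    out = net(img)                      # restored image, same resolution
    assert out.shape == (1, 3, 64, 64)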
- class mmagic.models.editors.SAGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = 128, num_classes: Optional[int] = None, ema_config: Optional[Dict] = None)¶
Bases:
mmagic.models.base_models.BaseConditionalGAN
Implementation of Self-Attention Generative Adversarial Networks (SAGAN, https://arxiv.org/abs/1805.08318), Spectral Normalization for Generative Adversarial Networks (SNGAN), and cGANs with Projection Discriminator (Proj-GAN).
Detailed architectures can be found in SNGANGenerator and ProjDiscriminator.
- Parameters
generator (ModelType) – The config or model of the generator.
discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.
data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor.
generator_steps (int) – The number of times the generator is completely updated before the discriminator is updated. Defaults to 1.
discriminator_steps (int) – The number of times the discriminator is completely updated before the generator is updated. Defaults to 1.
noise_size (Optional[int]) – Size of the input noise vector. Defaults to 128.
num_classes (Optional[int]) – The number of classes you would like to generate. Defaults to None.
ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Get disc loss. SAGAN, SNGAN and Proj-GAN use hinge loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
disc_pred_real (Tensor) – Discriminator's prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
- gen_loss(disc_pred_fake: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Get gen loss. SAGAN, SNGAN and Proj-GAN use hinge loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from the dataloader.
data_samples (DataSample) – Data samples from the dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from the dataloader.
data_samples (DataSample) – Data samples from the dataloader. Not used in the generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- class mmagic.models.editors.SinGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, num_scales: Optional[int] = None, iters_per_scale: int = 2000, noise_weight_init: float = 0.1, lr_scheduler_args: Optional[dict] = None, test_pkl_data: Optional[str] = None, ema_config: Optional[dict] = None)¶
Bases:
mmagic.models.base_models.BaseGAN
SinGAN.
This model implements the single-image generative adversarial model proposed in SinGAN: Learning a Generative Model from a Single Natural Image, ICCV 2019.
Notes for training:
This model should be trained with our dataset SinGANDataset.
In training, the total_iters argument is related to the number of scales in the image pyramid and iters_per_scale in the train_cfg. You should set it carefully in the training config file.
Notes for model architectures:
The generator and discriminator need num_scales at initialization. However, this argument is generated by the create_real_pyramid function in singan_dataset.py. The last element in the returned list (stop_scale) is the value for num_scales. Note that this scale is counted from zero. Please see our tutorial for SinGAN for more details, or our standard config for reference.
- Parameters
generator (ModelType) – The config or model of the generator.
discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.
data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor.
generator_steps (int) – The number of times the generator is completely updated before the discriminator is updated. Defaults to 1.
discriminator_steps (int) – The number of times the discriminator is completely updated before the generator is updated. Defaults to 1.
num_scales (int) – The number of scales/stages in generator/ discriminator. Note that this number is counted from zero, which is the same as the original paper. Defaults to None.
iters_per_scale (int) – The training iteration for each resolution scale. Defaults to 2000.
noise_weight_init (float) – The initial weight of the fixed noise. Defaults to 0.1.
lr_scheduler_args (Optional[dict]) – Arguments for learning schedulers. Note that in SinGAN, we use MultiStepLR, which is the same as the original paper. If not passed, no learning schedule will be used. Defaults to None.
test_pkl_data (Optional[str]) – The path of the pickle file which contains the fixed noise and noise weight. This is required for testing. Defaults to None.
ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
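A small arithmetic sketch of how the scale-related arguments above fit together; the values are illustrative, and the exact total_iters bookkeeping should be checked against the standard config:

    # num_scales is counted from zero (the stop_scale from create_real_pyramid),
    # so there are num_scales + 1 resolutions in the image pyramid.
    num_scales = 8
    iters_per_scale = 2000
    total_iters = iters_per_scale * (num_scales + 1)   # 18000 iterations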
- load_test_pkl()¶
Load pickle for test.
- _from_numpy(data: Tuple[list, numpy.ndarray]) Tuple[torch.Tensor, List[torch.Tensor]] ¶
Convert input numpy array or list of numpy array to Tensor or list of Tensor.
- Parameters
data (Tuple[list, np.ndarray]) – Input data to convert.
- Returns
Converted Tensor or list of Tensors.
- Return type
Tuple[Tensor, List[Tensor]]
- get_module(model: torch.nn.Module, module_name: str) torch.nn.Module ¶
Get an inner module from model.
Since some models are wrapped with DDP, we have to judge whether the module can be indexed directly.
- Parameters
model (nn.Module) – This model may be wrapped with DDP or not.
module_name (str) – The name of the specific module.
- Returns
The returned submodule.
- Return type
nn.Module
- construct_fixed_noises()¶
Construct the fixed noises list used in SinGAN.
- forward(inputs: mmagic.utils.ForwardInputs, data_samples: Optional[list] = None, mode=None) List[mmagic.structures.DataSample] ¶
Forward function for SinGAN. For SinGAN, inputs should be a dict containing 'num_batches', 'mode' and other input arguments for the generator.
- Parameters
inputs (dict) – Dict containing the necessary information (e.g., noise, num_batches, mode) to generate images.
data_samples (Optional[list]) – Data samples collated by data_preprocessor. Defaults to None.
mode (Optional[str]) – mode is not used in SinGAN. Defaults to None.
- gen_loss(disc_pred_fake: torch.Tensor, recon_imgs: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Generator loss for SinGAN. SinGAN uses WGAN loss and MSE loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
recon_imgs (Tensor) – Reconstructed images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor, fake_data: torch.Tensor, real_data: torch.Tensor) Tuple[torch.Tensor, dict] ¶
Get disc loss. SinGAN uses WGAN-GP loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator's prediction of the fake images.
disc_pred_real (Tensor) – Discriminator's prediction of the real images.
fake_data (Tensor) – Generated images, used to calculate gradient penalty.
real_data (Tensor) – Real images, used to calculate gradient penalty.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
- train_generator(inputs: dict, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train generator.
- Parameters
inputs (dict) – Inputs from the dataloader.
data_samples (List[DataSample]) – Data samples from the dataloader. Not used in the generator's training.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_discriminator(inputs: dict, data_samples: List[mmagic.structures.DataSample], optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor] ¶
Train discriminator.
- Parameters
inputs (dict) – Inputs from the dataloader.
data_samples (List[DataSample]) – Data samples from the dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_gan(inputs_dict: dict, data_sample: List[mmagic.structures.DataSample], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train the GAN model. In the training of GAN models, the generator and discriminator are updated alternately. In MMagic's design, self.train_step is called with a data input. Therefore we always update the discriminator, whose update relies on real data, and then determine whether the generator needs to be updated based on the current number of iterations. More details about whether to update the generator can be found in should_gen_update().
- Parameters
data (dict) – Data sampled from the dataloader.
data_sample (List[DataSample]) – List of data samples containing GT and meta information.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance containing the OptimWrappers of the generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step for the SinGAN model. SinGAN is trained with multi-resolution images, and each resolution is trained for self.iters_per_scale iterations.
We initialize the weights and learning rate scheduler of the corresponding module at the start of each resolution's training. At the end of each resolution's training, we update the weight of the noise of the current resolution by the MSE loss between the reconstructed and real images.
- Parameters
data (dict) – Data sampled from the dataloader.
optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance containing the OptimWrappers of the generator and discriminator.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- test_step(data: dict) mmagic.utils.SampleList ¶
Gets the generated images of the given data in the test process. Before generating images, we call self.load_test_pkl() to load the fixed noise and the current stage of the model from the pickle file.
- Parameters
data (dict) – Data sampled from a metric-specific sampler. More details in Metrics and Evaluator.
- Returns
A list of DataSample that contain the generated results.
- Return type
SampleList
- class mmagic.models.editors.SRCNNNet(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5), upscale_factor=4)¶
Bases:
mmengine.model.BaseModule
SRCNN network structure for image super-resolution.
SRCNN has three conv layers. For each layer, we can define the in_channels, out_channels and kernel_size. The input image will first be upsampled with a bicubic upsampler, and then super-resolved in the HR spatial size.
Paper: Learning a Deep Convolutional Network for Image Super-Resolution.
- Parameters
channels (tuple[int]) – A tuple of channel numbers for each layer, including the channels of the input and output. Default: (3, 64, 32, 3).
kernel_sizes (tuple[int]) – A tuple of kernel sizes for each conv layer. Default: (9, 1, 5).
upscale_factor (int) – Upsampling factor. Default: 4.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
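A minimal forward sketch of the behaviour described above: bicubic pre-upsampling to the HR size, then three convolutions. Size-preserving padding in the conv layers is an assumption of the shape check:

    import torch
    from mmagic.models.editors import SRCNNNet

    net = SRCNNNet(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5),
                   upscale_factor=4)
    lr = torch.rand(1, 3, 32, 32)          # (n, c, h, w) LR input
    sr = net(lr)                            # bicubic x4, then 3 conv layers
    assert sr.shape == (1, 3, 128, 128)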
- class mmagic.models.editors.SRGAN(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, init_cfg=None, data_preprocessor=None)¶
Bases:
mmagic.models.base_models.BaseEditModel
SRGAN model for single image super-resolution.
Ref: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator. Default: None.
gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict) – Config for the pixel loss. Default: None.
perceptual_loss (dict) – Config for the perceptual loss. Default: None.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
init_cfg (dict, optional) – The weight initialization config for BaseModule. Default: None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor. Default: None.
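A hedged sketch of a train_cfg for the if_run_g()/if_run_d() scheduling later in this class; the disc_steps and disc_init_steps keys follow the RealESRGAN/RealBasicVSR descriptions above and are assumed to apply here as well:

    # Illustrative values, not a tested recipe.
    train_cfg = dict(
        disc_steps=1,        # discriminator updates per generator update
        disc_init_steps=0,   # discriminator-only updates at the start
    )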
- forward_train(inputs, data_samples=None, **kwargs)¶
Forward training. Losses for training are calculated in train_step.
- Parameters
inputs (torch.Tensor) – Batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – Data samples collated by data_preprocessor.
- Returns
Result of forward_tensor with training=True.
- Return type
Tensor
- forward_tensor(inputs, data_samples=None, training=False)¶
Forward tensor. Returns result of simple forward.
- Parameters
inputs (torch.Tensor) – Batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement], optional) – Data samples collated by data_preprocessor.
training (bool) – Whether this is a training forward. Default: False.
- Returns
Result of the simple forward.
- Return type
Tensor
- if_run_g()¶
Calculates whether the generator step needs to run.
- if_run_d()¶
Calculates whether the discriminator step needs to run.
- g_step(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor)¶
G step of GAN: Calculate losses of generator.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Dict of losses.
- Return type
dict
- d_step_real(batch_outputs, batch_gt_data: torch.Tensor)¶
Real part of D step.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Real part of gan_loss for the discriminator.
- Return type
Tensor
- d_step_fake(batch_outputs: torch.Tensor, batch_gt_data)¶
Fake part of D step.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tensor) – Batch GT data.
- Returns
Fake part of gan_loss for the discriminator.
- Return type
Tensor
- g_step_with_optim(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
G step with optim of GAN: Calculate losses of generator and run optim.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tensor) – Batch GT data.
optim_wrapper (OptimWrapperDict) – Optim wrapper dict.
- Returns
Dict of parsed losses.
- Return type
dict
- d_step_with_optim(batch_outputs: torch.Tensor, batch_gt_data: torch.Tensor, optim_wrapper: mmengine.optim.OptimWrapperDict)¶
D step with optim of GAN: Calculate losses of discriminator and run optim.
- Parameters
batch_outputs (Tensor) – Batch output of the generator.
batch_gt_data (Tensor) – Batch GT data.
optim_wrapper (OptimWrapperDict) – Optim wrapper dict.
- Returns
Dict of parsed losses.
- Return type
dict
- extract_gt_data(data_samples)¶
Extract GT data from data samples.
- Parameters
data_samples (list) – List of DataSample.
- Returns
Extracted GT data.
- Return type
Tensor
- train_step(data: List[dict], optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor] ¶
Train step of GAN-based method.
- Parameters
data (List[dict]) – Data sampled from the dataloader.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- class mmagic.models.editors.ModifiedVGG(in_channels, mid_channels)¶
Bases:
mmengine.model.BaseModule
A modified VGG discriminator with input size 128 x 128.
It is used to train SRGAN and ESRGAN.
- Parameters
in_channels (int) – Channel number of inputs. Default: 3.
mid_channels (int) – Channel number of base intermediate features. Default: 64.
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.MSRResNet(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4)¶
Bases:
mmengine.model.BaseModule
Modified SRResNet.
A compacted version modified from SRResNet in “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”.
It uses residual blocks without BN, similar to EDSR. Currently, it supports x2, x3 and x4 upsampling scale factor.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 16.
upscale_factor (int) – Upsampling factor. Support x2, x3 and x4. Default: 4.
- _supported_upscale_factors = [2, 3, 4]¶
- forward(x)¶
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- init_weights()¶
Init weights for models.
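A minimal forward sketch using one of the supported upscale factors listed above (shapes are illustrative):

    import torch
    from mmagic.models.editors import MSRResNet

    net = MSRResNet(in_channels=3, out_channels=3, upscale_factor=4)  # x2/x3/x4
    lr = torch.rand(1, 3, 32, 32)
    sr = net(lr)                    # BN-free residual trunk, then upsampling
    assert sr.shape == (1, 3, 128, 128)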
- class mmagic.models.editors.StableDiffusion(vae: ModelType, text_encoder: ModelType, tokenizer: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: Optional[str] = None, enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor: Optional[ModelType] = dict(type='DataPreprocessor'), init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModel
Class for Stable Diffusion. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.
- Parameters
unet (Union[dict, nn.Module]) – The config or module for Unet model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
vae (Union[dict, nn.Module]) – The config or module for VAE model.
tokenizer (str) – The name for CLIP tokenizer.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. This argument will not work when dtype is defined for submodels. Defaults to None.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
tomesd_cfg (dict, optional) – The config for ToMe SD. Please refer to https://github.com/dbolya/tomesd and https://github.com/open-mmlab/mmagic/blob/main/mmagic/models/utils/tome_utils.py for details. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- property device¶
- set_xformers(module: Optional[torch.nn.Module] = None) torch.nn.Module ¶
Set xformers for the model.
- Returns
The model with xformers.
- Return type
nn.Module
- set_tomesd() torch.nn.Module ¶
Set ToMe for the stable diffusion model.
- Returns
The model with ToMe.
- Return type
nn.Module
- train(mode: bool = True)¶
Set train/eval mode.
- Parameters
mode (bool, optional) – Whether to set train mode. Defaults to True.
- infer(prompt: Union[str, List[str]], height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress=True, seed=1, return_type='image')¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
height (int, optional) – The height in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
width (int, optional) – The width in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance] (https://arxiv.org/abs/2207.12598).
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.
generator (torch.Generator, optional) – A [torch generator] to make generation deterministic.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images.
- Return type
dict
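A minimal usage sketch of infer; sd stands for an already-built StableDiffusion instance (construction is omitted), and the 'samples' key of the returned dict is an assumption to verify in practice:

    # Hypothetical usage: `sd` is a constructed StableDiffusion model.
    result = sd.infer(
        'a photograph of an astronaut riding a horse',
        num_inference_steps=50,   # more steps: better quality, slower inference
        guidance_scale=7.5,       # classifier-free guidance strength
        return_type='image',      # 'image' -> list of PIL images
    )
    result['samples'][0].save('astronaut.png')  # dict key is an assumption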
- output_to_pil(image) List[PIL.Image.Image] ¶
Convert output tensor to PIL images. Output tensor will be de-normed to [0, 255] by DataPreprocessor.destruct. Since no data_samples are passed, color-order conversion will not be performed.
- Parameters
image (torch.Tensor) – The output tensor of the decoder.
- Returns
The list of processed PIL images.
- Return type
List[Image.Image]
- _encode_prompt(prompt, device, num_images_per_prompt, do_classifier_free_guidance, negative_prompt)¶
Encodes the prompt into text encoder hidden states.
- Parameters
prompt (str or list(int)) – The prompt to be encoded.
device (torch.device) – Torch device.
num_images_per_prompt (int) – Number of images that should be generated per prompt.
do_classifier_free_guidance (bool) – Whether to use classifier-free guidance.
negative_prompt (str or List[str]) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
- Returns
Text embeddings generated by the CLIP text encoder.
- Return type
text_embeddings (torch.Tensor)
- decode_latents(latents)¶
Use the VAE to decode latents.
- Parameters
latents (torch.Tensor) – Latents to decode.
- Returns
Image result.
- Return type
image (torch.Tensor)
- prepare_extra_step_kwargs(generator, eta)¶
Prepare extra kwargs for the scheduler step.
- Parameters
generator (torch.Generator) – Generator for random functions.
eta (float) – eta (η) is only used with the DDIMScheduler and will be ignored for other schedulers. eta corresponds to η in the DDIM paper (https://arxiv.org/abs/2010.02502) and should be in [0, 1].
- Returns
Dict containing 'generator' and 'eta'.
- Return type
extra_step_kwargs (dict)
- prepare_test_scheduler_extra_step_kwargs(generator, eta)¶
Prepare extra kwargs for the test scheduler step.
- Parameters
generator (torch.Generator) – Generator for random functions.
eta (float) – eta (η) is only used with the DDIMScheduler and will be ignored for other schedulers. eta corresponds to η in the DDIM paper (https://arxiv.org/abs/2010.02502) and should be in [0, 1].
- Returns
Dict containing 'generator' and 'eta'.
- Return type
extra_step_kwargs (dict)
- check_inputs(prompt, height, width)¶
Check whether the inputs are in a suitable format.
- prepare_latents(batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None)¶
Prepare latents for diffusion to run in latent space.
- Parameters
batch_size (int) – Batch size.
num_channels_latents (int) – Number of latent channels.
height (int) – Image height.
width (int) – Image width.
dtype (torch.dtype) – Float type.
device (torch.device) – Torch device.
generator (torch.Generator) – Generator for random functions. Defaults to None.
latents (torch.Tensor) – Pre-generated noisy latents. Defaults to None.
- Returns
Prepared latents.
- Return type
latents (torch.Tensor)
- val_step(data: dict) mmagic.utils.typing.SampleList ¶
Gets the predictions of given data.
Calls self.data_preprocessor(data, False) and self(inputs, data_sample, mode='predict') in order. Returns the predictions, which will be passed to the evaluator.
- Parameters
data (dict or tuple or list) – Data sampled from the dataset.
- Returns
The predictions of the given data.
- Return type
list
- test_step(data: dict) mmagic.utils.typing.SampleList ¶
BaseModel implements test_step the same as val_step.
- Parameters
data (dict or tuple or list) – Data sampled from the dataset.
- Returns
The predictions of the given data.
- Return type
list
- train_step(data, optim_wrapper_dict)¶
Implements the default model training process including preprocessing, model forward propagation, loss calculation, optimization, and back-propagation.
During non-distributed training, if subclasses do not override train_step(), EpochBasedTrainLoop or IterBasedTrainLoop will call this method to update model parameters. The default parameter update process is as follows:
Calls self.data_processor(data, training=False) to collect batch_inputs and the corresponding data_samples (labels).
Calls self(batch_inputs, data_samples, mode='loss') to get the raw loss.
Calls self.parse_losses to get the parsed_losses tensor used for backward and a dict of loss tensors used to log messages.
Calls optim_wrapper.update_params(loss) to update the model.
- Parameters
data (dict or tuple or list) – Data sampled from the dataset.
optim_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, torch.Tensor]
- abstract forward(inputs: torch.Tensor, data_samples: Optional[list] = None, mode: str = 'tensor') Union[Dict[str, torch.Tensor], list] ¶
forward() is not implemented yet.
- class mmagic.models.editors.StableDiffusionInpaint(*args, **kwargs)¶
Bases:
mmagic.models.editors.stable_diffusion.stable_diffusion.StableDiffusion
Class for Stable Diffusion inpainting. Refers to https://github.com/Stability-AI/stablediffusion and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_attend_and_excite.py.
- Parameters
unet (Union[dict, nn.Module]) – The config or module for Unet model.
text_encoder (Union[dict, nn.Module]) – The config or module for text encoder.
vae (Union[dict, nn.Module]) – The config or module for VAE model.
tokenizer (str) – The name for CLIP tokenizer.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. This argument will not work when dtype is defined for submodels. Defaults to None.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- infer(prompt: Union[str, List[str]], image: Union[torch.FloatTensor, PIL.Image.Image] = None, mask_image: Union[torch.FloatTensor, PIL.Image.Image] = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress=True, seed=1, return_type='image')¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
image (Union[torch.FloatTensor, Image.Image]) – The image to inpaint.
mask_image (Union[torch.FloatTensor, Image.Image]) – The mask to apply to the image, i.e. regions to inpaint.
height (int, optional) – The height in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
width (int, optional) – The width in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance] (https://arxiv.org/abs/2207.12598).
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.
generator (torch.Generator, optional) – A [torch generator] to make generation deterministic.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
return_type (str) – The return type of the inference results. Supported types are ‘image’, ‘numpy’, ‘tensor’. If ‘image’ is passed, a list of PIL images will be returned. If ‘numpy’ is passed, a numpy array with shape [N, C, H, W] will be returned, and the value range will be same as decoder’s output range. If ‘tensor’ is passed, the decoder’s output will be returned. Defaults to ‘image’.
- Returns
A dict containing the generated images.
- Return type
dict
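A minimal usage sketch of the inpainting infer; sd_inpaint, image.png and mask.png are hypothetical, and the 'samples' key is the same assumption as above:

    from PIL import Image

    # Hypothetical usage: `sd_inpaint` is a constructed StableDiffusionInpaint.
    image = Image.open('image.png').convert('RGB')
    mask = Image.open('mask.png').convert('L')   # white marks regions to inpaint
    result = sd_inpaint.infer(
        'a red sports car',
        image=image,
        mask_image=mask,
        num_inference_steps=50,
        return_type='image',
    )
    result['samples'][0].save('inpainted.png')   # dict key is an assumption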
- prepare_mask_latents(mask, masked_image, batch_size, num_channels_latents, height, width, dtype, device, generator, do_classifier_free_guidance)¶
Prepare mask latents for diffusion to run in latent space.
- Parameters
mask (torch.Tensor) – The mask to apply to the image, i.e. regions to inpaint.
masked_image (torch.Tensor) – The image to be masked.
batch_size (int) – Batch size.
num_channels_latents (int) – Number of latent channels.
height (int) – Image height.
width (int) – Image width.
dtype (torch.dtype) – Float type.
device (torch.device) – Torch device.
generator (torch.Generator) – Generator for random functions. Defaults to None.
do_classifier_free_guidance (bool) – Whether to apply classifier-free guidance.
- Returns
Prepared mask latents.
- Return type
latents (torch.Tensor)
- abstract val_step(data: dict) mmagic.utils.typing.SampleList ¶
Performs a validation step on the provided data.
This method is decorated with torch.no_grad(), which indicates that no gradients will be computed during the operations. This ensures efficient memory usage during validation.
- Parameters
data (dict) – Dictionary containing input data for validation.
- Returns
List of samples processed during the validation step.
- Return type
SampleList
- Raises
NotImplementedError – This method has not been implemented.
- abstract test_step(data: dict) mmagic.utils.typing.SampleList ¶
Performs a testing step on the provided data.
This method is decorated with torch.no_grad(), which indicates that no gradients will be computed during the operations. This ensures efficient memory usage during testing.
- Parameters
data (dict) – Dictionary containing input data for testing.
- Returns
List of samples processed during the testing step.
- Return type
SampleList
- Raises
NotImplementedError – This method has not been implemented.
- abstract train_step(data, optim_wrapper_dict)¶
Performs a training step on the provided data.
- Parameters
data – Input data for training.
optim_wrapper_dict – Dictionary containing optimizer wrappers, which may contain optimizers, schedulers, etc., required for the training step.
- Raises
NotImplementedError – This method has not been implemented.
- class mmagic.models.editors.StableDiffusionXL(vae: ModelType, text_encoder_one: ModelType, tokenizer_one: str, text_encoder_two: ModelType, tokenizer_two: str, unet: ModelType, scheduler: ModelType, test_scheduler: Optional[ModelType] = None, dtype: Optional[str] = None, enable_xformers: bool = True, noise_offset_weight: float = 0, tomesd_cfg: Optional[dict] = None, data_preprocessor: Optional[ModelType] = dict(type='DataPreprocessor'), lora_config: Optional[dict] = None, val_prompts: Union[str, List[str]] = None, finetune_text_encoder: bool = False, force_zeros_for_empty_prompt: bool = True, init_cfg: Optional[dict] = None)¶
Bases:
mmengine.model.BaseModel
Class for Stable Diffusion XL. Refers to https://github.com/Stability-AI/generative-models and https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py.
- Parameters
unet (Union[dict, nn.Module]) – The config or module for Unet model.
text_encoder_one (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer_one (str) – The name for CLIP tokenizer.
text_encoder_two (Union[dict, nn.Module]) – The config or module for text encoder.
tokenizer_two (str) – The name for CLIP tokenizer.
vae (Union[dict, nn.Module]) – The config or module for VAE model.
scheduler (Union[dict, nn.Module]) – The config or module for the diffusion scheduler.
test_scheduler (Union[dict, nn.Module], optional) – The config or module for the diffusion scheduler in the test stage (self.infer). If not passed, the same scheduler as scheduler will be used. Defaults to None.
dtype (str, optional) – The dtype for the model. This argument will not work when dtype is defined for submodels. Defaults to None.
enable_xformers (bool, optional) – Whether to use xformers. Defaults to True.
noise_offset_weight (float, optional) – The weight of the noise offset introduced in https://www.crosslabs.org/blog/diffusion-with-offset-noise. Defaults to 0.
tomesd_cfg (dict, optional) – The config for ToMe SD. Please refer to https://github.com/dbolya/tomesd and https://github.com/open-mmlab/mmagic/blob/main/mmagic/models/utils/tome_utils.py for details. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
lora_config (dict, optional) – The config for LoRA finetuning. Defaults to None.
val_prompts (Union[str, List[str]], optional) – The prompts for validation. Defaults to None.
finetune_text_encoder (bool, optional) – Whether to fine-tune text encoder. Defaults to False.
force_zeros_for_empty_prompt (bool) – Whether the negative prompt embeddings shall be forced to always be set to 0. Defaults to True.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- property device¶
- prepare_model()¶
Prepare model for training.
Move model to target dtype and disable gradient for some models.
- set_lora()¶
Set LoRA for the model.
- set_xformers(module: Optional[torch.nn.Module] = None) torch.nn.Module ¶
Set xformers for the model.
- Returns
The model with xformers.
- Return type
nn.Module
- set_tomesd() torch.nn.Module ¶
Set ToMe for the stable diffusion model.
- Returns
The model with ToMe.
- Return type
nn.Module
- train(mode: bool = True)¶
Set train/eval mode.
- Parameters
mode (bool, optional) – Whether to set train mode. Defaults to True.
- infer(prompt: Union[str, List[str]], prompt_2: Optional[Union[str, List[str]]] = None, height: Optional[int] = None, width: Optional[int] = None, num_inference_steps: int = 50, denoising_end: Optional[float] = None, guidance_scale: float = 7.5, negative_prompt: Optional[Union[str, List[str]]] = None, negative_prompt_2: Optional[Union[str, List[str]]] = None, num_images_per_prompt: Optional[int] = 1, eta: float = 0.0, generator: Optional[torch.Generator] = None, latents: Optional[torch.FloatTensor] = None, show_progress: bool = True, seed: int = 1, original_size: Optional[Tuple[int, int]] = None, crops_coords_top_left: Tuple[int, int] = (0, 0), target_size: Optional[Tuple[int, int]] = None, negative_original_size: Optional[Tuple[int, int]] = None, negative_crops_coords_top_left: Tuple[int, int] = (0, 0), negative_target_size: Optional[Tuple[int, int]] = None, return_type='image')¶
Function invoked when calling the pipeline for generation.
- Parameters
prompt (str or List[str]) – The prompt or prompts to guide the image generation.
prompt_2 (str or List[str], optional) – The prompt or prompts to be sent to tokenizer_two and text_encoder_two. If not defined, prompt is used in both text encoders. Defaults to None.
height (int, optional) – The height in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
width (int, optional) – The width in pixels of the generated image. Defaults to self.unet_sample_size * self.vae_scale_factor.
num_inference_steps (int, optional, defaults to 50) – The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.
denoising_end (float, optional) – When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in [Refining the Image Output]( https://huggingface.co/docs/diffusers/api/pipelines/ stable_diffusion/stable_diffusion_xl#refining-the-image-output)
guidance_scale (float, optional, defaults to 7.5) – Guidance scale as defined in [Classifier-Free Diffusion Guidance] (https://arxiv.org/abs/2207.12598).
negative_prompt (str or List[str], optional) – The prompt or prompts not to guide the image generation. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).
negative_prompt_2 (str or List[str], optional) – The negative_prompt to be sent to tokenizer_two and text_encoder_two. If not defined, negative_prompt is used in both text encoders. Defaults to None.
num_images_per_prompt (int, optional, defaults to 1) – The number of images to generate per prompt.
eta (float, optional, defaults to 0.0) – Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to [schedulers.DDIMScheduler], will be ignored for others.
generator (torch.Generator, optional) – A [torch generator] to make generation deterministic.
latents (torch.FloatTensor, optional) – Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.
show_progress (bool) – Whether to show progress. Defaults to True.
seed (int) – Seed to be used. Defaults to 1.
original_size (Tuple[int], optional) – If original_size is not the same as target_size, the image will appear to be down- or up-sampled. original_size defaults to (width, height) if not specified. Defaults to None.
crops_coords_top_left (Tuple[int], optional) – crops_coords_top_left can be used to generate an image that appears to be “cropped” from the position. Favorable, well-centered images are usually achieved by setting crops_coords_top_left to (0, 0). Defaults to (0, 0).
target_size (Tuple[int], optional) – For most cases, tar