mmagic.models.editors.stylegan2

Package Contents

Classes

StyleGAN2

Implementation of Analyzing and Improving the Image Quality of StyleGAN.

ADAAug

Data augmentation module for Adaptive Discriminator Augmentation (ADA).

ADAStyleGAN2Discriminator

StyleGAN2 Discriminator.

StyleGAN2Discriminator

StyleGAN2 Discriminator.

StyleGAN2Generator

StyleGAN2 Generator.

ConvDownLayer

Convolution and Downsampling layer.

ModMBStddevLayer

Modified MiniBatch Stddev Layer.

ModulatedToRGB

To RGB layer.

ResBlock

Residual block used in the discriminator of StyleGAN2.

class mmagic.models.editors.stylegan2.StyleGAN2(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, ema_config: Optional[Dict] = None, loss_config=dict())[source]

Bases: mmagic.models.base_models.BaseGAN

Implementation of Analyzing and Improving the Image Quality of StyleGAN.

Paper link: https://openaccess.thecvf.com/content_CVPR_2020/html/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.html

Detailed architectures can be found in StyleGAN2Generator and StyleGAN2Discriminator.

Parameters
  • generator (ModelType) – The config or model of the generator.

  • discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.

  • data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor.

  • generator_steps (int) – The number of times the generator is completely updated before the discriminator is updated. Defaults to 1.

  • discriminator_steps (int) – The number of times the discriminator is completely updated before the generator is updated. Defaults to 1.

  • ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
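
A minimal construction sketch follows. The sub-model config keys mirror the parameter lists documented for StyleGAN2Generator and StyleGAN2Discriminator below; the resolution and data_preprocessor settings are illustrative assumptions, not an official recipe.

# Illustrative only: build the GAN wrapper from sub-model configs.
from mmagic.models.editors.stylegan2 import StyleGAN2

model = StyleGAN2(
    generator=dict(type='StyleGAN2Generator', out_size=256, style_channels=512),
    discriminator=dict(type='StyleGAN2Discriminator', in_size=256),
    data_preprocessor=dict(type='DataPreprocessor'))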

disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor, real_imgs: torch.Tensor) Tuple[source]

Get the discriminator loss. StyleGAN2 uses the non-saturating loss and R1 gradient penalty to train the discriminator.

Parameters
  • disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.

  • disc_pred_real (Tensor) – Discriminator’s prediction of the real images.

  • real_imgs (Tensor) – Input real images.

Returns

Loss value and a dict of log variables.

Return type

tuple[Tensor, dict]
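
For reference, a self-contained sketch of these two loss terms, assuming raw logits from the discriminator; the actual implementation also weights the R1 term and applies it lazily on an interval, which is omitted here.

import torch
import torch.nn.functional as F

def disc_loss_sketch(disc, real_imgs, fake_imgs, r1_gamma=10.0):
    # Non-saturating logistic loss: softplus(D(fake)) + softplus(-D(real)).
    real_imgs = real_imgs.detach().requires_grad_(True)
    pred_real = disc(real_imgs)
    pred_fake = disc(fake_imgs.detach())
    loss = F.softplus(pred_fake).mean() + F.softplus(-pred_real).mean()
    # R1 penalty: squared gradient norm of D at the real images.
    grad = torch.autograd.grad(
        outputs=pred_real.sum(), inputs=real_imgs, create_graph=True)[0]
    r1 = grad.pow(2).flatten(1).sum(1).mean()
    # r1_gamma = 10.0 is a common default, not necessarily mmagic's.
    return loss + 0.5 * r1_gamma * r1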

gen_loss(disc_pred_fake: torch.Tensor, batch_size: int) Tuple[source]

Get the generator loss. StyleGAN2 uses the non-saturating loss and path length regularization to train the generator.

Parameters
  • disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.

  • batch_size (int) – Batch size for generating fake images.

Returns

Loss value and a dict of log variables.

Return type

tuple[Tensor, dict]
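
The non-saturating generator term is a one-liner; path length regularization is applied on its own interval in practice and is omitted from this sketch.

import torch.nn.functional as F

def gen_loss_sketch(disc_pred_fake):
    # Non-saturating generator loss: softplus(-D(G(z))).
    return F.softplus(-disc_pred_fake).mean()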

train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor][source]

Train discriminator.

Parameters
  • inputs (dict) – Inputs from dataloader.

  • data_samples (DataSample) – Data samples from dataloader.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensor for logging.

Return type

Dict[str, Tensor]

train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor][source]

Train generator.

Parameters
  • inputs (dict) – Inputs from dataloader.

  • data_samples (DataSample) – Data samples from dataloader. Not used in the generator's training.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensor for logging.

Return type

Dict[str, Tensor]

train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor][source]

Train the GAN model. In GAN training, the generator and discriminator are updated alternately. In MMagic's design, self.train_step is called with a data input, so we always update the discriminator first, whose update relies on real data, and then decide whether the generator should be updated based on the current iteration count. More details about whether to update the generator can be found in should_gen_update().

Parameters
  • data (dict) – Data sampled from dataloader.

  • optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance contains OptimWrapper of generator and discriminator.

Returns

A dict of tensor for logging.

Return type

Dict[str, torch.Tensor]
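
A schematic of this alternation, assuming the documented generator_steps / discriminator_steps semantics; the attribute and key names here are illustrative, and the real method additionally handles EMA and AMP.

# Schematic only - not the actual mmagic implementation.
def train_step_sketch(model, data, optim_wrapper, curr_iter):
    inputs, data_samples = data['inputs'], data['data_samples']
    # The discriminator is updated on every call.
    log_vars = model.train_discriminator(
        inputs, data_samples, optim_wrapper['discriminator'])
    # The generator is updated once per `discriminator_steps` calls.
    if (curr_iter + 1) % model.discriminator_steps == 0:
        for _ in range(model.generator_steps):
            log_vars.update(model.train_generator(
                inputs, data_samples, optim_wrapper['generator']))
    return log_vars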

class mmagic.models.editors.stylegan2.ADAAug(aug_pipeline=None, update_interval=4, augment_initial_p=0.0, ada_target=0.6, ada_kimg=500)[source]

Bases: mmengine.model.BaseModule

Data augmentation module for Adaptive Discriminator Augmentation (ADA).

Parameters
  • aug_pipeline (dict, optional) – Config for augmentation pipeline. Defaults to None.

  • update_interval (int, optional) – Interval for updating augmentation probability. Defaults to 4.

  • augment_initial_p (float, optional) – Initial augmentation probability. Defaults to 0.0.

  • ada_target (float, optional) – ADA target. Defaults to 0.6.

  • ada_kimg (int, optional) – ADA adjustment speed, measured in how many thousands of images (kimg) it takes for the augmentation probability to sweep from 0 to 1. Defaults to 500.

update(iteration=0, num_batches=0)[source]

Update the augmentation probability.

Parameters
  • iteration (int, optional) – Training iteration. Defaults to 0.

  • num_batches (int, optional) – The number of real images in the current batch. Defaults to 0.
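
The underlying heuristic follows the original ADA paper; the sketch below assumes rt is the accumulated sign statistic E[sign(D(real))] and may differ from mmagic's exact bookkeeping.

import numpy as np

def update_p_sketch(p, rt, num_batches, update_interval=4,
                    ada_target=0.6, ada_kimg=500):
    # Nudge p so that rt tracks ada_target; ada_kimg controls how many
    # thousands of images a full 0 -> 1 sweep of p would take.
    adjust = np.sign(rt - ada_target) * (num_batches * update_interval) \
        / (ada_kimg * 1000)
    return float(np.clip(p + adjust, 0.0, 1.0))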

class mmagic.models.editors.stylegan2.ADAStyleGAN2Discriminator(in_size, *args, data_aug=None, **kwargs)[source]

Bases: StyleGAN2Discriminator

StyleGAN2 Discriminator.

The architecture of this discriminator is proposed in StyleGAN2. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN CVPR2020.

You can load a pretrained model by passing information to the pretrained argument; official weights are already offered.

To load the discriminator weights from a full checkpoint, you can use the following code:

# ckpt_http is one of the valid paths from an http source
discriminator = StyleGAN2Discriminator(1024,
                                       pretrained=dict(
                                           ckpt_path=ckpt_http,
                                           prefix='discriminator'))

Of course, you can also download the checkpoint in advance and set ckpt_path to a local path.

Note that our implementation adopts BGR images as input, while the original StyleGAN2 provides RGB images to the discriminator. Thus, we provide the input_bgr2rgb argument to convert the image space. If your images follow the RGB order, please set it to True accordingly.

Parameters
  • in_size (int) – The input size of images.

  • img_channels (int) – The number of channels of the input image. Defaults to 3.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • mbstd_cfg (dict, optional) – Configs for minibatch-stddev layer. Defaults to dict(group_size=4, channel_groups=1).

  • cond_size (int, optional) – The size of the conditional input. If None or less than 1, no conditional mapping will be applied. Defaults to None.

  • cond_mapping_channels (int, optional) – The dimension of the conditional mapping output. Only used when cond_size is larger than 0; if None in that case, a default dimension will be chosen automatically. Defaults to None.

  • cond_mapping_layers (int, optional) – The number of mapping layers used to map the conditional input. Only used when cond_size is larger than 0; if None in that case, it will be set to 8. Defaults to None.

  • num_fp16_scales (int, optional) – The number of resolutions to use auto fp16 training. Defaults to 0.

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.

  • out_fp32 (bool, optional) – Whether to convert the output feature map to torch.float32. Defaults to True.

  • convert_input_fp32 (bool, optional) – Whether to convert input type to fp32 if not fp16_enabled. This argument is designed to deal with the cases where some modules are run in FP16 and others in FP32. Defaults to True.

  • input_bgr2rgb (bool, optional) – Whether to reorder the input channels from BGR to RGB. Several of the converted weights expect RGB input; you can set this argument to True if you want to finetune on those weights. Defaults to False.

  • pretrained (dict | None, optional) – Information for pretrained models. The necessary key is 'ckpt_path'. Besides, you can also provide 'prefix' to load a sub-module (e.g. the discriminator) from the whole state dict. Defaults to None.

forward(x)[source]

Forward function.

class mmagic.models.editors.stylegan2.StyleGAN2Discriminator(in_size, img_channels=3, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], mbstd_cfg=dict(group_size=4, channel_groups=1), cond_size=None, cond_mapping_channels=None, cond_mapping_layers=None, num_fp16_scales=0, fp16_enabled=False, out_fp32=True, convert_input_fp32=True, input_bgr2rgb=False, init_cfg=None, pretrained=None)[source]

Bases: mmengine.model.BaseModule

StyleGAN2 Discriminator.

The architecture of this discriminator is proposed in StyleGAN2. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN CVPR2020.

You can load a pretrained model by passing information to the pretrained argument; official weights are already offered.

To load the discriminator weights from a full checkpoint, you can use the following code:

# ckpt_http is one of the valid paths from an http source
discriminator = StyleGAN2Discriminator(1024,
                                       pretrained=dict(
                                           ckpt_path=ckpt_http,
                                           prefix='discriminator'))

Of course, you can also download the checkpoint in advance and set ckpt_path to a local path.

Note that our implementation adopts BGR images as input, while the original StyleGAN2 provides RGB images to the discriminator. Thus, we provide the input_bgr2rgb argument to convert the image space. If your images follow the RGB order, please set it to True accordingly.

Parameters
  • in_size (int) – The input size of images.

  • img_channels (int) – The number of channels of the input image. Defaults to 3.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • mbstd_cfg (dict, optional) – Configs for minibatch-stddev layer. Defaults to dict(group_size=4, channel_groups=1).

  • cond_size (int, optional) – The size of the conditional input. If None or less than 1, no conditional mapping will be applied. Defaults to None.

  • cond_mapping_channels (int, optional) – The dimension of the conditional mapping output. Only used when cond_size is larger than 0; if None in that case, a default dimension will be chosen automatically. Defaults to None.

  • cond_mapping_layers (int, optional) – The number of mapping layers used to map the conditional input. Only used when cond_size is larger than 0; if None in that case, it will be set to 8. Defaults to None.

  • num_fp16_scales (int, optional) – The number of resolutions to use auto fp16 training. Defaults to 0.

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.

  • out_fp32 (bool, optional) – Whether to convert the output feature map to torch.float32. Defaults to True.

  • convert_input_fp32 (bool, optional) – Whether to convert input type to fp32 if not fp16_enabled. This argument is designed to deal with the cases where some modules are run in FP16 and others in FP32. Defaults to True.

  • input_bgr2rgb (bool, optional) – Whether to reorder the input channels from BGR to RGB. Several of the converted weights expect RGB input; you can set this argument to True if you want to finetune on those weights. Defaults to False.

  • pretrained (dict | None, optional) – Information for pretrained models. The necessary key is 'ckpt_path'. Besides, you can also provide 'prefix' to load a sub-module (e.g. the discriminator) from the whole state dict. Defaults to None.

_load_pretrained_model(ckpt_path, prefix='', map_location='cpu', strict=True)[source]

forward(x: torch.Tensor, label: Optional[torch.Tensor] = None)[source]

Forward function.

Parameters
  • x (torch.Tensor) – Input image tensor.

  • label (torch.Tensor, optional) – The conditional input feed to mapping layer. Defaults to None.

Returns

Predict score for the input image.

Return type

torch.Tensor
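
A quick shape check; the single-logit output shape follows the original StyleGAN2 design and is an assumption here.

import torch
from mmagic.models.editors.stylegan2 import StyleGAN2Discriminator

disc = StyleGAN2Discriminator(in_size=64)
score = disc(torch.randn(2, 3, 64, 64))
print(score.shape)  # expected: torch.Size([2, 1])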

class mmagic.models.editors.stylegan2.StyleGAN2Generator(out_size, style_channels, out_channels=3, noise_size=None, cond_size=None, cond_mapping_channels=None, num_mlps=8, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], lr_mlp=0.01, default_style_mode='mix', eval_style_mode='single', norm_eps=1e-06, mix_prob=0.9, update_mean_latent_with_ema=False, w_avg_beta=0.998, num_fp16_scales=0, fp16_enabled=False, bgr2rgb=False, pretrained=None, fixed_noise=False)[source]

Bases: mmengine.model.BaseModule

StyleGAN2 Generator.

In StyleGAN2, we use a static architecture composed of a style mapping module and a number of convolutional style blocks. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN CVPR2020.

You can load a pretrained model by passing information to the pretrained argument; official weights are already offered.

If you want to load the EMA model, you can just use the following code:

# ckpt_http is one of the valid paths from an http source
generator = StyleGAN2Generator(1024, 512,
                               pretrained=dict(
                                   ckpt_path=ckpt_http,
                                   prefix='generator_ema'))

Of course, you can also download the checkpoint in advance and set ckpt_path to a local path. If you just want to load the original generator (not the EMA model), please set the prefix to 'generator'.

Note that our implementation allows generating BGR images, while the original StyleGAN2 outputs RGB images by default. Thus, we provide the bgr2rgb argument to convert the image space.

Parameters
  • out_size (int) – The output size of the StyleGAN2 generator.

  • style_channels (int) – The number of channels for style code.

  • out_channels (int) – The number of channels for output. Defaults to 3.

  • noise_size (int, optional) – The size (number of channels) of the input noise. If not passed, it will be set to the same value as style_channels. Defaults to None.

  • cond_size (int, optional) – The size of the conditional input. If not passed or less than 1, no conditional embedding will be used. Defaults to None.

  • cond_mapping_channels (int, optional) – The channels of the conditional mapping layers. If not passed, will use the same value as style_channels. Defaults to None.

  • num_mlps (int, optional) – The number of MLP layers. Defaults to 8.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • lr_mlp (float, optional) – The learning rate for the style mapping layer. Defaults to 0.01.

  • default_style_mode (str, optional) – The default mode of style mixing. We adopt the mixing style mode by default in training, while the ‘single’ style mode is used in evaluation. [‘mix’, ‘single’] are currently supported. Defaults to ‘mix’.

  • eval_style_mode (str, optional) – The evaluation mode of style mixing. Defaults to ‘single’.

  • mix_prob (float, optional) – Mixing probability. The value should be in range of [0, 1]. Defaults to 0.9.

  • update_mean_latent_with_ema (bool, optional) – Whether to update the mean latent code (w) with EMA. Defaults to False.

  • w_avg_beta (float, optional) – The EMA decay used to update w_avg. Defaults to 0.998.

  • num_fp16_scales (int, optional) – The number of resolutions to use auto fp16 training. Different from fp16_enabled, this argument allows users to adopt FP16 training only in several blocks. This behaviour is much more similar to the official implementation by Tero. Defaults to 0.

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. If this flag is True, the whole module will be wrapped with auto_fp16. Defaults to False.

  • pretrained (dict | None, optional) – Information for pretrained models. The necessary key is ‘ckpt_path’. Besides, you can also provide ‘prefix’ to load the generator part from the whole state dict. Defaults to None.

_load_pretrained_model(ckpt_path, prefix='', map_location='cpu', strict=True)[source]

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

make_injected_noise()[source]

Make noise tensors that will be injected into the feature maps.

Returns

List of layer-wise noise tensor.

Return type

list[Tensor]

get_mean_latent(num_samples=4096, **kwargs)[source]

Get mean latent of W space in this generator.

Parameters

num_samples (int, optional) – Number of sample times. Defaults to 4096.

Returns

Mean latent of this generator.

Return type

Tensor

style_mixing(n_source, n_target, inject_index=1, truncation_latent=None, truncation=0.7)[source]

forward(styles, label=None, num_batches=-1, return_noise=False, return_latents=False, inject_index=None, truncation=1, truncation_latent=None, input_is_latent=False, injected_noise=None, add_noise=True, randomize_noise=True, update_ws=False, return_features=False, feat_idx=5, return_latent_only=False)[source]

Forward function.

This function has been integrated with the truncation trick. Please refer to the usage of truncation and truncation_latent.

Parameters
  • styles (torch.Tensor | list[torch.Tensor] | callable | None) – In StyleGAN2, you can provide a noise tensor or a latent tensor. Given a list containing more than one noise or latent tensor, the style-mixing trick will be used in training. You can also directly give a batch of noise through a torch.Tensor, or offer a callable function to sample a batch of noise data. Otherwise, None indicates that the default noise sampler should be used.

  • label (torch.Tensor, optional) – Conditional inputs for the generator. Defaults to None.

  • num_batches (int, optional) – The batch size of generated images when sampling noise internally. Defaults to -1.

  • return_noise (bool, optional) – If True, noise_batch will be returned in a dict with fake_img. Defaults to False.

  • return_latents (bool, optional) – If True, latent will be returned in a dict with fake_img. Defaults to False.

  • inject_index (int | None, optional) – The index number for mixing style codes. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick will be adopted. Defaults to 1.

  • truncation_latent (torch.Tensor, optional) – Mean truncation latent. Defaults to None.

  • input_is_latent (bool, optional) – If True, the input tensor is the latent tensor. Defaults to False.

  • injected_noise (torch.Tensor | None, optional) – Given a tensor, the random noise will be fixed as this input injected noise. Defaults to None.

  • add_noise (bool) – Whether apply noise injection. Defaults to True.

  • randomize_noise (bool, optional) – If False, images are sampled with the buffered noise tensor injected to the style conv block. Defaults to True.

  • update_ws (bool) – Whether to update the latent code with EMA. Only works when w_avg is registered. Defaults to False.

Returns

Generated image tensor or dictionary containing more data.

Return type

torch.Tensor | dict
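
A minimal sampling sketch; the 64-pixel resolution and the truncation setting are illustrative, and the output shape follows out_size and out_channels.

import torch
from mmagic.models.editors.stylegan2 import StyleGAN2Generator

gen = StyleGAN2Generator(out_size=64, style_channels=512)
with torch.no_grad():
    # Truncation < 1 pulls latents toward the mean latent (higher fidelity,
    # less variety).
    imgs = gen(None, num_batches=4, truncation=0.7,
               truncation_latent=gen.get_mean_latent())
print(imgs.shape)  # expected: torch.Size([4, 3, 64, 64])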

class mmagic.models.editors.stylegan2.ConvDownLayer(in_channels, out_channels, kernel_size, downsample=False, blur_kernel=[1, 3, 3, 1], bias=True, act_cfg=dict(type='fused_bias'), fp16_enabled=False, conv_clamp=256.0)[source]

Bases: torch.nn.Sequential

Convolution and Downsampling layer.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • kernel_size (int) – Kernel size, same as nn.Conv2d.

  • downsample (bool, optional) – Whether to adopt downsampling in features. Defaults to False.

  • blur_kernel (list[int], optional) – Blurry kernel. Defaults to [1, 3, 3, 1].

  • bias (bool, optional) – Whether to use bias parameter. Defaults to True.

  • act_cfg (dict, optional) – Activation configs. Defaults to dict(type=’fused_bias’).

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.

  • conv_clamp (float, optional) – Clamp the convolutional layer results to avoid gradient overflow. Defaults to 256.0.
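
A shape sketch, assuming the usual stride-2 behaviour when downsample=True:

import torch
from mmagic.models.editors.stylegan2 import ConvDownLayer

layer = ConvDownLayer(64, 128, kernel_size=3, downsample=True)
out = layer(torch.randn(1, 64, 32, 32))
print(out.shape)  # expected: torch.Size([1, 128, 16, 16])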

forward(x)[source]

class mmagic.models.editors.stylegan2.ModMBStddevLayer(group_size=4, channel_groups=1, sync_std=False, sync_groups=None, eps=1e-08)[source]

Bases: mmengine.model.BaseModule

Modified MiniBatch Stddev Layer.

This layer is modified from MiniBatchStddevLayer used in PGGAN. In StyleGAN2, the authors add a new feature, channel_groups, into this layer.

Note that, to accelerate the training procedure, we also add a new sync_std feature to support multi-node/multi-machine training. This feature is still in beta and has only been tested at the 256 scale.

Parameters
  • group_size (int, optional) – The size of groups in batch dimension. Defaults to 4.

  • channel_groups (int, optional) – The size of groups in channel dimension. Defaults to 1.

  • sync_std (bool, optional) – Whether to use synchronized std feature. Defaults to False.

  • sync_groups (int | None, optional) – The size of groups in node dimension. Defaults to None.

  • eps (float, optional) – Epsilon value to avoid computation error. Defaults to 1e-8.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input feature map with shape of (N, C, H, W).

Returns

Output feature map with shape of (N, C+1, H, W).

Return type

Tensor
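
The shape contract is easy to verify; note that the batch size should be divisible by group_size.

import torch
from mmagic.models.editors.stylegan2 import ModMBStddevLayer

layer = ModMBStddevLayer(group_size=4)
out = layer(torch.randn(8, 512, 4, 4))
print(out.shape)  # torch.Size([8, 513, 4, 4]) - one stddev channel appended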

class mmagic.models.editors.stylegan2.ModulatedToRGB(in_channels, style_channels, out_channels=3, upsample=True, blur_kernel=[1, 3, 3, 1], style_mod_cfg=dict(bias_init=1.0), style_bias=0.0, fp16_enabled=False, conv_clamp=256, out_fp32=True)[source]

Bases: mmengine.model.BaseModule

To RGB layer.

This module is designed to output image tensor in StyleGAN2.

Parameters
  • in_channels (int) – Input channels.

  • style_channels (int) – Channels for the style codes.

  • out_channels (int, optional) – Output channels. Defaults to 3.

  • upsample (bool, optional) – Whether to adopt upsampling in features. Defaults to True.

  • blur_kernel (list[int], optional) – Blurry kernel. Defaults to [1, 3, 3, 1].

  • style_mod_cfg (dict, optional) – Configs for style modulation module. Defaults to dict(bias_init=1.).

  • style_bias (float, optional) – Bias value for style code. Defaults to 0.0.

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.

  • conv_clamp (float, optional) – Clamp the convolutional layer results to avoid gradient overflow. Defaults to 256.0.

  • out_fp32 (bool, optional) – Whether to convert the output feature map to torch.float32. Defaults to True.

forward(x, style, skip=None)[source]

Forward Function.

Parameters
  • x (Tensor) – Input features with shape of (N, C, H, W).

  • style (Tensor) – Style latent with shape of (N, C).

  • skip (Tensor, optional) – Tensor for skip link. Defaults to None.

Returns

Output features with shape of (N, out_channels, H, W).

Return type

Tensor
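
A shape sketch; the channel sizes are illustrative, and since the upsampling applies to the skip branch, the output resolution is assumed to match the input.

import torch
from mmagic.models.editors.stylegan2 import ModulatedToRGB

to_rgb = ModulatedToRGB(in_channels=512, style_channels=512)
x = torch.randn(2, 512, 8, 8)
style = torch.randn(2, 512)
rgb = to_rgb(x, style)
print(rgb.shape)  # expected: torch.Size([2, 3, 8, 8])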

class mmagic.models.editors.stylegan2.ResBlock(in_channels, out_channels, blur_kernel=[1, 3, 3, 1], fp16_enabled=False, convert_input_fp32=True)[source]

Bases: mmengine.model.BaseModule

Residual block used in the discriminator of StyleGAN2.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • fp16_enabled (bool, optional) – Whether to use fp16 training in this module. Defaults to False.

  • convert_input_fp32 (bool, optional) – Whether to convert input type to fp32 if not fp16_enabled. This argument is designed to deal with the cases where some modules are run in FP16 and others in FP32. Defaults to True.

forward(input)[source]

Forward function.

Parameters

input (Tensor) – Input feature map with shape of (N, C, H, W).

Returns

Output feature map.

Return type

Tensor
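
A shape sketch; the 2x downsampling inside the block is assumed from the StyleGAN2 discriminator design.

import torch
from mmagic.models.editors.stylegan2 import ResBlock

block = ResBlock(64, 128)
out = block(torch.randn(1, 64, 32, 32))
print(out.shape)  # expected: torch.Size([1, 128, 16, 16])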
