mmagic.models.editors.mspie

Package Contents

Classes

MSPIEStyleGAN2

MS-PIE StyleGAN2.

MSStyleGAN2Discriminator

StyleGAN2 Discriminator.

MSStyleGANv2Generator

StyleGAN2 Generator.

PESinGAN

Positional Encoding in SinGAN.

SinGANMSGeneratorPE

Multi-Scale Generator used in SinGAN with positional encoding.

CatersianGrid

Cartesian grid for 2D tensors.

SinusoidalPositionalEmbedding

Sinusoidal Positional Embedding 1D or 2D (SPE/SPE2d).

class mmagic.models.editors.mspie.MSPIEStyleGAN2(*args, train_settings=dict(), **kwargs)[source]

Bases: mmagic.models.editors.stylegan2.StyleGAN2

MS-PIE StyleGAN2.

In this GAN, we adopt the MS-PIE training schedule so that multi-scale images can be generated with a single generator. Details can be found in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.

Parameters

train_settings (dict) – Config for training settings. Defaults to dict().
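
MMagic models of this kind are normally assembled from config dicts via the registry. The snippet below is only a hedged sketch of that pattern: the sub-module types mirror the classes documented on this page, but the concrete keys expected inside train_settings (multi-scale sizes and their sampling schedule) are not listed here and should be taken from the released MS-PIE configs.

# Sketch only: building MSPIEStyleGAN2 through the MMagic registry.
# The contents of train_settings are intentionally left at the default;
# consult the official mspie configs for the real schedule options.
from mmagic.registry import MODELS

model_cfg = dict(
    type='MSPIEStyleGAN2',
    data_preprocessor=dict(type='DataPreprocessor'),
    generator=dict(type='MSStyleGANv2Generator', out_size=256, style_channels=512),
    discriminator=dict(type='MSStyleGAN2Discriminator', in_size=256),
    train_settings=dict(),
)
model = MODELS.build(model_cfg)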

train_step(data: dict, optim_wrapper: mmengine.optim.OptimWrapperDict) Dict[str, torch.Tensor][source]

Train the GAN model. In the training of GAN models, the generator and the discriminator are updated alternately. In MMagic’s design, self.train_step is called with a data input, therefore we always update the discriminator, whose update relies on real data, and then determine whether the generator needs to be updated based on the current number of iterations. More details about whether to update the generator can be found in should_gen_update(). A minimal sketch of this alternating pattern is given at the end of this method’s entry.

Parameters
  • data (dict) – Data sampled from dataloader.

  • optim_wrapper (OptimWrapperDict) – OptimWrapperDict instance contains OptimWrapper of generator and discriminator.

Returns

A dict of tensor for logging.

Return type

Dict[str, torch.Tensor]
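
To make the update order concrete, here is a minimal pseudocode-style sketch of the behaviour described above. It is not the actual implementation; the dictionary keys and the simple interval check stand in for MMagic’s data flow and the should_gen_update() logic.

# Sketch only: alternating GAN update as described in train_step's docstring.
def train_step_sketch(model, data, optim_wrapper, curr_iter, generator_interval=1):
    log_vars = {}
    # The discriminator is updated on every call, since its loss relies on real data.
    # ('inputs'/'data_samples' keys follow the usual MMEngine convention, assumed here.)
    log_vars.update(model.train_discriminator(
        data['inputs'], data['data_samples'], optim_wrapper['discriminator']))
    # The generator is only updated when the iteration counter allows it.
    if (curr_iter + 1) % generator_interval == 0:
        log_vars.update(model.train_generator(
            data['inputs'], data['data_samples'], optim_wrapper['generator']))
    return log_vars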

train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor][source]

Train generator.

Parameters
  • inputs (TrainInput) – Inputs from dataloader.

  • data_samples (DataSample) – Data samples from dataloader. Not used in the generator’s training.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensor for logging.

Return type

Dict[str, Tensor]

train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) Dict[str, torch.Tensor][source]

Train discriminator.

Parameters
  • inputs (TrainInput) – Inputs from dataloader.

  • data_samples (DataSample) – Data samples from dataloader.

  • optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.

Returns

A dict of tensor for logging.

Return type

Dict[str, Tensor]

class mmagic.models.editors.mspie.MSStyleGAN2Discriminator(in_size, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], mbstd_cfg=dict(group_size=4, channel_groups=1), with_adaptive_pool=False, pool_size=(2, 2))[source]

Bases: mmengine.model.BaseModule

StyleGAN2 Discriminator.

The architecture of this discriminator is proposed in StyleGAN2. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN, CVPR2020.

Parameters
  • in_size (int) – The input size of images.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • mbstd_cfg (dict, optional) – Configs for minibatch-stddev layer. Defaults to dict(group_size=4, channel_groups=1).

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – Input image tensor.

Returns

Predict score for the input image.

Return type

torch.Tensor
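
A minimal usage sketch, assuming an RGB input at the in_size resolution and default values for the remaining constructor arguments:

import torch
from mmagic.models.editors.mspie import MSStyleGAN2Discriminator

# Build a discriminator for 256x256 inputs.
disc = MSStyleGAN2Discriminator(in_size=256)
fake_imgs = torch.randn(2, 3, 256, 256)  # a batch of 2 RGB images
scores = disc(fake_imgs)                 # one realness score per image
print(scores.shape)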

class mmagic.models.editors.mspie.MSStyleGANv2Generator(out_size, style_channels, num_mlps=8, channel_multiplier=2, blur_kernel=[1, 3, 3, 1], lr_mlp=0.01, default_style_mode='mix', eval_style_mode='single', mix_prob=0.9, no_pad=False, deconv2conv=False, interp_pad=None, up_config=dict(scale_factor=2, mode='nearest'), up_after_conv=False, head_pos_encoding=None, head_pos_size=(4, 4), interp_head=False)[source]

Bases: mmengine.model.BaseModule

StyleGAN2 Generator.

In StyleGAN2, we use a static architecture composed of a style mapping module and a number of convolutional style blocks. More details can be found in: Analyzing and Improving the Image Quality of StyleGAN, CVPR2020.

Parameters
  • out_size (int) – The output size of the StyleGAN2 generator.

  • style_channels (int) – The number of channels for style code.

  • num_mlps (int, optional) – The number of MLP layers. Defaults to 8.

  • channel_multiplier (int, optional) – The multiplier factor for the channel number. Defaults to 2.

  • blur_kernel (list, optional) – The blurry kernel. Defaults to [1, 3, 3, 1].

  • lr_mlp (float, optional) – The learning rate for the style mapping layer. Defaults to 0.01.

  • default_style_mode (str, optional) – The default mode of style mixing. In training, we adopt the mixing style mode by default, while in evaluation we use the ‘single’ style mode. [‘mix’, ‘single’] are currently supported. Defaults to ‘mix’.

  • eval_style_mode (str, optional) – The evaluation mode of style mixing. Defaults to ‘single’.

  • mix_prob (float, optional) – Mixing probability. The value should be in range of [0, 1]. Defaults to 0.9.

train(mode=True)[source]

Set train/eval mode.

Parameters

mode (bool, optional) – Whether set train mode. Defaults to True.

make_injected_noise(chosen_scale=0)[source]

Make noises that will be injected into feature maps.

Parameters

chosen_scale (int, optional) – Chosen scale. Defaults to 0.

Returns

List of layer-wise noise tensor.

Return type

list[Tensor]

get_mean_latent(num_samples=4096, **kwargs)[source]

Get mean latent of W space in this generator.

Parameters

num_samples (int, optional) – Number of samples drawn to estimate the mean latent. Defaults to 4096.

Returns

Mean latent of this generator.

Return type

Tensor

style_mixing(n_source, n_target, inject_index=1, truncation_latent=None, truncation=0.7, chosen_scale=0)[source]

Generating style mixing images.

Parameters
  • n_source (int) – Number of source images.

  • n_target (int) – Number of target images.

  • inject_index (int, optional) – The index from which the latent codes are replaced with the source latent. Defaults to 1.

  • truncation_latent (torch.Tensor, optional) – Mean truncation latent. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick will be adopted. Defaults to 0.7.

  • chosen_scale (int, optional) – Chosen scale. Defaults to 0.

Returns

Table of style-mixing images.

Return type

torch.Tensor

forward(styles, num_batches=-1, return_noise=False, return_latents=False, inject_index=None, truncation=1, truncation_latent=None, input_is_latent=False, injected_noise=None, randomize_noise=True, chosen_scale=0)[source]

Forward function.

This function has been integrated with the truncation trick. Please refer to the usage of truncation and truncation_latent.

Parameters
  • styles (torch.Tensor | list[torch.Tensor] | callable | None) – In StyleGAN2, you can provide a noise tensor or a latent tensor. Given a list containing more than one noise or latent tensor, the style-mixing trick will be used in training. You can also directly give a batch of noise through a torch.Tensor or offer a callable function to sample a batch of noise data. Otherwise, None indicates that the default noise sampler is used.

  • num_batches (int, optional) – The number of batches (batch size) for sampling noise. Defaults to -1.

  • return_noise (bool, optional) – If True, noise_batch will be returned in a dict with fake_img. Defaults to False.

  • return_latents (bool, optional) – If True, latent will be returned in a dict with fake_img. Defaults to False.

  • inject_index (int | None, optional) – The index number for mixing style codes. Defaults to None.

  • truncation (float, optional) – Truncation factor. If a value less than 1 is given, the truncation trick will be adopted. Defaults to 1.

  • truncation_latent (torch.Tensor, optional) – Mean truncation latent. Defaults to None.

  • input_is_latent (bool, optional) – If True, the input tensor is the latent tensor. Defaults to False.

  • injected_noise (torch.Tensor | None, optional) – Given a tensor, the random noise will be fixed as this input injected noise. Defaults to None.

  • randomize_noise (bool, optional) – If False, images are sampled with the buffered noise tensor injected to the style conv block. Defaults to True.

Returns

Generated image tensor or dictionary containing more data.

Return type

torch.Tensor | dict
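
A minimal sampling sketch, assuming a 256x256 generator with 512-dimensional style codes. The effect of chosen_scale on the output resolution follows the MS-PIE setting and should be verified against the implementation:

import torch
from mmagic.models.editors.mspie import MSStyleGANv2Generator

gen = MSStyleGANv2Generator(out_size=256, style_channels=512)

# Let the generator sample noise internally (styles=None) for a batch of 2 images.
fake_imgs = gen(None, num_batches=2)

# With return_noise=True, a dict containing fake_img and the noise batch is returned.
outputs = gen(None, num_batches=2, return_noise=True)

# Ask for a larger synthesis scale; in MS-PIE this yields a higher-resolution output.
larger_imgs = gen(None, num_batches=2, chosen_scale=2)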

class mmagic.models.editors.mspie.PESinGAN(generator: ModelType, discriminator: Optional[ModelType], data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, num_scales: Optional[int] = None, fixed_noise_with_pad: bool = False, first_fixed_noises_ch: int = 1, iters_per_scale: int = 200, noise_weight_init: int = 0.1, lr_scheduler_args: Optional[dict] = None, test_pkl_data: Optional[str] = None, ema_confg: Optional[dict] = None)[source]

Bases: mmagic.models.editors.singan.SinGAN

Positional Encoding in SinGAN.

This modified SinGAN is used to reimplement the experiments in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR2021.

construct_fixed_noises()[source]

Construct the fixed noises list used in SinGAN.

class mmagic.models.editors.mspie.SinGANMSGeneratorPE(in_channels, out_channels, num_scales, kernel_size=3, padding=0, num_layers=5, base_channels=32, min_feat_channels=32, out_act_cfg=dict(type='Tanh'), padding_mode='zero', pad_at_head=True, interp_pad=False, noise_with_pad=False, positional_encoding=None, first_stage_in_channels=None, **kwargs)[source]

Bases: mmagic.models.editors.singan.singan_generator.SinGANMultiScaleGenerator

Multi-Scale Generator used in SinGAN with positional encoding.

More details can be found in: Positional Encoding as Spatial Inductive Bias in GANs, CVPR’2021.

Notes:

  • In this version, we adopt the interpolation function from the official PyTorch APIs, which differs from the original implementation by the authors. However, in our experiments, this difference has a negligible influence.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • num_scales (int) – The number of scales/stages in generator. Note that this number is counted from zero, which is the same as the original paper.

  • kernel_size (int, optional) – Kernel size, same as nn.Conv2d. Defaults to 3.

  • padding (int, optional) – Padding for the convolutional layer, same as nn.Conv2d. Defaults to 0.

  • num_layers (int, optional) – The number of convolutional layers in each generator block. Defaults to 5.

  • base_channels (int, optional) – The basic channels for convolutional layers in the generator block. Defaults to 32.

  • min_feat_channels (int, optional) – Minimum channels for the feature maps in the generator block. Defaults to 32.

  • out_act_cfg (dict | None, optional) – Configs for output activation layer. Defaults to dict(type=’Tanh’).

  • padding_mode (str, optional) – The mode of convolutional padding, same as nn.Conv2d. Defaults to ‘zero’.

  • pad_at_head (bool, optional) – Whether to add padding at head. Defaults to True.

  • interp_pad (bool, optional) – The padding value of interpolating feature maps. Defaults to False.

  • noise_with_pad (bool, optional) – Whether the input fixed noises are with explicit padding. Defaults to False.

  • positional_encoding (dict | None, optional) – Configs for the positional encoding. Defaults to None.

  • first_stage_in_channels (int | None, optional) – The input channel of the first generator block. If None, the first stage will adopt the same input channels as other stages. Defaults to None.

forward(input_sample, fixed_noises, noise_weights, rand_mode, curr_scale, num_batches=1, get_prev_res=False, return_noise=False)[source]

Forward function.

Parameters
  • input_sample (Tensor | None) – The input for generator. In the original implementation, a tensor filled with zeros is adopted. If None is given, we will construct it from the first fixed noises.

  • fixed_noises (list[Tensor]) – List of the fixed noises in SinGAN.

  • noise_weights (list[float]) – List of the weights for random noises.

  • rand_mode (str) – Choices from [‘rand’, ‘recon’]. In rand mode, it will sample from random noises. Otherwise, the reconstruction for the single image will be returned.

  • curr_scale (int) – The scale for the current inference or training.

  • num_batches (int, optional) – The number of batches. Defaults to 1.

  • get_prev_res (bool, optional) – Whether to return results from previous stages. Defaults to False.

  • return_noise (bool, optional) – Whether to return noises tensor. Defaults to False.

Returns

Generated image tensor or dictionary containing more data.

Return type

Tensor | dict

class mmagic.models.editors.mspie.CatersianGrid(init_cfg: Union[dict, List[dict], None] = None)[source]

Bases: mmengine.model.BaseModule

Cartesian grid for 2D tensors.

The Cartesian grid is a commonly used positional encoding in deep learning. In this implementation, we follow the convention of grid_sample in PyTorch. In other words, [-1, -1] denotes the top-left corner while [1, 1] denotes the bottom-right corner.

forward(x, **kwargs)[source]
make_grid2d(height, width, num_batches=1, requires_grad=False)[source]
make_grid2d_like(x, requires_grad=False)[source]

Input: a tensor with shape (b, …, h, w). Returns: a tensor with shape (b, 2 x emb_dim, h, w).

Note that the positional embedding highly depends on the function make_grid2d.
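
For reference, a self-contained sketch of how such a grid can be built with plain PyTorch in the same grid_sample convention; it mirrors the behaviour described above but is not the class's own implementation:

import torch

def make_grid2d_sketch(height, width, num_batches=1):
    """Build a (b, 2, h, w) Cartesian grid in the grid_sample convention."""
    # Coordinates run from -1 (top/left) to 1 (bottom/right).
    ys = torch.linspace(-1, 1, height)
    xs = torch.linspace(-1, 1, width)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing='ij')
    grid = torch.stack([grid_x, grid_y], dim=0)            # (2, h, w)
    return grid.unsqueeze(0).repeat(num_batches, 1, 1, 1)  # (b, 2, h, w)

grid = make_grid2d_sketch(4, 4)
print(grid[0, :, 0, 0])  # tensor([-1., -1.]) at the top-left corner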

class mmagic.models.editors.mspie.SinusoidalPositionalEmbedding(embedding_dim, padding_idx, init_size=1024, div_half_dim=False, center_shift=None)[source]

Bases: mmengine.model.BaseModule

Sinusoidal Positional Embedding 1D or 2D (SPE/SPE2d).

This module is modified from: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/sinusoidal_positional_embedding.py

Based on the original SPE in single dimension, we implement a 2D sinusoidal positional encoding (SPE2d), as introduced in Positional Encoding as Spatial Inductive Bias in GANs, CVPR’2021.

Parameters
  • embedding_dim (int) – The number of dimensions for the positional encoding.

  • padding_idx (int | list[int]) – The index for the padding contents. The padding positions will obtain an encoding vector filled with zeros.

  • init_size (int, optional) – The initial size of the positional buffer. Defaults to 1024.

  • div_half_dim (bool, optional) – If True, the embedding will be divided by d/2. Otherwise, it will be divided by (d/2 - 1). Defaults to False.

  • center_shift (int | None, optional) – Shift the center point to some index. Defaults to None.

static get_embedding(num_embeddings, embedding_dim, padding_idx=None, div_half_dim=False)[source]

Build sinusoidal embeddings.

This matches the implementation in tensor2tensor, but differs slightly from the description in Section 3.5 of “Attention Is All You Need”.
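
For orientation, a sketch of the standard fairseq-style construction that this method follows, shown with the default behaviour (exponent scaled by d/2 - 1, i.e. div_half_dim=False). Treat it as illustrative rather than a byte-for-byte copy of the method:

import math
import torch

def sinusoidal_embedding_sketch(num_embeddings, embedding_dim, padding_idx=None):
    half_dim = embedding_dim // 2
    # Frequency scale; the default divides log(10000) by (d/2 - 1).
    scale = math.log(10000) / (half_dim - 1)
    freqs = torch.exp(torch.arange(half_dim, dtype=torch.float) * -scale)
    # Outer product of positions and frequencies, then sin/cos halves.
    args = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1) * freqs.unsqueeze(0)
    emb = torch.cat([torch.sin(args), torch.cos(args)], dim=1)
    if padding_idx is not None:
        emb[padding_idx, :] = 0  # padding positions get an all-zero vector
    return emb  # shape: (num_embeddings, embedding_dim)

table = sinusoidal_embedding_sketch(16, 8)
print(table.shape)  # torch.Size([16, 8])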

forward(input, **kwargs)[source]

The input is expected to be of size [bsz x seq_len].

The returned tensor is expected to be of size [bsz x seq_len x emb_dim].

make_positions(input, padding_idx)[source]

Make position tensors.

Parameters
  • input (tensor) – Input tensor.

  • padding_idx (int | list[int]) – The index for the padding contents. The padding positions will obtain an encoding vector filled with zeros.

Returns

Position tensors.

Return type

tensor

make_grid2d(height, width, num_batches=1, center_shift=None)[source]

Make 2-d grid mask.

Parameters
  • height (int) – Height of the grid.

  • width (int) – Width of the grid.

  • num_batches (int, optional) – The number of batch size. Defaults to 1.

  • center_shift (int | None, optional) – Shift the center point to some index. Defaults to None.

Returns

2-d Grid mask.

Return type

Tensor
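
The 2D variant can be understood as two 1D sinusoidal embeddings, one over row indices and one over column indices, broadcast across the grid and concatenated channel-wise. The sketch below illustrates that idea only; it is not the method's actual code and omits center_shift and padding handling:

import math
import torch

def _spe1d(n, dim):
    # 1D sinusoidal table of shape (n, dim); same recipe as the sketch above.
    half = dim // 2
    freqs = torch.exp(torch.arange(half, dtype=torch.float) * -(math.log(10000) / (half - 1)))
    args = torch.arange(n, dtype=torch.float).unsqueeze(1) * freqs.unsqueeze(0)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

def spe2d_sketch(height, width, embedding_dim, num_batches=1):
    h_emb = _spe1d(height, embedding_dim)  # (h, d): embedding of each row index
    w_emb = _spe1d(width, embedding_dim)   # (w, d): embedding of each column index
    # Broadcast both over the full grid and concatenate channel-wise.
    h_grid = h_emb.t().unsqueeze(2).expand(-1, -1, width)   # (d, h, w)
    w_grid = w_emb.t().unsqueeze(1).expand(-1, height, -1)  # (d, h, w)
    grid = torch.cat([w_grid, h_grid], dim=0).unsqueeze(0)  # (1, 2*d, h, w)
    return grid.repeat(num_batches, 1, 1, 1)

pe = spe2d_sketch(8, 8, embedding_dim=4)
print(pe.shape)  # torch.Size([1, 8, 8, 8]) i.e. (b, 2 x emb_dim, h, w)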

make_grid2d_like(x, center_shift=None)[source]

Input: a tensor with shape (b, …, h, w). Returns: a tensor with shape (b, 2 x emb_dim, h, w).

Note that the positional embedding highly depends on the function make_positions.
