mmagic.models.editors.sagan
Package Contents
Classes
SAGAN – Implementation of Self-Attention Generative Adversarial Networks.
ProjDiscriminator – Discriminator for SNGAN / Proj-GAN.
SNGANGenerator – Generator for SNGAN / Proj-GAN.
SNGANDiscHeadResBlock – The first ResBlock used in the discriminator of SNGAN / Proj-GAN.
SNGANDiscResBlock – ResBlock used in the discriminator of SNGAN / Proj-GAN.
SNGANGenResBlock – ResBlock used in the generator of SNGAN / Proj-GAN.
- class mmagic.models.editors.sagan.SAGAN(generator: ModelType, discriminator: Optional[ModelType] = None, data_preprocessor: Optional[Union[dict, mmengine.Config]] = None, generator_steps: int = 1, discriminator_steps: int = 1, noise_size: Optional[int] = 128, num_classes: Optional[int] = None, ema_config: Optional[Dict] = None)
Bases:
mmagic.models.base_models.BaseConditionalGAN
Implementation of Self-Attention Generative Adversarial Networks (SAGAN, https://arxiv.org/abs/1805.08318), Spectral Normalization for Generative Adversarial Networks (SNGAN), and cGANs with Projection Discriminator (Proj-GAN).
Detailed architecture can be found in SNGANGenerator and ProjDiscriminator.
- Parameters
generator (ModelType) – The config or model of the generator.
discriminator (Optional[ModelType]) – The config or model of the discriminator. Defaults to None.
data_preprocessor (Optional[Union[dict, Config]]) – The pre-process config or DataPreprocessor. Defaults to None.
generator_steps (int) – Number of times the generator is completely updated before the discriminator is updated. Defaults to 1.
discriminator_steps (int) – Number of times the discriminator is completely updated before the generator is updated. Defaults to 1.
noise_size (Optional[int]) – Size of the input noise vector. Defaults to 128.
num_classes (Optional[int]) – The number of classes you would like to generate. Defaults to None.
ema_config (Optional[Dict]) – The config for generator’s exponential moving average setting. Defaults to None.
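As a rough illustration of how these arguments fit together, the following is a minimal config sketch rather than an official config. The key names mirror the constructor arguments documented on this page; the concrete values (a 32-pixel scale, 10 classes) and the `DataPreprocessor` type string are assumptions chosen for illustration.

```python
# Hypothetical values throughout; only the key names come from the
# signatures documented on this page.
NUM_CLASSES = 10  # assumed dataset with 10 classes
SCALE = 32        # assumed image resolution

model = dict(
    type='SAGAN',
    num_classes=NUM_CLASSES,
    noise_size=128,
    data_preprocessor=dict(type='DataPreprocessor'),  # assumed type name
    generator=dict(
        type='SNGANGenerator',
        output_scale=SCALE,
        num_classes=NUM_CLASSES),
    discriminator=dict(
        type='ProjDiscriminator',
        input_scale=SCALE,
        num_classes=NUM_CLASSES),
    generator_steps=1,
    discriminator_steps=1,
)
```

Keeping the scale and the class count in shared constants is one way to make sure the generator output and the discriminator input stay consistent.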
- disc_loss(disc_pred_fake: torch.Tensor, disc_pred_real: torch.Tensor) -> Tuple[torch.Tensor, dict]
Get disc loss. SAGAN, SNGAN and Proj-GAN use hinge loss to train the discriminator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
disc_pred_real (Tensor) – Discriminator’s prediction of the real images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
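The hinge loss for the discriminator can be sketched on plain Python floats as follows. This is the standard hinge formulation, not a copy of the library code; the function name and the keys of the log dict are hypothetical, and the real method operates on tensors.

```python
from typing import Dict, List, Tuple


def hinge_disc_loss(pred_fake: List[float],
                    pred_real: List[float]) -> Tuple[float, Dict[str, float]]:
    """Standard hinge loss for a discriminator (illustrative sketch)."""
    # Real samples are penalized when their score falls below +1.
    loss_real = sum(max(0.0, 1.0 - p) for p in pred_real) / len(pred_real)
    # Fake samples are penalized when their score rises above -1.
    loss_fake = sum(max(0.0, 1.0 + p) for p in pred_fake) / len(pred_fake)
    loss = loss_real + loss_fake
    # Hypothetical log keys, chosen for this example only.
    log_vars = dict(loss_disc=loss, loss_real=loss_real, loss_fake=loss_fake)
    return loss, log_vars


loss, log_vars = hinge_disc_loss(pred_fake=[-2.0, 0.0], pred_real=[2.0, 0.5])
```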
- gen_loss(disc_pred_fake: torch.Tensor) -> Tuple[torch.Tensor, dict]
Get gen loss. SAGAN, SNGAN and Proj-GAN use hinge loss to train the generator.
- Parameters
disc_pred_fake (Tensor) – Discriminator’s prediction of the fake images.
- Returns
Loss value and a dict of log variables.
- Return type
Tuple[Tensor, dict]
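For the generator, the hinge objective reduces to the negated mean of the discriminator scores on fake images. A minimal sketch under the same assumptions as before (plain floats instead of tensors, hypothetical function name and log key):

```python
from typing import Dict, List, Tuple


def hinge_gen_loss(pred_fake: List[float]) -> Tuple[float, Dict[str, float]]:
    """Standard hinge loss for a generator (illustrative sketch)."""
    # The generator is rewarded when the discriminator scores fakes highly.
    loss = -sum(pred_fake) / len(pred_fake)
    return loss, dict(loss_gen=loss)  # hypothetical log key


loss, log_vars = hinge_gen_loss([1.0, 3.0])
```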
- train_discriminator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) -> Dict[str, torch.Tensor]
Train discriminator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader.
optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
- train_generator(inputs: dict, data_samples: mmagic.structures.DataSample, optimizer_wrapper: mmengine.optim.OptimWrapper) -> Dict[str, torch.Tensor]
Train generator.
- Parameters
inputs (dict) – Inputs from dataloader.
data_samples (DataSample) – Data samples from dataloader. Not used in the generator’s training.
optimizer_wrapper (OptimWrapper) – OptimWrapper instance used to update model parameters.
- Returns
A dict of tensors for logging.
- Return type
Dict[str, Tensor]
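How train_discriminator and train_generator might alternate under generator_steps and discriminator_steps can be sketched as below. This is one plausible reading of the parameter descriptions, not the library's scheduling logic, which lives in the base class and may differ (for example when gradient accumulation is involved). The function name is hypothetical.

```python
from typing import List


def update_schedule(num_rounds: int,
                    generator_steps: int = 1,
                    discriminator_steps: int = 1) -> List[str]:
    """List the order of updates for a simple alternating scheme (assumed)."""
    schedule: List[str] = []
    for _ in range(num_rounds):
        # Assumption: the discriminator is updated first in every round.
        schedule.extend(['disc'] * discriminator_steps)
        schedule.extend(['gen'] * generator_steps)
    return schedule


schedule = update_schedule(num_rounds=2, generator_steps=1, discriminator_steps=2)
```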
- class mmagic.models.editors.sagan.ProjDiscriminator(input_scale, num_classes=0, base_channels=128, input_channels=3, attention_cfg=dict(type='SelfAttentionBlock'), attention_after_nth_block=-1, channels_cfg=None, downsample_cfg=None, from_rgb_cfg=dict(type='SNGANDiscHeadResBlock'), blocks_cfg=dict(type='SNGANDiscResBlock'), act_cfg=dict(type='ReLU'), with_spectral_norm=True, sn_style='torch', sn_eps=1e-12, init_cfg=dict(type='BigGAN'), pretrained=None)
Bases:
mmengine.model.BaseModule
Discriminator for SNGAN / Proj-GAN. The implementation refers to https://github.com/pfnet-research/sngan_projection/tree/master/dis_models
The overall structure of the projection discriminator can be split into a from_rgb layer, a group of ResBlocks, a linear decision layer, and a projection layer. To support custom layers, we introduce from_rgb_cfg and blocks_cfg.
The model structure depends closely on the image resolution. Therefore, we provide channels_cfg and downsample_cfg to control the input channels and the downsample behavior of the intermediate blocks.
downsample_cfg: In the default config of SNGAN / Proj-GAN, whether downsampling is applied in each intermediate block varies with the image resolution. Therefore, users can define downsample_cfg themselves to control the structure of the discriminator.
channels_cfg: In the default config of SNGAN / Proj-GAN, the number of ResBlocks and the channels of those blocks depend on the image resolution. Therefore, users can define channels_cfg to try their own models. We also provide a default config so that the model can be built from the resolution alone.
- Parameters
input_scale (int) – The scale of the input image.
num_classes (int, optional) – The number of classes you would like to generate. If num_classes=0, no label projection would be used. Defaults to 0.
base_channels (int, optional) – The basic channel number of the discriminator. The other layers contain channels based on this number. Defaults to 128.
input_channels (int, optional) – Channels of the input image. Defaults to 3.
attention_cfg (dict, optional) – Config for the self-attention block. Defaults to dict(type='SelfAttentionBlock').
attention_after_nth_block (int | list[int], optional) – Index of the ConvBlock (including the head block) after which a self-attention block is added. If an int is passed, only one attention block is added. If a list is passed, self-attention blocks are added after multiple ConvBlocks. Note that any index smaller than 1 is ignored. Defaults to -1.
channels_cfg (list | dict[list], optional) – Config for input channels of the intermediate blocks. If a list is passed, each element gives the input channels of the corresponding block as a multiple of base_channels. For block i, the input and output channels should be channels_cfg[i] and channels_cfg[i+1]. If a dict is provided, each key should be the output scale and the corresponding value should be a list defining the channels. Default: please refer to _defualt_channels_cfg.
downsample_cfg (list[bool] | dict[list], optional) – Config for the downsample behavior of the intermediate layers. If a list is passed, downsample_cfg[idx] == True means downsampling is applied in the idx-th block, and vice versa. If a dict is provided, each key should be the input scale of the image and the corresponding value should be a list defining the downsample behavior. Default: please refer to _defualt_downsample_cfg.
from_rgb_cfg (dict, optional) – Config for the first layer, which converts the RGB image to a feature map. Defaults to dict(type='SNGANDiscHeadResBlock').
blocks_cfg (dict, optional) – Config for the intermediate blocks. Defaults to dict(type='SNGANDiscResBlock').
act_cfg (dict, optional) – Activation config for the final output layer. Defaults to dict(type='ReLU').
with_spectral_norm (bool, optional) – Whether to use spectral norm for all conv blocks. Defaults to True.
sn_style (str, optional) – The style of spectral normalization. If set to ajbrock, the implementation by ajbrock (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/layers.py) will be adopted. If set to torch, the implementation by PyTorch will be adopted. Defaults to torch.
sn_eps (float, optional) – eps for the spectral normalization operation. Defaults to 1e-12.
init_cfg (dict, optional) – Config for weight initialization. Defaults to dict(type='BigGAN').
pretrained (str | dict, optional) – Path for the pretrained model or dict containing information for pretrained models whose necessary key is ‘ckpt_path’. Besides, you can also provide ‘prefix’ to load the generator part from the whole state dict. Defaults to None.
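The list and dict forms of channels_cfg described above can be illustrated with a small helper. This sketch follows the wording of the parameter description literally (block i maps channels_cfg[i] to channels_cfg[i+1], both scaled by base_channels); the helper names and the example values are hypothetical, and the actual module may index its blocks differently.

```python
from typing import Dict, List, Tuple, Union

Cfg = Union[List[int], Dict[int, List[int]]]


def resolve_cfg(cfg: Cfg, scale: int) -> List[int]:
    """Return the per-block list, looking it up by scale for the dict form."""
    if isinstance(cfg, dict):
        if scale not in cfg:
            raise KeyError(f'no entry for scale {scale}')
        return list(cfg[scale])
    return list(cfg)


def block_channels(channels_cfg: List[int],
                   base_channels: int) -> List[Tuple[int, int]]:
    """Compute (in_channels, out_channels) for each block."""
    return [(base_channels * channels_cfg[i], base_channels * channels_cfg[i + 1])
            for i in range(len(channels_cfg) - 1)]


# Hypothetical multipliers keyed by scale, for illustration only.
example_cfg = {32: [1, 1, 1], 64: [1, 2, 4, 8]}
pairs = block_channels(resolve_cfg(example_cfg, 64), base_channels=128)
```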
- _defualt_channels_cfg
- _defualt_downsample_cfg
- forward(x, label=None)
Forward function. If self.num_classes is larger than 0, label projection would be used.
- Parameters
x (torch.Tensor) – Fake or real image tensor.
label (torch.Tensor, optional) – Label corresponding to the input image. Note that if self.num_classes is larger than 0, label should not be None. Defaults to None.
- Returns
Prediction for the reality of the input image.
- Return type
torch.Tensor
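The interplay of the linear decision layer and the label projection can be sketched for a single pooled feature vector. This follows the general Proj-GAN idea (an unconditional linear score plus the inner product between a class embedding and the feature) and is not the module's actual code; all names are hypothetical, and the real forward pass works on batched tensors.

```python
from typing import List, Optional


def projection_score(feature: List[float],
                     linear_weight: List[float],
                     linear_bias: float,
                     class_embeddings: List[List[float]],
                     label: Optional[int] = None) -> float:
    """Score one feature vector, adding the label projection if given."""
    # Unconditional part: linear decision layer.
    score = sum(w * f for w, f in zip(linear_weight, feature)) + linear_bias
    if label is not None:
        # Conditional part: inner product of class embedding and feature.
        embedding = class_embeddings[label]
        score += sum(e * f for e, f in zip(embedding, feature))
    return score


feature = [1.0, 2.0]
embeddings = [[1.0, 0.0], [0.0, 1.0]]
unconditional = projection_score(feature, [0.5, 0.5], 0.0, embeddings)
conditional = projection_score(feature, [0.5, 0.5], 0.0, embeddings, label=1)
```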
- init_weights(pretrained=None, strict=True)
Init weights for SNGAN-Proj and SAGAN. If pretrained=None, weight initialization follows the INIT_TYPE in init_cfg=dict(type=INIT_TYPE).
For SNGAN-Proj (INIT_TYPE.upper() in ['SNGAN', 'SNGAN-PROJ', 'GAN-PROJ']), we follow the initialization method in the official Chainer implementation (https://github.com/pfnet-research/sngan_projection).
For SAGAN (INIT_TYPE.upper() == 'SAGAN'), we follow the initialization method in the official TensorFlow implementation (https://github.com/brain-research/self-attention-gan).
Besides reimplementing the official initialization, we provide BigGAN-style and PyTorch-StudioGAN-style initialization (INIT_TYPE.upper() == 'BIGGAN' and INIT_TYPE.upper() == 'STUDIO'). Please refer to https://github.com/ajbrock/BigGAN-PyTorch and https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.
- Parameters
pretrained (str | dict, optional) – Path for the pretrained model or dict containing information for pretrained models whose necessary key is ‘ckpt_path’. Besides, you can also provide ‘prefix’ to load the generator part from the whole state dict. Defaults to None.
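The case-insensitive dispatch on INIT_TYPE described above might look like the following sketch. The returned labels and the error raised for unknown types are assumptions made for this example; only the accepted type strings come from the description.

```python
def init_family(init_cfg: dict) -> str:
    """Map init_cfg['type'] to the initialization family it selects."""
    init_type = init_cfg['type'].upper()
    if init_type in ('SNGAN', 'SNGAN-PROJ', 'GAN-PROJ'):
        return 'chainer-reference'     # official SNGAN-Proj style
    if init_type == 'SAGAN':
        return 'tensorflow-reference'  # official SAGAN style
    if init_type == 'BIGGAN':
        return 'biggan'
    if init_type == 'STUDIO':
        return 'studiogan'
    # Assumed behavior for unsupported types in this sketch.
    raise NotImplementedError(f'unknown init type: {init_cfg["type"]}')


family = init_family(dict(type='BigGAN'))
```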
- class mmagic.models.editors.sagan.SNGANGenerator(output_scale, num_classes=0, base_channels=64, out_channels=3, input_scale=4, noise_size=128, attention_cfg=dict(type='SelfAttentionBlock'), attention_after_nth_block=0, channels_cfg=None, blocks_cfg=dict(type='SNGANGenResBlock'), act_cfg=dict(type='ReLU'), use_cbn=True, auto_sync_bn=True, with_spectral_norm=False, with_embedding_spectral_norm=None, sn_style='torch', norm_eps=0.0001, sn_eps=1e-12, init_cfg=dict(type='BigGAN'), pretrained=None, rgb_to_bgr=False)
Bases:
mmengine.model.BaseModule
Generator for SNGAN / Proj-GAN. The implementation refers to https://github.com/pfnet-research/sngan_projection/tree/master/gen_models
In our implementation, we have two notable designs, namely channels_cfg and blocks_cfg.
channels_cfg: In the default config of SNGAN / Proj-GAN, the number of ResBlocks and the channels of those blocks depend on the resolution of the output image. Therefore, users can define channels_cfg to try their own models. We also provide a default config so that the model can be built from the output resolution alone.
blocks_cfg: In the reference code, the generator consists of a group of ResBlocks. In our implementation, to make the model more general, users can define blocks_cfg, and the blocks are loaded by calling the build_module method.
- Parameters
output_scale (int) – Output scale for the generated image.
num_classes (int, optional) – The number of classes you would like to generate. This argument influences the structure of the intermediate blocks and the label sampling operation in forward (e.g. if num_classes=0, ConditionalNormalization layers degrade to unconditional ones). It is passed to the intermediate blocks by overwriting their config. Defaults to 0.
base_channels (int, optional) – The basic channel number of the generator. The other layers contain channels based on this number. Defaults to 64.
out_channels (int, optional) – Channels of the output images. Defaults to 3.
input_scale (int, optional) – Input scale for the features. Defaults to 4.
noise_size (int, optional) – Size of the input noise vector. Defaults to 128.
attention_cfg (dict, optional) – Config for the self-attention block. Defaults to dict(type='SelfAttentionBlock').
attention_after_nth_block (int | list[int], optional) – Index of the ConvBlock after which a self-attention block is added. If an int is passed, only one attention block is added. If a list is passed, self-attention blocks are added after multiple ConvBlocks. Note that any index smaller than 1 is ignored. Defaults to 0.
channels_cfg (list | dict[list], optional) – Config for input channels of the intermediate blocks. If a list is passed, each element gives the input channels of the corresponding block as a multiple of base_channels. For block i, the input and output channels should be channels_cfg[i] and channels_cfg[i+1]. If a dict is provided, each key should be the output scale and the corresponding value should be a list defining the channels. Default: please refer to _default_channels_cfg.
blocks_cfg (dict, optional) – Config for the intermediate blocks. Defaults to dict(type='SNGANGenResBlock').
act_cfg (dict, optional) – Activation config for the final output layer. Defaults to dict(type='ReLU').
use_cbn (bool, optional) – Whether to use conditional normalization. This argument is passed to norm layers. Defaults to True.
auto_sync_bn (bool, optional) – Whether to convert Batch Norm layers to synchronized ones when distributed training is on. Defaults to True.
with_spectral_norm (bool, optional) – Whether to use spectral norm for conv blocks. Defaults to False.
with_embedding_spectral_norm (bool, optional) – Whether to use spectral norm for embedding layers in normalization blocks. If not specified (set to None), with_embedding_spectral_norm takes the same value as with_spectral_norm. Defaults to None.
sn_style (str, optional) – The style of spectral normalization. If set to ajbrock, the implementation by ajbrock (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/layers.py) will be adopted. If set to torch, the implementation by PyTorch will be adopted. Defaults to torch.
norm_eps (float, optional) – eps for normalization layers (both conditional and non-conditional ones). Defaults to 1e-4.
sn_eps (float, optional) – eps for the spectral normalization operation. Defaults to 1e-12.
init_cfg (dict, optional) – Config for weight initialization. Defaults to dict(type='BigGAN').
pretrained (str | dict, optional) – Path for the pretrained model or dict containing information for pretrained models whose necessary key is ‘ckpt_path’. Besides, you can also provide ‘prefix’ to load the generator part from the whole state dict. Defaults to None.
rgb_to_bgr (bool, optional) – Whether to reformat the output channels to BGR order. We provide several pre-trained BigGAN weights whose output channel order is RGB. You can set this argument to True to use the weights. Defaults to False.
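The int-or-list handling of attention_after_nth_block, including the rule that indices smaller than 1 are ignored, can be sketched as follows. The helper name is hypothetical and the code only mirrors the parameter description; it is not taken from the library.

```python
from typing import List, Union


def attention_positions(
        attention_after_nth_block: Union[int, List[int]]) -> List[int]:
    """Normalize the argument to a list of block indices that get attention."""
    if isinstance(attention_after_nth_block, int):
        candidates = [attention_after_nth_block]
    else:
        candidates = list(attention_after_nth_block)
    # Indices smaller than 1 are ignored, so 0 or -1 disables attention.
    return [idx for idx in candidates if idx >= 1]


positions = attention_positions([0, 2, 3])
```

Under this reading, the generator default of 0 adds no self-attention block unless the argument is set explicitly.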
- _default_channels_cfg
- forward(noise, num_batches=0, label=None, return_noise=False)
Forward function.
- Parameters
noise (torch.Tensor | callable | None) – You can directly give a batch of noise through a torch.Tensor or offer a callable function to sample a batch of noise data. Otherwise, None indicates that the default noise sampler is used.
num_batches (int, optional) – The batch size. Defaults to 0.
label (torch.Tensor | callable | None) – You can directly give a batch of labels through a torch.Tensor or offer a callable function to sample a batch of label data. Otherwise, None indicates that the default label sampler is used.
return_noise (bool, optional) – If True, noise_batch will be returned in a dict with fake_img. Defaults to False.
- Returns
If return_noise is False, only the output image is returned. Otherwise, a dict containing fake_image, noise_batch and label_batch is returned.
- Return type
torch.Tensor | dict
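The three accepted forms of the noise argument (a ready batch, a callable sampler, or None) can be sketched with nested lists standing in for tensors. The helper name, the standard normal default sampler, and the convention of calling the sampler with a (num_batches, noise_size) shape are assumptions for this example, not confirmed library behavior.

```python
import random
from typing import Callable, List, Optional, Tuple, Union

Batch = List[List[float]]
Sampler = Callable[[Tuple[int, int]], Batch]


def resolve_noise(noise: Optional[Union[Batch, Sampler]],
                  num_batches: int = 0,
                  noise_size: int = 128) -> Batch:
    """Turn the noise argument into a concrete batch of noise vectors."""
    if noise is None:
        # Assumed default sampler: independent standard normal values.
        return [[random.gauss(0.0, 1.0) for _ in range(noise_size)]
                for _ in range(num_batches)]
    if callable(noise):
        # Assumed calling convention: the sampler receives the target shape.
        return noise((num_batches, noise_size))
    # A ready batch is passed through unchanged.
    return noise


sampled = resolve_noise(None, num_batches=2, noise_size=4)
```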
- init_weights(pretrained=None, strict=True)
Init weights for SNGAN-Proj and SAGAN. If pretrained=None, weight initialization follows the INIT_TYPE in init_cfg=dict(type=INIT_TYPE).
For SNGAN-Proj (INIT_TYPE.upper() in ['SNGAN', 'SNGAN-PROJ', 'GAN-PROJ']), we follow the initialization method in the official Chainer implementation (https://github.com/pfnet-research/sngan_projection).
For SAGAN (INIT_TYPE.upper() == 'SAGAN'), we follow the initialization method in the official TensorFlow implementation (https://github.com/brain-research/self-attention-gan).
Besides reimplementing the official initialization, we provide BigGAN-style and PyTorch-StudioGAN-style initialization (INIT_TYPE.upper() == 'BIGGAN' and INIT_TYPE.upper() == 'STUDIO'). Please refer to https://github.com/ajbrock/BigGAN-PyTorch and https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.
- Parameters
pretrained (str | dict, optional) – Path for the pretrained model or dict containing information for pretrained models whose necessary key is ‘ckpt_path’. Besides, you can also provide ‘prefix’ to load the generator part from the whole state dict. Defaults to None.
- class mmagic.models.editors.sagan.SNGANDiscHeadResBlock(in_channels, out_channels, conv_cfg=None, act_cfg=dict(type='ReLU'), with_spectral_norm=True, sn_eps=1e-12, sn_style='torch', init_cfg=dict(type='BigGAN'))
Bases:
mmengine.model.BaseModule
The first ResBlock used in the discriminator of SNGAN / Proj-GAN. Compared to SNGANDiscResBlock, this module has a different forward order.
- Parameters
in_channels (int) – Input channels.
out_channels (int) – Output channels.
downsample (bool, optional) – Whether to apply the downsample operation in this module. Defaults to False.
conv_cfg (dict | None) – Config for conv blocks of this module. If None is passed, _default_conv_cfg is used. Defaults to None.
act_cfg (dict, optional) – Config for the activation function. Defaults to dict(type='ReLU').
with_spectral_norm (bool, optional) – Whether to use spectral norm for conv blocks and norm layers. Defaults to True.
sn_style (str, optional) – The style of spectral normalization. If set to ajbrock, the implementation by ajbrock (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/layers.py) will be adopted. If set to torch, the implementation by PyTorch will be adopted. Defaults to torch.
sn_eps (float, optional) – eps for the spectral normalization operation. Defaults to 1e-12.
init_cfg (dict, optional) – Config for weight initialization. Defaults to dict(type='BigGAN').
- _default_conv_cfg
- forward(x: torch.Tensor) -> torch.Tensor
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.sagan.SNGANDiscResBlock(in_channels, out_channels, hidden_channels=None, downsample=False, act_cfg=dict(type='ReLU'), conv_cfg=None, with_spectral_norm=True, sn_style='torch', sn_eps=1e-12, init_cfg=dict(type='BigGAN'))
Bases:
mmengine.model.BaseModule
ResBlock used in the discriminator of SNGAN / Proj-GAN.
- Parameters
in_channels (int) – Input channels.
out_channels (int) – Output channels.
hidden_channels (int, optional) – Input channels of the second conv layer of the block. If None is given, it is set to out_channels. Defaults to None.
downsample (bool, optional) – Whether to apply the downsample operation in this module. Defaults to False.
act_cfg (dict, optional) – Config for the activation function. Defaults to dict(type='ReLU').
conv_cfg (dict | None) – Config for conv blocks of this module. If None is passed, _default_conv_cfg is used. Defaults to None.
with_spectral_norm (bool, optional) – Whether to use spectral norm for conv blocks and norm layers. Defaults to True.
sn_eps (float, optional) – eps for the spectral normalization operation. Defaults to 1e-12.
sn_style (str, optional) – The style of spectral normalization. If set to ajbrock, the implementation by ajbrock (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/layers.py) will be adopted. If set to torch, the implementation by PyTorch will be adopted. Defaults to torch.
init_cfg (dict, optional) – Config for weight initialization. Defaults to dict(type='BigGAN').
- _default_conv_cfg
- forward(x)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
- class mmagic.models.editors.sagan.SNGANGenResBlock(in_channels, out_channels, hidden_channels=None, num_classes=0, use_cbn=True, use_norm_affine=False, act_cfg=dict(type='ReLU'), norm_cfg=dict(type='BN'), upsample_cfg=dict(type='nearest', scale_factor=2), upsample=True, auto_sync_bn=True, conv_cfg=None, with_spectral_norm=False, with_embedding_spectral_norm=None, sn_style='torch', norm_eps=0.0001, sn_eps=1e-12, init_cfg=dict(type='BigGAN'))
Bases:
mmengine.model.BaseModule
ResBlock used in Generator of SNGAN / Proj-GAN.
- Parameters
in_channels (int) – Input channels.
out_channels (int) – Output channels.
hidden_channels (int, optional) – Input channels of the second conv layer of the block. If None is given, it is set to out_channels. Defaults to None.
num_classes (int, optional) – Number of classes you would like to generate. This argument is passed to norm layers and influences the structure and behavior of the normalization process. Defaults to 0.
use_cbn (bool, optional) – Whether to use conditional normalization. This argument is passed to norm layers. Defaults to True.
use_norm_affine (bool, optional) – Whether to use learnable affine parameters in the norm operation when cbn is off. Defaults to False.
act_cfg (dict, optional) – Config for the activation function. Defaults to dict(type='ReLU').
upsample_cfg (dict, optional) – Config for the upsample method. Defaults to dict(type='nearest', scale_factor=2).
upsample (bool, optional) – Whether to apply the upsample operation in this module. Defaults to True.
auto_sync_bn (bool, optional) – Whether to convert Batch Norm layers to synchronized ones when distributed training is on. Defaults to True.
conv_cfg (dict | None) – Config for conv blocks of this module. If None is passed, _default_conv_cfg is used. Defaults to None.
with_spectral_norm (bool, optional) – Whether to use spectral norm for conv blocks and norm layers. Defaults to False.
with_embedding_spectral_norm (bool, optional) – Whether to use spectral norm for embedding layers in normalization blocks. If not specified (set to None), with_embedding_spectral_norm takes the same value as with_spectral_norm. Defaults to None.
sn_style (str, optional) – The style of spectral normalization. If set to ajbrock, the implementation by ajbrock (https://github.com/ajbrock/BigGAN-PyTorch/blob/master/layers.py) will be adopted. If set to torch, the implementation by PyTorch will be adopted. Defaults to torch.
norm_eps (float, optional) – eps for normalization layers (both conditional and non-conditional ones). Defaults to 1e-4.
sn_eps (float, optional) – eps for the spectral normalization operation. Defaults to 1e-12.
init_cfg (dict, optional) – Config for weight initialization. Defaults to dict(type='BigGAN').
- _default_conv_cfg
- forward(x, y=None)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
y (Tensor) – Input label with shape (n, ). Default None.
- Returns
Forward results.
- Return type
Tensor