`mmagic.models.editors.deepfillv1`¶

Package Contents¶

Classes¶

`ContextualAttentionModule`	Contextual attention module.
`ContextualAttentionNeck`	Neck with contextual attention module.
`DeepFillDecoder`	Decoder used in DeepFill model.
`DeepFillv1Discriminators`	Discriminators used in DeepFillv1 model.
`DeepFillEncoder`	Encoder used in DeepFill model.
`DeepFillRefiner`	Refiner used in DeepFill model.
`DeepFillv1Inpaintor`	Inpaintor for deepfillv1 method.

class mmagic.models.editors.deepfillv1.ContextualAttentionModule(unfold_raw_kernel_size=4, unfold_raw_stride=2, unfold_raw_padding=1, unfold_corr_kernel_size=3, unfold_corr_stride=1, unfold_corr_dilation=1, unfold_corr_padding=1, scale=0.5, fuse_kernel_size=3, softmax_scale=10, return_attention_score=True)[source]¶

Bases: mmengine.model.BaseModule

Contextual attention module.

The details of this module can be found in: Generative Image Inpainting with Contextual Attention

Parameters

unfold_raw_kernel_size (int) – Kernel size used in unfolding raw feature. Default: 4.
unfold_raw_stride (int) – Stride used in unfolding raw feature. Default: 2.
unfold_raw_padding (int) – Padding used in unfolding raw feature. Default: 1.
unfold_corr_kernel_size (int) – Kernel size used in unfolding context for computing correlation maps. Default: 3.
unfold_corr_stride (int) – Stride used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_dilation (int) – Dilation used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_padding (int) – Padding used in unfolding context for computing correlation maps. Default: 1.
scale (float) – The rescale factor used to resize the input features. Default: 0.5.
fuse_kernel_size (int) – The kernel size used in fusion module. Default: 3.
softmax_scale (float) – The scale factor for softmax function. Default: 10.
return_attention_score (bool) – If True, the attention score will be returned. Default: True.

forward(x, context, mask=None)[source]¶

Forward Function.

Parameters

x (torch.Tensor) – Tensor with shape (n, c, h, w).
context (torch.Tensor) – Tensor with shape (n, c, h, w).
mask (torch.Tensor) – Tensor with shape (n, 1, h, w). Default: None.

Returns

Features after contextual attention.

Return type

tuple(torch.Tensor)

patch_correlation(x, kernel)[source]¶

Calculate patch correlation.

Parameters

x (torch.Tensor) – Input tensor.
kernel (torch.Tensor) – Kernel tensor.

Returns

Tensor with shape of (n, l, h, w).

Return type

torch.Tensor

patch_copy_deconv(attention_score, context_filter)[source]¶

Copy patches using deconv.

Parameters

attention_score (torch.Tensor) – Tensor with shape of (n, l, h, w).
context_filter (torch.Tensor) – Filter kernel.

Returns

Tensor with shape of (n, c, h, w).

Return type

torch.Tensor

fuse_correlation_map(correlation_map, h_unfold, w_unfold)[source]¶

Fuse correlation map.

This operation fuses the correlation map in order to enlarge regions of consistent correlation.

The mechanism behind this op is simple: a standard ‘Eye’ (identity) matrix is applied as a filter on the correlation map in the horizontal and vertical directions.

The shape of the input correlation map is (n, h_unfold*w_unfold, h, w). When fusing, the convolutional filter is applied to the reshaped feature map with shape of (n, 1, h_unfold*w_unfold, h*w).

A simple specification for horizontal direction is shown below:

       (h, (h, (h, (h,
        0)  1)  2)  3)  ...
(h, 0)
(h, 1)      1
(h, 2)          1
(h, 3)              1
...
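As a rough illustration of why an identity kernel enlarges consistent regions, the sketch below applies a 3×3 ‘Eye’ filter with zero padding to a small 2D map written as nested lists. The helper name `fuse_with_eye` and the toy input are hypothetical; the module itself works on the reshaped tensor described above, not on Python lists.

```python
def fuse_with_eye(corr, kernel_size=3):
    """Filter a 2D map with an identity kernel, using zero padding."""
    rows, cols = len(corr), len(corr[0])
    half = kernel_size // 2
    fused = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Only the diagonal taps of an eye kernel are non-zero.
            for d in range(-half, half + 1):
                if 0 <= i + d < rows and 0 <= j + d < cols:
                    fused[i][j] += corr[i + d][j + d]
    return fused


# Toy map whose whole response lies on the main diagonal, i.e.
# neighbouring positions agree on the same relative shift.
toy = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
fused = fuse_with_eye(toy)
```

In this toy case, responses on the common diagonal reinforce one another (interior diagonal values rise from 1 to 3), while off-diagonal entries stay at zero.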

calculate_unfold_hw(input_size, kernel_size=3, stride=1, dilation=1, padding=0)[source]¶

Calculate (h, w) after unfolding.

The official implementation of unfold in pytorch will put the dimension (h, w) into L. Thus, this function is just to calculate the (h, w) according to the equation in: https://pytorch.org/docs/stable/nn.html#torch.nn.Unfold
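The output-size equation from the `torch.nn.Unfold` documentation can be sketched in plain Python as follows. The function name `unfold_hw` is hypothetical and a square kernel is assumed; it is only meant to show the arithmetic, not to reproduce the method's code.

```python
def unfold_hw(input_size, kernel_size=3, stride=1, dilation=1, padding=0):
    """Return (h_unfold, w_unfold) for a square kernel, following the
    output-size equation in the torch.nn.Unfold documentation."""
    h_in, w_in = input_size

    def _out(size):
        span = dilation * (kernel_size - 1) + 1  # extent of one dilated kernel
        return (size + 2 * padding - span) // stride + 1

    return _out(h_in), _out(w_in)


# Settings matching the module defaults for the raw and correlation unfolds.
raw_hw = unfold_hw((64, 64), kernel_size=4, stride=2, padding=1)
corr_hw = unfold_hw((32, 32), kernel_size=3, stride=1, padding=1)
```

With these inputs, a 64×64 map unfolded with the raw-feature defaults yields a 32×32 grid of patches, and a 32×32 map unfolded with the correlation defaults keeps its 32×32 size.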

calculate_overlap_factor(attention_score)[source]¶

Calculate the overlap factor after applying deconv.

Parameters: attention_score (torch.Tensor) – The attention score with shape of (n, c, h, w).
Returns: The overlap factor will be returned.
Return type: torch.Tensor
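To see why an overlap factor is needed, consider pasting patches back with a stride smaller than the kernel size: neighbouring patches then cover the same output positions more than once. The 1D sketch below counts that coverage using the module's default raw-unfold settings. The helper `overlap_counts_1d` is hypothetical and is only an assumption about what the factor represents; the method itself returns a tensor.

```python
def overlap_counts_1d(num_patches, kernel_size=4, stride=2, padding=1):
    """Count how many pasted patches cover each output position (1D case)."""
    full_len = (num_patches - 1) * stride + kernel_size
    counts = [0] * full_len
    for i in range(num_patches):
        start = i * stride
        for p in range(start, start + kernel_size):
            counts[p] += 1
    # A transposed convolution trims `padding` positions from each border.
    return counts[padding:full_len - padding]


counts = overlap_counts_1d(4)
```

Dividing the pasted result by such counts would average the overlapping contributions instead of summing them.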

mask_correlation_map(correlation_map, mask)[source]¶

Add mask weight for correlation map.

Add a negative infinity number to the masked regions so that softmax function will result in ‘zero’ in those regions.

Parameters

correlation_map (torch.Tensor) – Correlation map with shape of (n, h_unfold*w_unfold, h_map, w_map).
mask (torch.Tensor) – Mask tensor with shape of (n, c, h, w). ‘1’ in the mask indicates masked region while ‘0’ indicates valid region.

Returns

Updated correlation map with mask.

Return type

torch.Tensor
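The effect of the negative-infinity mask on the softmax can be checked with a small plain-Python sketch. The function `masked_softmax` is hypothetical and works on a flat list of scores; scaling by `softmax_scale` before the softmax is an assumption based on the parameter description above.

```python
import math


def masked_softmax(scores, mask, softmax_scale=10.0):
    """Softmax over `scores`, where mask == 1 marks entries to suppress.

    Assumes at least one entry is valid (mask == 0).
    """
    # Masked entries are set to negative infinity before the softmax.
    shifted = [float("-inf") if m == 1 else s * softmax_scale
               for s, m in zip(scores, mask)]
    peak = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]


weights = masked_softmax([0.2, 0.9, 0.5], mask=[0, 1, 0])
```

The masked entry receives exactly zero weight even though it had the highest raw score, and the remaining weights still sum to one.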

im2col(img, kernel_size, stride=1, padding=0, dilation=1, normalize=False, return_cols=False)[source]¶

Reshape image-style feature to columns.

This function is used to unfold feature maps into columns. The details of this function can be found in: https://pytorch.org/docs/1.1.0/nn.html?highlight=unfold#torch.nn.Unfold

Parameters

img (torch.Tensor) – Features to be unfolded. The shape of this feature should be (n, c, h, w).
kernel_size (int) – In this function, we only support square kernel with same height and width.
stride (int) – Stride number in unfolding. Default: 1.
padding (int) – Padding number in unfolding. Default: 0.
dilation (int) – Dilation number in unfolding. Default: 1.
normalize (bool) – If True, the unfolded feature will be normalized. Default: False.
return_cols (bool) – The official PyTorch implementation of unfolding returns features with shape of (n, c*prod(kernel_size), L). If True, the features will be reshaped to (n, L, c, kernel_size, kernel_size). Otherwise, the results keep the shape of the official implementation. Default: False.

Returns

Unfolded columns. If return_cols is True, the shape of the output tensor is (n, L, c, kernel_size, kernel_size). Otherwise, the shape will be (n, c*prod(kernel_size), L).

Return type

torch.Tensor
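The patch layout produced with `return_cols=True` can be pictured with a single-image, single-channel example in plain Python. The helper `unfold_patches` is hypothetical, ignores dilation and normalization, and returns a list of `kernel_size` × `kernel_size` patches in row-major order rather than a tensor.

```python
def unfold_patches(img, kernel_size, stride=1, padding=0):
    """Collect square patches of a 2D list in row-major order."""
    h, w = len(img), len(img[0])
    # Zero-pad the border.
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = img[i][j]
    patches = []
    for top in range(0, h + 2 * padding - kernel_size + 1, stride):
        for left in range(0, w + 2 * padding - kernel_size + 1, stride):
            patches.append([row[left:left + kernel_size]
                            for row in padded[top:top + kernel_size]])
    return patches


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
patches = unfold_patches(img, kernel_size=2)
```

Here the number of patches plays the role of L, and each patch corresponds to one (c, kernel_size, kernel_size) slice with c = 1.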

class mmagic.models.editors.deepfillv1.ContextualAttentionNeck(in_channels, conv_type='conv', conv_cfg=None, norm_cfg=None, act_cfg=dict(type='ELU'), contextual_attention_args=dict(softmax_scale=10.0), **kwargs)[source]¶

Bases: mmengine.model.BaseModule

Neck with contextual attention module.

Parameters

in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
conv_cfg (dict | None) – Config of conv module. Default: None.
norm_cfg (dict | None) – Config of norm module. Default: None.
act_cfg (dict | None) – Config of activation layer. Default: dict(type=’ELU’).
contextual_attention_args (dict) – Config of contextual attention module. Default: dict(softmax_scale=10.).
kwargs (keyword arguments) –

_conv_type¶

forward(x, mask)[source]¶

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmagic.models.editors.deepfillv1.DeepFillDecoder(in_channels, conv_type='conv', norm_cfg=None, act_cfg=dict(type='ELU'), out_act_cfg=dict(type='clip', min=- 1.0, max=1.0), channel_factor=1.0, **kwargs)[source]¶

Bases: mmengine.model.BaseModule

Decoder used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention

Parameters

in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
out_act_cfg (dict) – Config dict for output activation layer. Here, we provide a commonly used clamp or clip operation.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –

_conv_type¶

forward(input_dict)[source]¶

Forward Function.

Parameters: input_dict (dict | torch.Tensor) – Input dict with middle features or torch.Tensor.
Returns: Output tensor with shape of (n, c, h, w).
Return type: torch.Tensor

class mmagic.models.editors.deepfillv1.DeepFillv1Discriminators(global_disc_cfg, local_disc_cfg)[source]¶

Bases: mmengine.model.BaseModule

Discriminators used in DeepFillv1 model.

In the DeepFillv1 model, the discriminators are independent, without any concatenation as in the Global&Local model. Thus, we call this model DeepFillv1Discriminators. There are a global discriminator and a local discriminator, which take global and local input respectively.

The details can be found in: Generative Image Inpainting with Contextual Attention.

Parameters

global_disc_cfg (dict) – Config dict for global discriminator.
local_disc_cfg (dict) – Config dict for local discriminator.

forward(x)[source]¶

Forward function.

Parameters: x (tuple[torch.Tensor]) – Contains global image and the local image patch.
Returns: Contains the prediction from discriminators in global image and local image patch.
Return type: tuple[torch.Tensor]

init_weights()[source]¶: Init weights for models.

class mmagic.models.editors.deepfillv1.DeepFillEncoder(in_channels=5, conv_type='conv', norm_cfg=None, act_cfg=dict(type='ELU'), encoder_type='stage1', channel_factor=1.0, **kwargs)[source]¶

Bases: mmengine.model.BaseModule

Encoder used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention

Parameters

in_channels (int) – The number of input channels. Default: 5.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
encoder_type (str) – Type of the encoder. Should be one of [‘stage1’, ‘stage2_conv’, ‘stage2_attention’]. Default: ‘stage1’.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –

_conv_type¶

forward(x)[source]¶

Forward Function.

Parameters: x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
Returns: Output tensor with shape of (n, c, h’, w’).
Return type: torch.Tensor

class mmagic.models.editors.deepfillv1.DeepFillRefiner(encoder_attention=dict(type='DeepFillEncoder', encoder_type='stage2_attention'), encoder_conv=dict(type='DeepFillEncoder', encoder_type='stage2_conv'), dilation_neck=dict(type='GLDilationNeck', in_channels=128, act_cfg=dict(type='ELU')), contextual_attention=dict(type='ContextualAttentionNeck', in_channels=128), decoder=dict(type='DeepFillDecoder', in_channels=256))[source]¶

Bases: mmengine.model.BaseModule

Refiner used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention.

Parameters

encoder_attention (dict) – Config dict for encoder used in branch with contextual attention module.
encoder_conv (dict) – Config dict for encoder used in branch with just convolutional operation.
dilation_neck (dict) – Config dict for dilation neck in branch with just convolutional operation.
contextual_attention (dict) – Config dict for contextual attention neck.
decoder (dict) – Config dict for decoder used to fuse and decode features.
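For reference, the default sub-module configs from the signature can be written out as a single config dict. This is only a sketch: the outer `type='DeepFillRefiner'` key is an assumption about how the class would be referenced in a config file and is not stated on this page; every other value is copied from the signature above.

```python
# Default sub-module configs, copied from the DeepFillRefiner signature.
refiner_cfg = dict(
    type='DeepFillRefiner',  # assumed registry key, not stated on this page
    encoder_attention=dict(
        type='DeepFillEncoder', encoder_type='stage2_attention'),
    encoder_conv=dict(
        type='DeepFillEncoder', encoder_type='stage2_conv'),
    dilation_neck=dict(
        type='GLDilationNeck', in_channels=128, act_cfg=dict(type='ELU')),
    contextual_attention=dict(
        type='ContextualAttentionNeck', in_channels=128),
    decoder=dict(type='DeepFillDecoder', in_channels=256),
)
```

Note that the decoder's default `in_channels` (256) is twice the 128 channels used by each branch, which is consistent with the decoder fusing the features of the two branches.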

forward(x, mask)[source]¶

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmagic.models.editors.deepfillv1.DeepFillv1Inpaintor(data_preprocessor: dict, encdec: dict, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, stage1_loss_type=None, stage2_loss_type=None, train_cfg=None, test_cfg=None, init_cfg: Optional[dict] = None)[source]¶

Bases: mmagic.models.base_models.TwoStageInpaintor

Inpaintor for deepfillv1 method.

This inpaintor is implemented according to the paper: Generative image inpainting with contextual attention

Importantly, this inpaintor is an example for using custom training schedule based on TwoStageInpaintor.

The training pipeline of deepfillv1 is as follows:

if cur_iter < iter_tc:
    update generator with only l1 loss
else:
    update discriminator
    if cur_iter > iter_td:
        update generator with l1 loss and adversarial loss

The new attribute cur_iter is added to record the current iteration number. The train_cfg contains the settings of the training schedule:

train_cfg = dict(
    start_iter=0,
    disc_step=1,
    iter_tc=90000,
    iter_td=100000
)

iter_tc and iter_td correspond to the notation $T_C$ and $T_D$ of the original paper.
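The schedule above can be traced with a small helper that reports which updates would run at a given iteration. The function `scheduled_updates` and the string labels are hypothetical; the comparisons mirror the pseudocode as written, including the strict `>` for `iter_td`.

```python
def scheduled_updates(cur_iter, iter_tc=90000, iter_td=100000):
    """Return the updates performed at `cur_iter` under the schedule above."""
    if cur_iter < iter_tc:
        # Warm-up phase: generator only, l1 loss only.
        return ["generator_l1"]
    updates = ["discriminator"]
    if cur_iter > iter_td:
        # Joint phase: generator with l1 and adversarial loss.
        updates.append("generator_l1_adv")
    return updates


phases = {it: scheduled_updates(it) for it in (0, 95000, 100000, 100001)}
```

Read this way, the iterations from `iter_tc` up to and including `iter_td` update only the discriminator, and the generator resumes training once `cur_iter` exceeds `iter_td`.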

Parameters

encdec (dict) – Config for encoder-decoder style generator.
disc (dict) – Config for discriminator.
loss_gan (dict) – Config for adversarial loss.
loss_gp (dict) – Config for gradient penalty loss.
loss_disc_shift (dict) – Config for discriminator shift loss.
loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.
loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.
loss_l1_hole (dict) – Config for l1 loss in the hole.
loss_l1_valid (dict) – Config for l1 loss in the valid region.
loss_tv (dict) – Config for total variation loss.
train_cfg (dict) – Configs for training scheduler. disc_step must be included to indicate the number of discriminator updating steps in each training step.
test_cfg (dict) – Configs for testing scheduler.
init_cfg (dict, optional) – Initialization config dict.

forward_train_d(data_batch, is_real, is_disc)[source]¶

Forward function in discriminator training step.

In this function, we modify the default implementation, which uses only one discriminator. The DeepFillv1 model uses two separate discriminators for global and local consistency.

Parameters

data_batch (torch.Tensor) – Batch of real data or fake data.
is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.
is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.

Returns

Contains the loss items computed in this function.

Return type

dict

two_stage_loss(stage1_data, stage2_data, gt, mask, masked_img)[source]¶

Calculate two-stage loss.

Parameters

stage1_data (dict) – Contains stage1 results.
stage2_data (dict) – Contains stage2 results.
gt (torch.Tensor) – Ground-truth image.
mask (torch.Tensor) – Mask image.
masked_img (torch.Tensor) – Composition of mask image and ground-truth image.

Returns

A dict containing the results computed within this function for visualization, and a dict containing the loss items computed in this function.

Return type

tuple(dict)

calculate_loss_with_type(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_', fake_local=None)[source]¶

Calculate multiple types of losses.

Parameters

loss_type (str) – Type of the loss.
fake_res (torch.Tensor) – Direct results from model.
fake_img (torch.Tensor) – Composited results from model.
gt (torch.Tensor) – Ground-truth tensor.
mask (torch.Tensor) – Mask tensor.
prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.
fake_local (torch.Tensor, optional) – Local results from model. Defaults to None.

Returns

Contains the loss value with its name.

Return type

dict

train_step(data: List[dict], optim_wrapper)[source]¶

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

get fake res/image

optimize discriminator (if any)

optimize generator

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.

Parameters

data (List[dict]) – Batch of data as input.
optim_wrapper (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict
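The interplay between `disc_step` and the generator update can be pictured with a simple counter. This is a sketch under the assumption that the generator is optimized on every `disc_step`-th call; the helper `generator_update_steps` is hypothetical and the exact counter logic inside `train_step` is not shown on this page.

```python
def generator_update_steps(num_steps, disc_step):
    """List the step indices at which the generator would be optimized."""
    steps = []
    for step in range(num_steps):
        # Every call optimizes the discriminator; the generator follows
        # only once `disc_step` discriminator iterations have accumulated.
        if (step + 1) % disc_step == 0:
            steps.append(step)
    return steps


every_step = generator_update_steps(3, disc_step=1)
every_other = generator_update_steps(6, disc_step=2)
```

With `disc_step=1` the generator is optimized on every call, while with `disc_step=2` it is optimized on every second call.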
