mmagic.models.editors.gca
Package Contents
Classes
GCA | Guided Contextual Attention image matting model.
GCAModule | Guided Contextual Attention Module.
ResGCADecoder | ResNet decoder with shortcut connection and gca module.
ResNetDec | ResNet decoder for image matting.
ResShortcutDec | ResNet decoder for image matting with shortcut connection.
ResGCAEncoder | ResNet backbone with shortcut connection and gca module.
ResNetEnc | ResNet encoder for image matting.
ResShortcutEnc | ResNet backbone for image matting with shortcut connection.
- class mmagic.models.editors.gca.GCA(data_preprocessor, backbone, loss_alpha=None, init_cfg: Optional[dict] = None, train_cfg=None, test_cfg=None)[source]
Bases: mmagic.models.base_models.BaseMattor
Guided Contextual Attention image matting model.
https://arxiv.org/abs/2001.04069
- Parameters
data_preprocessor (dict, optional) – The pre-processing config of BaseDataPreprocessor.
backbone (dict) – Config of the backbone.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
init_cfg (dict, optional) – Initialization config dict. Default: None.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
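As a rough illustration of how these arguments fit together, the sketch below writes a GCA model config as a plain Python dict. Only the class names GCA, ResGCAEncoder, ResGCADecoder, BasicBlock and BasicBlockDec and the train_backbone key come from this page. The other type names (MattorPreprocessor, SimpleEncoderDecoder, L1Loss), the layer counts and the channel numbers are assumptions chosen for illustration.

```python
# Hypothetical GCA config sketch; every value marked "assumed" is illustrative only.
model = dict(
    type='GCA',
    data_preprocessor=dict(type='MattorPreprocessor'),  # assumed preprocessor name
    backbone=dict(
        type='SimpleEncoderDecoder',  # assumed wrapper pairing an encoder with a decoder
        encoder=dict(
            type='ResGCAEncoder',
            block='BasicBlock',
            layers=[3, 4, 4, 2],  # assumed number of blocks per stage
            in_channels=6,  # assumed: RGB image plus a 3-channel trimap
        ),
        decoder=dict(
            type='ResGCADecoder',
            block='BasicBlockDec',
            layers=[2, 3, 3, 2],  # assumed number of blocks per stage
            in_channels=512,  # assumed encoder output channels
        ),
    ),
    loss_alpha=dict(type='L1Loss'),  # assumed loss type
    train_cfg=dict(train_backbone=True),  # train_backbone is required in train_cfg
    test_cfg=None,
)
```

The top-level keys mirror the constructor arguments listed above, so a config of this shape could be passed to whatever registry or builder the project uses to construct models.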
- _forward(inputs)
Forward function.
- Parameters
inputs (torch.Tensor) – Input tensor.
- Returns
Output tensor.
- Return type
Tensor
- _forward_test(inputs)
Forward function for testing the GCA model.
- Parameters
inputs (torch.Tensor) – Batch input tensor.
- Returns
Output tensor of the model.
- Return type
Tensor
- _forward_train(inputs, data_samples)
Forward function for training the GCA model.
- Parameters
inputs (torch.Tensor) – Batch input tensor collated by data_preprocessor.
data_samples (List[BaseDataElement]) – Data samples collated by data_preprocessor.
- Returns
Contains the loss items and batch information.
- Return type
dict
- class mmagic.models.editors.gca.GCAModule(in_channels, out_channels, kernel_size=3, stride=1, rate=2, pad_args=dict(mode='reflect'), interpolation='nearest', penalty=-10000.0, eps=0.0001)[source]
Bases: torch.nn.Module
Guided Contextual Attention Module.
From https://arxiv.org/pdf/2001.04069.pdf. Based on https://github.com/nbei/Deep-Flow-Guided-Video-Inpainting. This module uses the image feature map to augment the alpha feature map with a guided contextual attention score.
The image feature and the alpha feature are unfolded into small patches that are later used as convolution kernels, so the unfolding size is referred to as the kernel size. Image feature patches have a default kernel size of 3, while the kernel size of alpha feature patches is determined by rate (see rate below). The image feature patches are convolved with the image feature itself to compute the contextual attention. The attention feature map is then convolved with the alpha feature patches to obtain the attention alpha feature. Finally, the attention alpha feature is added to the input alpha feature.
- Parameters
in_channels (int) – Input channels of the guided contextual attention module.
out_channels (int) – Output channels of the guided contextual attention module.
kernel_size (int) – Kernel size of image feature patches. Default: 3.
stride (int) – Stride when unfolding the image feature. Default: 1.
rate (int) – The downsample rate of the image feature map. The corresponding kernel size and stride of alpha feature patches will be rate x 2 and rate, respectively. It can be regarded as the granularity of the GCA module. Default: 2.
pad_args (dict) – Padding parameters used when convolving the image feature with image feature patches or alpha feature patches. Allowed keys are mode and value. See torch.nn.functional.pad() for more information. Default: dict(mode='reflect').
interpolation (str) – Interpolation method used in upsampling and downsampling. Default: 'nearest'.
penalty (float) – Penalty hyperparameter that avoids a large correlation between each unknown patch and itself. Default: -1e4.
eps (float) – A small number to avoid dividing by 0 when calculating the normed image feature patch. Default: 1e-4.
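The attention flow described above can be sketched in plain Python. This is a toy under strong simplifying assumptions, not the module's implementation: patches are 1x1, the feature map is flattened into a list of P positions, there is no unknown-area guidance, and the function name toy_contextual_attention is hypothetical.

```python
import math


def toy_contextual_attention(img_feat, alpha_feat, penalty=-1e4, eps=1e-4):
    """Toy attention over P positions using 1x1 patches.

    img_feat:   list of P image feature vectors (lists of floats).
    alpha_feat: list of P alpha feature vectors (lists of floats).
    Returns the alpha features plus their attention-weighted counterparts.
    """
    # Normalize each image "patch" so the dot product acts like a correlation.
    normed = []
    for vec in img_feat:
        norm = math.sqrt(sum(v * v for v in vec))
        normed.append([v / (norm + eps) for v in vec])

    out = []
    for q, query in enumerate(img_feat):
        # Similarity of the query position with every normalized patch.
        sims = [sum(a * b for a, b in zip(query, patch)) for patch in normed]
        # Penalize the correlation between a position and itself.
        sims[q] += penalty
        # Softmax over patches gives the attention scores.
        peak = max(sims)
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        scores = [e / total for e in exps]
        # Propagate alpha features with the attention scores.
        attended = [
            sum(scores[p] * alpha_feat[p][c] for p in range(len(alpha_feat)))
            for c in range(len(alpha_feat[q]))
        ]
        # Residual connection: add the attended feature to the input feature.
        out.append([a + b for a, b in zip(alpha_feat[q], attended)])
    return out
```

With two positions that share the same image feature, each position ends up borrowing the alpha feature of the other one, because the penalty suppresses the match of a position with itself.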
- init_weights()
Init weights for the model.
- forward(img_feat, alpha_feat, unknown=None, softmax_scale=1.0)
Forward function of GCAModule.
- Parameters
img_feat (Tensor) – Image feature map of shape (N, ori_c, ori_h, ori_w).
alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).
unknown (Tensor, optional) – Unknown area map generated from the trimap. If specified, this tensor should have shape (N, 1, ori_h, ori_w).
softmax_scale (float, optional) – The softmax scale of the attention, used if the unknown area is not provided in forward. Default: 1.
- Returns
The augmented alpha feature.
- Return type
Tensor
- extract_feature_maps_patches(img_feat, alpha_feat, unknown)
Extract image feature, alpha feature, and unknown patches.
- Parameters
img_feat (Tensor) – Image feature map of shape (N, img_c, img_h, img_w).
alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).
unknown (Tensor, optional) – Unknown area map generated from the trimap, of shape (N, 1, img_h, img_w).
- Returns
3-tuple of
Tensor: Image feature patches of shape (N, img_h*img_w, img_c, img_ks, img_ks).
Tensor: Guided contextual attention alpha feature map of shape (N, img_h*img_w, alpha_c, alpha_ks, alpha_ks).
Tensor: Unknown mask of shape (N, img_h*img_w, 1, 1).
- Return type
tuple
- compute_similarity_map(img_feat, img_ps)
Compute similarity between image feature patches.
- Parameters
img_feat (Tensor) – Image feature map of shape (1, img_c, img_h, img_w).
img_ps (Tensor) – Image feature patches tensor of shape (1, img_h*img_w, img_c, img_ks, img_ks).
- Returns
Similarity map between image feature patches with shape (1, img_h*img_w, img_h, img_w).
- Return type
Tensor
- compute_guided_attention_score(similarity_map, unknown_ps, scale, self_mask)
Compute guided attention score.
- Parameters
similarity_map (Tensor) – Similarity map of image feature with shape (1, img_h*img_w, img_h, img_w).
unknown_ps (Tensor) – Unknown area patches tensor of shape (1, img_h*img_w, 1, 1).
scale (Tensor) – Softmax scale of known and unknown area: [unknown_scale, known_scale].
self_mask (Tensor) – Self correlation mask of shape (1, img_h*img_w, img_h, img_w). The mask value equals -1e4 at (1, i*i, i, i) for i in [1, img_h*img_w] and is zero elsewhere.
- Returns
Guided attention score map of shape (1, img_h*img_w, img_h, img_w).
- Return type
Tensor
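One way to read the arguments above is sketched below for a single query position. The way the two scales and the self-correlation penalty are combined here is an assumption made for illustration (per-patch scaling by area, penalty applied only when the patch is unknown), and the function name guided_attention_scores is hypothetical.

```python
import math


def guided_attention_scores(similarities, unknown, query_index,
                            unknown_scale=1.0, known_scale=1.0, penalty=-1e4):
    """Softmax over patch similarities for a single query position.

    similarities: similarity of the query position with each of the P patches.
    unknown:      P values in {0, 1}; 1 marks a patch in the unknown area.
    query_index:  index of the patch centered on the query position itself.
    """
    logits = []
    for p, (sim, unk) in enumerate(zip(similarities, unknown)):
        # Assumed: patches in unknown and known areas use different scales.
        scale = unknown_scale if unk > 0 else known_scale
        logit = sim * scale
        # Assumed: the self-correlation penalty applies to unknown patches only.
        if p == query_index and unk > 0:
            logit += penalty
        logits.append(logit)
    # Numerically stable softmax over the patch dimension.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The scores sum to 1 over the patch dimension, which corresponds to the img_h*img_w axis of the tensors described above.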
- propagate_alpha_feature(gca_score, alpha_ps)
Propagate alpha feature based on guided attention score.
- Parameters
gca_score (Tensor) – Guided attention score map of shape (1, img_h*img_w, img_h, img_w).
alpha_ps (Tensor) – Alpha feature patches tensor of shape (1, img_h*img_w, alpha_c, alpha_ks, alpha_ks).
- Returns
Propagated alpha feature map of shape (1, alpha_c, alpha_h, alpha_w).
- Return type
Tensor
- process_unknown_mask(unknown, img_feat, softmax_scale)
Process unknown mask.
- Parameters
unknown (Tensor, optional) – Unknown area map generated from the trimap, of shape (N, 1, ori_h, ori_w).
img_feat (Tensor) – The interpolated image feature map of shape (N, img_c, img_h, img_w).
softmax_scale (float, optional) – The softmax scale of the attention, used if the unknown area is not provided in forward. Default: 1.
- Returns
2-tuple of
Tensor: Interpolated unknown area map of shape (N, img_h*img_w, img_h, img_w).
Tensor: Softmax scale tensor of known and unknown area of shape (N, 2).
- Return type
tuple
- extract_patches(x, kernel_size, stride)
Extract feature patches.
The feature map is padded automatically so that the number of patches equals (H / stride) * (W / stride).
- Parameters
x (Tensor) – Feature map of shape (N, C, H, W).
kernel_size (int) – Size of each patch.
stride (int) – Stride between patches.
- Returns
Extracted patches of shape (N, (H / stride) * (W / stride), C, kernel_size, kernel_size).
- Return type
Tensor
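The patch count stated above follows from the usual sliding-window formula. The sketch below works through the arithmetic, assuming a total padding of kernel_size - stride along each spatial dimension and sizes divisible by the stride. The helper name num_patches is hypothetical, and the exact padding split used by the module is not stated on this page.

```python
def num_patches(height, width, kernel_size, stride):
    """Count the sliding-window patches after padding.

    Assumes a total padding of (kernel_size - stride) per spatial dimension,
    which is the amount needed for the count to be (H / stride) * (W / stride).
    """
    pad_total = kernel_size - stride
    # Sliding-window formula: floor((size + pad - kernel) / stride) + 1.
    rows = (height + pad_total - kernel_size) // stride + 1
    cols = (width + pad_total - kernel_size) // stride + 1
    return rows * cols
```

For example, num_patches(8, 8, kernel_size=3, stride=1) gives 64 patches, and num_patches(8, 8, kernel_size=4, stride=2) gives 16, matching (H / stride) * (W / stride) in both cases.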
- pad(x, kernel_size, stride)
Pad input tensor.
- Parameters
x (Tensor) – Input tensor.
kernel_size (int) – Kernel size of conv layer.
stride (int) – Stride of conv layer.
- Returns
Padded tensor.
- Return type
Tensor
- get_self_correlation_mask(img_feat)
Create self correlation mask.
- Parameters
img_feat (Tensor) – Input tensor.
- Returns
Mask tensor.
- Return type
Tensor
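A self-correlation mask of the kind described for self_mask can be sketched with nested lists. The batch dimension is dropped, the row-major patch index p = i * width + j is an assumption, and the function name self_correlation_mask is hypothetical.

```python
def self_correlation_mask(height, width, penalty=-1e4):
    """Build a mask of shape (height * width, height, width) as nested lists.

    Entry [p][i][j] equals `penalty` when patch p is the one centered at
    position (i, j), that is p == i * width + j, and 0.0 elsewhere.
    """
    mask = [
        [[0.0] * width for _ in range(height)]
        for _ in range(height * width)
    ]
    for i in range(height):
        for j in range(width):
            mask[i * width + j][i][j] = penalty
    return mask
```

Adding such a mask to a similarity map before the softmax pushes the score of each position matching its own patch toward zero.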
- static l2_norm(x)
L2 normalization function.
- Parameters
x (Tensor) – Input tensor.
- Returns
L2 normalized output tensor.
- Return type
Tensor
- class mmagic.models.editors.gca.ResGCADecoder(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='LeakyReLU', negative_slope=0.2, inplace=True), with_spectral_norm=False, late_downsample=False)[source]
Bases: ResShortcutDec
ResNet decoder with shortcut connection and gca module.
feat1 ---------------------------------------- conv2 --- out
                                              |
feat2 ----------------------------------- conv1
                                          |
feat3 ------------------------------ layer4
                                     |
feat4, img_feat -- gca_module - layer3
                   |
feat5 ------- layer2
             |
out --- layer1
The GCA module also requires the unknown tensor generated from the trimap, which is omitted from the graph above.
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel number of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='LeakyReLU', negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
- forward(inputs)
Forward function of the ResNet shortcut decoder.
- Parameters
inputs (dict) –
Output dictionary of the ResGCAEncoder containing:
out (Tensor): Output of the ResGCAEncoder.
feat1 (Tensor): Shortcut connection from input image.
feat2 (Tensor): Shortcut connection from conv2 of ResGCAEncoder.
feat3 (Tensor): Shortcut connection from layer1 of ResGCAEncoder.
feat4 (Tensor): Shortcut connection from layer2 of ResGCAEncoder.
feat5 (Tensor): Shortcut connection from layer3 of ResGCAEncoder.
img_feat (Tensor): Image feature extracted by guidance head.
unknown (Tensor): Unknown tensor generated from the trimap.
- Returns
Output tensor.
- Return type
Tensor
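Since the decoder expects a dictionary with a fixed set of keys, a small check like the one below can catch a mismatched encoder output early. The key names are taken from the list above. The helper missing_decoder_inputs is hypothetical and not part of the library.

```python
# Keys listed in the forward() documentation of ResGCADecoder.
REQUIRED_KEYS = (
    'out', 'feat1', 'feat2', 'feat3', 'feat4', 'feat5', 'img_feat', 'unknown',
)


def missing_decoder_inputs(inputs):
    """Return the documented keys that are absent from `inputs`, in order."""
    return [key for key in REQUIRED_KEYS if key not in inputs]
```

An empty list means that every documented key is present. The check does not validate tensor shapes or types.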
- class mmagic.models.editors.gca.ResNetDec(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='LeakyReLU', negative_slope=0.2, inplace=True), with_spectral_norm=False, late_downsample=False, init_cfg: Optional[dict] = None)[source]
Bases: mmengine.model.BaseModule
ResNet decoder for image matting.
This class is adapted from https://github.com/Yaoyi-Li/GCA-Matting.
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel number of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='LeakyReLU', negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- init_weights()
Init weights for the module.
- _make_layer(block, planes, num_blocks, conv_cfg, norm_cfg, act_cfg, with_spectral_norm)
- forward(x)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (N, C, H, W).
- Returns
Output tensor.
- Return type
Tensor
- class mmagic.models.editors.gca.ResShortcutDec(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='LeakyReLU', negative_slope=0.2, inplace=True), with_spectral_norm=False, late_downsample=False, init_cfg: Optional[dict] = None)[source]
Bases: ResNetDec
ResNet decoder for image matting with shortcut connection.
feat1 --------------------------- conv2 --- out
                                 |
feat2 ---------------------- conv1
                             |
feat3 ----------------- layer4
                        |
feat4 ------------ layer3
                   |
feat5 ------- layer2
             |
out --- layer1
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel number of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='LeakyReLU', negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(inputs)
Forward function of the ResNet shortcut decoder.
- Parameters
inputs (dict) –
Output dictionary of the ResNetEnc containing:
out (Tensor): Output of the ResNetEnc.
feat1 (Tensor): Shortcut connection from input image.
feat2 (Tensor): Shortcut connection from conv2 of ResNetEnc.
feat3 (Tensor): Shortcut connection from layer1 of ResNetEnc.
feat4 (Tensor): Shortcut connection from layer2 of ResNetEnc.
feat5 (Tensor): Shortcut connection from layer3 of ResNetEnc.
- Returns
Output tensor.
- Return type
Tensor
- class mmagic.models.editors.gca.ResGCAEncoder(block, layers, in_channels, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU'), with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'), init_cfg: Optional[dict] = None)[source]
Bases: ResShortcutEnc
ResNet backbone with shortcut connection and gca module.
image ---------------- shortcut[0] -------------- feat1
  |
conv1-conv2 ---------- shortcut[1] -------------- feat2
       |
     conv3-layer1 ---- shortcut[2] -------------- feat3
             |
             | image - guidance_conv ------------ img_feat
             |             |
            layer2 --- gca_module - shortcut[4] - feat4
              |
            layer3 -- shortcut[5] - feat5
              |
            layer4 --------------- out
The GCA module also requires the unknown tensor generated from the trimap, which is omitted from the graph above.
Implementation of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: ('conv', 'act', 'norm').
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(x)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (N, C, H, W).
- Returns
Contains the output tensor, shortcut feature and intermediate feature.
- Return type
dict
- class mmagic.models.editors.gca.ResNetEnc(block, layers, in_channels, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU'), with_spectral_norm=False, late_downsample=False, init_cfg: Optional[dict] = None)[source]
Bases: mmengine.model.BaseModule
ResNet encoder for image matting.
This class is adapted from https://github.com/Yaoyi-Li/GCA-Matting. It is implemented and pre-trained on ImageNet with the tricks from https://arxiv.org/abs/1812.01187, without the mix-up part.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- init_weights()
Init weights for the module.
- _make_layer(block, planes, num_blocks, stride, conv_cfg, norm_cfg, act_cfg, with_spectral_norm)
- forward(x)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (N, C, H, W).
- Returns
Output tensor.
- Return type
Tensor
- class mmagic.models.editors.gca.ResShortcutEnc(block, layers, in_channels, conv_cfg=None, norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU'), with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'), init_cfg: Optional[dict] = None)[source]
Bases: ResNetEnc
ResNet backbone for image matting with shortcut connection.
image ---------------- shortcut[0] --- feat1
  |
conv1-conv2 ---------- shortcut[1] --- feat2
       |
     conv3-layer1 --- shortcut[2] --- feat3
             |
            layer2 -- shortcut[4] --- feat4
              |
            layer3 - shortcut[5] --- feat5
              |
            layer4 ---------------- out
Baseline model of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.
order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: ('conv', 'act', 'norm').
init_cfg (dict, optional) – Initialization config dict. Default: None.
- _make_shortcut(in_channels, out_channels, conv_cfg, norm_cfg, act_cfg, order, with_spectral_norm)
- forward(x)
Forward function.
- Parameters
x (Tensor) – Input tensor with shape (N, C, H, W).
- Returns
Contains the output tensor and shortcut feature.
- Return type
dict