`mmagic.models.archs`¶

Package Contents¶

Classes¶

`AllGatherLayer`	All gather layer with backward propagation path.
`ASPP`	ASPP module from DeepLabV3.
`AttentionInjection`	Wrapper for stable diffusion unet.
`SpatialTemporalEnsemble`	Apply spatial and temporal ensemble and compute outputs.
`SimpleGatedConvModule`	Simple Gated Convolutional Module.
`ImgNormalize`	Normalize images with the given mean and std value.
`LinearModule`	A linear block that contains linear/norm/activation layers.
`LoRAWrapper`	Wrapper for LoRA layer.
`MultiLayerDiscriminator`	Multilayer Discriminator.
`PatchDiscriminator`	A PatchGAN discriminator.
`ResNet`	General ResNet.
`DepthwiseSeparableConvModule`	Depthwise separable convolution module.
`SimpleEncoderDecoder`	Simple encoder-decoder model from matting.
`SoftMaskPatchDiscriminator`	A Soft Mask-Guided PatchGAN discriminator.
`ResidualBlockNoBN`	Residual block without BN.
`TokenizerWrapper`	Tokenizer wrapper for CLIPTokenizer. Only supports CLIPTokenizer currently.
`PixelShufflePack`	Pixel Shuffle upsample layer.
`VGG16`	Customized VGG16 Encoder.

Functions¶

`pixel_unshuffle`(→ torch.Tensor)	Down-sample by pixel unshuffle.
`set_lora`(→ torch.nn.Module)	Set LoRA for module.
`set_lora_disable`(→ torch.nn.Module)	Disable LoRA modules.
`set_lora_enable`(→ torch.nn.Module)	Enable LoRA modules.
`set_only_lora_trainable`(→ torch.nn.Module)	Set only LoRA modules trainable.

class mmagic.models.archs.AllGatherLayer(*args, **kwargs)[source]¶

Bases: torch.autograd.Function

All gather layer with backward propagation path.

This module makes dist.all_gather() part of the backward graph. This kind of operation is widely used in MoCo and other contrastive learning algorithms.

static forward(ctx, x)[source]¶: Forward function.

static backward(ctx, *grad_outputs)[source]¶: Backward function.

class mmagic.models.archs.ASPP(in_channels: int, out_channels: int = 256, mid_channels: int = 256, dilations: Sequence[int] = (12, 24, 36), conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = dict(type='BN'), act_cfg: Optional[dict] = dict(type='ReLU'), separable_conv: bool = False)[source]¶

Bases: torch.nn.Module

ASPP module from DeepLabV3.

The code is adopted from https://github.com/pytorch/vision/blob/master/torchvision/models/segmentation/deeplabv3.py

For more information about the module: “Rethinking Atrous Convolution for Semantic Image Segmentation”.

Parameters

in_channels (int) – Input channels of the module.
out_channels (int) – Output channels of the module. Default: 256.
mid_channels (int) – Output channels of the intermediate ASPP conv modules. Default: 256.
dilations (Sequence[int]) – Dilation rates of the three ASPP conv modules. Default: (12, 24, 36).
conv_cfg (dict) – Config dict for convolution layer. If None, nn.Conv2d will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
separable_conv (bool) – Whether to replace normal conv with depthwise separable conv, which is faster. Default: False.
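
To get a feel for what the default dilations imply, the sketch below computes how many pixels one dilated kernel spans along a single axis. It assumes 3x3 kernels in the dilated branches, which this page does not state, and `effective_kernel_size` is a hypothetical helper rather than part of mmagic.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis."""
    # A dilation of d leaves d - 1 skipped pixels between adjacent kernel taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Default dilations from the signature above; the 3x3 kernel is an assumption.
spans = {d: effective_kernel_size(3, d) for d in (12, 24, 36)}
print(spans)
```

Larger dilations widen the context each branch sees without adding weights, which is the motivation for running several rates in parallel.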

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function for ASPP module.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output tensor.
Return type: Tensor

class mmagic.models.archs.AttentionInjection(module: torch.nn.Module, injection_weight=5)[source]¶

Bases: torch.nn.Module

Wrapper for stable diffusion unet.

Parameters: module (nn.Module) – The module to be wrapped.

forward(x: torch.Tensor, t, encoder_hidden_states=None, down_block_additional_residuals=None, mid_block_additional_residual=None, ref_x=None) → torch.Tensor[source]¶

Forward and add LoRA mapping.

Parameters: x (Tensor) – The input tensor.
Returns: The output tensor.
Return type: Tensor

mmagic.models.archs.pixel_unshuffle(x: torch.Tensor, scale: int) → torch.Tensor[source]¶

Down-sample by pixel unshuffle.

Parameters

x (Tensor) – Input tensor.
scale (int) – Scale factor.

Returns

Output tensor.

Return type

Tensor
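
The rearrangement can be illustrated on nested lists without any tensor library. This is a minimal sketch for a single image of shape (C, H, W); the channel ordering (input channel first, then row offset, then column offset) is an assumption, and `pixel_unshuffle_lists` is a hypothetical name.

```python
def pixel_unshuffle_lists(img, scale):
    """Rearrange a (C, H, W) nested list into (C * scale**2, H // scale, W // scale)."""
    height, width = len(img[0]), len(img[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for channel in img:
        # Each (row offset, column offset) pair becomes its own output channel.
        for i in range(scale):
            for j in range(scale):
                out.append([
                    [channel[y * scale + i][x * scale + j]
                     for x in range(width // scale)]
                    for y in range(height // scale)
                ])
    return out


# One channel, 4x4 image, values 0..15 in row-major order.
image = [[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]]
unshuffled = pixel_unshuffle_lists(image, 2)
print(len(unshuffled), len(unshuffled[0]), len(unshuffled[0][0]))
```

Spatial resolution drops by `scale` in each direction while the channel count grows by `scale ** 2`, so no pixel values are discarded.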

class mmagic.models.archs.SpatialTemporalEnsemble(is_temporal_ensemble: Optional[bool] = False)[source]¶

Bases: torch.nn.Module

Apply spatial and temporal ensemble and compute outputs.

Parameters: is_temporal_ensemble (bool, optional) – Whether to apply ensemble temporally. If True, the sequence will also be flipped temporally. If the input is an image, this argument must be set to False. Default: False.

_transform(imgs: torch.Tensor, mode: str) → torch.Tensor[source]¶

Apply spatial transform (flip, rotate) to the images.

Parameters

imgs (torch.Tensor) – The images to be transformed.
mode (str) – The mode of transform. Supported values are 'vertical', 'horizontal', and 'transpose', corresponding to vertical flip, horizontal flip, and rotation, respectively.

Returns

The transformed images.

Return type

torch.Tensor

spatial_ensemble(imgs: torch.Tensor, model: torch.nn.Module) → torch.Tensor[source]¶

Apply spatial ensemble.

Parameters

imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.

Returns

Output of the model with spatial ensemble applied.

Return type

torch.Tensor

forward(imgs: torch.Tensor, model: torch.nn.Module) → torch.Tensor[source]¶

Apply spatial and temporal ensemble.

Parameters

imgs (torch.Tensor) – The images to be processed by the model. Its size should be either (n, t, c, h, w) or (n, c, h, w).
model (nn.Module) – The model to process the images.

Returns

Output of the model with spatial ensemble applied.

Return type

torch.Tensor
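
The spatial part of the ensemble can be sketched on a single 2D grid with plain lists. The sketch assumes the usual eight-way scheme: every combination of vertical flip, horizontal flip and transpose is fed to the model, each output is mapped back with the inverse transforms, and the results are averaged. The function names are hypothetical, and `model` is any callable that maps a grid to a grid.

```python
def transform(img, mode):
    """Apply one spatial transform to a 2D grid (list of rows)."""
    if mode == "vertical":
        return img[::-1]
    if mode == "horizontal":
        return [row[::-1] for row in img]
    if mode == "transpose":
        return [list(col) for col in zip(*img)]
    raise ValueError(f"unsupported mode: {mode}")


def spatial_ensemble(img, model):
    """Average the model output over 8 flipped/transposed variants."""
    variants = [img]
    for mode in ("vertical", "horizontal", "transpose"):
        # Doubles the list each time: 1 -> 2 -> 4 -> 8 variants.
        variants.extend([transform(v, mode) for v in variants])

    outputs = [model(v) for v in variants]
    for i, out in enumerate(outputs):
        # Undo the transforms in reverse order of application.
        if i > 3:
            out = transform(out, "transpose")
        if i % 4 > 1:
            out = transform(out, "horizontal")
        if i % 2 == 1:
            out = transform(out, "vertical")
        outputs[i] = out

    rows, cols = len(outputs[0]), len(outputs[0][0])
    return [[sum(o[y][x] for o in outputs) / len(outputs) for x in range(cols)]
            for y in range(rows)]


grid = [[1, 2, 3], [4, 5, 6]]
averaged = spatial_ensemble(grid, lambda v: v)  # identity stand-in for a model
print(averaged)
```

With an identity model the ensemble returns the input unchanged, which is a quick check that every transform is undone correctly.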

class mmagic.models.archs.SimpleGatedConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], feat_act_cfg: Optional[dict] = dict(type='ELU'), gate_act_cfg: Optional[dict] = dict(type='Sigmoid'), **kwargs)[source]¶

Bases: torch.nn.Module

Simple Gated Convolutional Module.

This module is a simple gated convolutional module. The detailed formula is:

\[y = \phi(conv1(x)) * \sigma(conv2(x)),\]

where phi is the feature activation function and sigma is the gate activation function. By default, the gate activation function is sigmoid.
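
The gating itself is an element-wise product, which the following sketch shows on plain floats. The two convolutions are replaced by precomputed values (`feat` and `gate` stand in for conv1(x) and conv2(x)), ELU and sigmoid are the default activations from the signature, and `gated_output` is a hypothetical helper.

```python
import math


def elu(value, alpha=1.0):
    """ELU activation: identity for positive inputs, smooth saturation below zero."""
    return value if value > 0 else alpha * (math.exp(value) - 1.0)


def sigmoid(value):
    """Logistic sigmoid, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def gated_output(feat, gate):
    """Element-wise y = phi(feat) * sigma(gate)."""
    return [elu(f) * sigmoid(g) for f, g in zip(feat, gate)]


# feat and gate play the role of conv1(x) and conv2(x) at three positions.
print(gated_output([2.0, 2.0, 2.0], [-10.0, 0.0, 10.0]))
```

A strongly negative gate value suppresses the feature almost entirely, while a strongly positive one lets it through nearly unchanged.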

Parameters

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – The number of channels of the output feature. Note that out_channels in the conv module is doubled since this module contains two convolutions for feature and gate separately.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
feat_act_cfg (dict) – Config dict for feature activation layer. Default: dict(type='ELU').
gate_act_cfg (dict) – Config dict for gate activation layer. Default: dict(type='Sigmoid').
kwargs (keyword arguments) – Same as ConvModule.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward Function.

Parameters: x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
Returns: Output tensor with shape of (n, c, h’, w’).
Return type: torch.Tensor

class mmagic.models.archs.ImgNormalize(pixel_range: float, img_mean: Tuple[float, float, float], img_std: Tuple[float, float, float], sign: int = -1)[source]¶

Bases: torch.nn.Conv2d

Normalize images with the given mean and std value.

Based on a Conv2d layer, so it can run on GPU.

Parameters

pixel_range (float) – Pixel range of feature.
img_mean (Tuple[float]) – Image mean of each channel.
img_std (Tuple[float]) – Image std of each channel.
sign (int) – Sign of bias. Default -1.
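
A per-pixel view of what such a layer might compute is sketched below. It assumes, since this page does not spell it out, that the convolution is a 1x1 kernel that divides each channel by its std and adds a bias of sign * pixel_range * mean / std. `normalize_pixel` is a hypothetical helper operating on one pixel.

```python
def normalize_pixel(pixel, pixel_range, img_mean, img_std, sign=-1):
    """Per-channel affine map: x / std + sign * pixel_range * mean / std."""
    return [
        value / std + sign * pixel_range * mean / std
        for value, mean, std in zip(pixel, img_mean, img_std)
    ]


# sign=-1 subtracts the mean (normalization).
normalized = normalize_pixel((1.0, 0.5, 0.0), 1.0, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
# sign=+1 with unit std adds the mean back (de-normalization).
restored = normalize_pixel((0.25, 0.25, 0.25), 1.0, (0.5, 0.5, 0.5), (1.0, 1.0, 1.0), sign=1)
print(normalized, restored)
```

Under this assumption the same layer type covers both directions: `sign=-1` removes the mean before a network and `sign=1` adds it back afterwards.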

class mmagic.models.archs.LinearModule(in_features: int, out_features: int, bias: bool = True, act_cfg: Optional[dict] = dict(type='ReLU'), inplace: bool = True, with_spectral_norm: bool = False, order: Tuple[str, str] = ('linear', 'act'))[source]¶

Bases: torch.nn.Module

A linear block that contains linear/norm/activation layers.

For low level vision, we add spectral norm and padding layer.

Parameters

in_features (int) – Same as nn.Linear.
out_features (int) – Same as nn.Linear.
bias (bool) – Same as nn.Linear. Default: True.
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
inplace (bool) – Whether to use inplace mode for activation. Default: True.
with_spectral_norm (bool) – Whether to use spectral norm in the linear module. Default: False.
order (tuple[str]) – The order of linear/activation layers. It is a sequence of "linear", "norm" and "act". Examples are ("linear", "act") and ("act", "linear").

init_weights() → None[source]¶: Init weights for the model.

forward(x: torch.Tensor, activate: Optional[bool] = True) → torch.Tensor[source]¶

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of \((n, *, c)\). Same as torch.nn.Linear.
activate (bool, optional) – Whether to use activation layer. Defaults to True.

Returns

Same as torch.nn.Linear.

Return type

torch.Tensor

class mmagic.models.archs.LoRAWrapper(module: torch.nn.Module, in_feat: int, out_feat: int, rank: int, scale: float = 1, names: Optional[Union[str, List[str]]] = None)[source]¶

Bases: torch.nn.Module

Wrapper for LoRA layer.

Parameters

module (nn.Module) – The module to be wrapped.
in_feat (int) – Number of input features.
out_feat (int) – Number of output features.
rank (int) – The rank of LoRA.
scale (float) – The scale of LoRA feature.
names (Union[str, List[str]], optional) – The name of LoRA layers. If you want to add multiple LoRA mappings to one module, a name must be defined for each mapping.
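
The idea behind a LoRA mapping can be sketched with small matrices: the wrapped layer's output is kept, and a low-rank update, scaled by `scale`, is added to it. This is a generic illustration and not the wrapper's implementation; all names are hypothetical, and initializing the up-projection to zero is a common convention assumed here.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def lora_forward(x, base_weight, down, up, scale=1.0):
    """Base output plus the scaled low-rank update up(down(x))."""
    base_out = matvec(base_weight, x)
    lora_out = matvec(up, matvec(down, x))
    return [b + scale * l for b, l in zip(base_out, lora_out)]


# in_feat=2, out_feat=2, rank=1
base_weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight of the wrapped layer
down = [[1.0, 1.0]]                     # rank x in_feat
up_zero = [[0.0], [0.0]]                # out_feat x rank, zero-initialized
up_trained = [[0.5], [-0.5]]            # stand-in for trained values
x = [2.0, 4.0]

print(lora_forward(x, base_weight, down, up_zero))
print(lora_forward(x, base_weight, down, up_trained, scale=2.0))
```

With a zero up-projection the wrapped layer behaves exactly as before, so adding a fresh mapping does not change the model until training updates it.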

add_lora(name: str, rank: int, scale: float = 1, state_dict: Optional[dict] = None)[source]¶

Add LoRA mapping.

Parameters

name (str) – The name of added LoRA.
rank (int) – The rank of added LoRA.
scale (float, optional) – The scale of added LoRA. Defaults to 1.
state_dict (dict, optional) – The state dict of added LoRA. Defaults to None.

_set_value(attr_name: str, value: Any, name: Optional[str] = None)[source]¶

Set value of attribute.

Parameters

attr_name (str) – The name of attribute to be set value.
value (Any) – The value to be set.
name (str, optional) – The name of field in attr_name. If passed, will set value to attr_name[name]. Defaults to None.

set_scale(scale: float, name: Optional[str] = None)[source]¶

Set LoRA scale.

Parameters

scale (float) – The scale to be set.
name (str, optional) – The name of LoRA to be set. Defaults to None.

set_enable(name: Optional[str] = None)[source]¶

Enable LoRA for the current layer.

Parameters: name (str, optional) – The name of LoRA to be set. Defaults to None.

set_disable(name: Optional[str] = None)[source]¶

Disable LoRA for the current layer.

Parameters: name (str, optional) – The name of LoRA to be set. Defaults to None.

forward_lora_mapping(x: torch.Tensor) → torch.Tensor[source]¶

Forward LoRA mapping.

Parameters: x (Tensor) – The input tensor.
Returns: The output tensor.
Return type: Tensor

forward(x: torch.Tensor, *args, **kwargs) → torch.Tensor[source]¶

Forward and add LoRA mapping.

Parameters: x (Tensor) – The input tensor.
Returns: The output tensor.
Return type: Tensor

classmethod wrap_lora(module, rank=4, scale=1, names=None, state_dict=None)[source]¶

Wrap LoRA.

Use case:

>>> linear = nn.Linear(2, 4)
>>> lora_linear = LoRAWrapper.wrap_lora(linear, 4, 1)

Parameters

module (nn.Module) – The module to add LoRA.
rank (int) – The rank for LoRA.
scale (float) – The scale of LoRA feature.

Return type

LoRAWrapper

mmagic.models.archs.set_lora(module: torch.nn.Module, config: dict, verbose: bool = True) → torch.nn.Module[source]¶

Set LoRA for module.

Use case:

>>> # 1. set all lora with same parameters
>>> lora_config = dict(
>>>     rank=4,
>>>     scale=1,
>>>     target_modules=['to_q', 'to_k', 'to_v'])

>>> # 2. set lora with different parameters
>>> lora_config = dict(
>>>     rank=4,
>>>     scale=1,
>>>     target_modules=[
>>>         # set `to_q` the default parameters
>>>         'to_q',
>>>         # set `to_k` the defined parameters
>>>         dict(target_module='to_k', rank=8, scale=1),
>>>         # set `to_v` the defined `rank` and default `scale`
>>>         dict(target_module='to_v', rank=16)
>>>     ])

Parameters

module (nn.Module) – The module to set LoRA.
config (dict) – The config dict.
verbose (bool) – Whether to print log. Defaults to True.
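
How the two config styles relate can be sketched as a small resolution step: string targets take the top-level `rank` and `scale`, while dict targets override whichever of the two they define. `resolve_lora_targets` is a hypothetical helper written for illustration, not a function of this package, and the fallback values of 4 and 1 are assumptions taken from the defaults of `wrap_lora`.

```python
def resolve_lora_targets(config: dict) -> dict:
    """Expand target_modules into one {rank, scale} entry per target name."""
    default_rank = config.get("rank", 4)    # assumed fallback
    default_scale = config.get("scale", 1)  # assumed fallback
    resolved = {}
    for target in config["target_modules"]:
        if isinstance(target, str):
            # Plain name: use the shared defaults.
            resolved[target] = {"rank": default_rank, "scale": default_scale}
        else:
            # Dict entry: per-target values win over the shared defaults.
            resolved[target["target_module"]] = {
                "rank": target.get("rank", default_rank),
                "scale": target.get("scale", default_scale),
            }
    return resolved


lora_config = dict(
    rank=4,
    scale=1,
    target_modules=[
        "to_q",
        dict(target_module="to_k", rank=8, scale=1),
        dict(target_module="to_v", rank=16),
    ])
print(resolve_lora_targets(lora_config))
```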

mmagic.models.archs.set_lora_disable(module: torch.nn.Module) → torch.nn.Module[source]¶: Disable LoRA modules.

mmagic.models.archs.set_lora_enable(module: torch.nn.Module) → torch.nn.Module[source]¶: Enable LoRA modules.

mmagic.models.archs.set_only_lora_trainable(module: torch.nn.Module) → torch.nn.Module[source]¶: Set only LoRA modules trainable.

class mmagic.models.archs.MultiLayerDiscriminator(in_channels: int, max_channels: int, num_convs: int = 5, fc_in_channels: Optional[int] = None, fc_out_channels: int = 1024, kernel_size: int = 5, conv_cfg: Optional[dict] = None, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), out_act_cfg: Optional[dict] = dict(type='ReLU'), with_input_norm: bool = True, with_out_convs: bool = False, with_spectral_norm: bool = False, **kwargs)[source]¶

Bases: torch.nn.Module

Multilayer Discriminator.

This is a commonly used structure with multiple stacked convolution layers.

Parameters

in_channels (int) – Input channel of the first input convolution.
max_channels (int) – The maximum channel number in this structure.
num_convs (int) – Number of stacked intermediate convs (including input conv but excluding output conv). Default to 5.
fc_in_channels (int | None) – Input dimension of the fully connected layer. If fc_in_channels is None, the fully connected layer will be removed. Default to None.
fc_out_channels (int) – Output dimension of the fully connected layer. Default to 1024.
kernel_size (int) – Kernel size of the conv modules. Default to 5.
conv_cfg (dict) – Config dict to build conv layer.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU').
out_act_cfg (dict) – Config dict for output activation. Default: dict(type='ReLU').
with_input_norm (bool) – Whether to add normalization after the input conv. Default to True.
with_out_convs (bool) – Whether to add output convs to the discriminator. The output convs contain two convs. The first out conv has the same setting as the intermediate convs but a stride of 1 instead of 2. The second out conv is a conv similar to the first out conv but reduces the number of channels to 1 and has no activation layer. Default to False.
with_spectral_norm (bool) – Whether to use spectral norm after the conv layers. Default to False.
kwargs (keyword arguments) –

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward Function.

Parameters: x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
Returns: Output tensor with shape of (n, c, h’, w’) or (n, c).
Return type: torch.Tensor

init_weights(pretrained: Optional[str] = None) → None[source]¶

Init weights for models.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmagic.models.archs.PatchDiscriminator(in_channels: int, base_channels: int = 64, num_conv: int = 3, norm_cfg: dict = dict(type='BN'), init_cfg: Optional[dict] = dict(type='normal', gain=0.02))[source]¶

Bases: mmengine.model.BaseModule

A PatchGAN discriminator.

Parameters

in_channels (int) – Number of channels in input images.
base_channels (int) – Number of channels at the first conv layer. Default: 64.
num_conv (int) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type='BN').
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: 'normal'. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor

init_weights() → None[source]¶

Initialize weights for the model.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

class mmagic.models.archs.ResNet(depth: int, in_channels: int = 3, stem_channels: int = 64, base_channels: int = 64, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 2, 4), deep_stem: bool = False, avg_down: bool = False, frozen_stages: int = -1, act_cfg: dict = dict(type='ReLU'), conv_cfg: Optional[dict] = None, norm_cfg: dict = dict(type='BN'), with_cp: bool = False, multi_grid: Optional[Sequence[int]] = None, contract_dilation: bool = False, zero_init_residual: bool = True)[source]¶

Bases: torch.nn.Module

General ResNet.

This class is adopted from https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/backbones/resnet.py.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Number of stem channels. Default: 64.
base_channels (int) – Number of base channels of res layer. Default: 64.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 2, 4).
deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
act_cfg (dict) – Dictionary to construct and config activation layer. Default: dict(type='ReLU').
conv_cfg (dict) – Dictionary to construct and config convolution layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN').
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.
contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

property norm1: torch.nn.Module¶

normalization layer after the second convolution layer

Type: nn.Module

arch_settings¶

_make_stem_layer(in_channels: int, stem_channels: int) → None[source]¶: Make stem layer for ResNet.

_make_layer(block: BasicBlock, planes: int, blocks: int, stride: int = 1, dilation: int = 1) → torch.nn.Module[source]¶

_nostride_dilate(m: torch.nn.Module, dilate: int) → None[source]¶

init_weights(pretrained: Optional[str] = None) → None[source]¶

Init weights for the model.

Parameters: pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

_freeze_stages() → None[source]¶: Freeze stages param and norm stats.

forward(x: torch.Tensor) → List[torch.Tensor][source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output tensors.
Return type: List[Tensor]

class mmagic.models.archs.DepthwiseSeparableConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, norm_cfg: Optional[dict] = None, act_cfg: Optional[dict] = dict(type='ReLU'), dw_norm_cfg: Union[dict, str] = 'default', dw_act_cfg: Union[dict, str] = 'default', pw_norm_cfg: Union[dict, str] = 'default', pw_act_cfg: Union[dict, str] = 'default', **kwargs)[source]¶

Bases: torch.nn.Module

Depthwise separable convolution module.

See https://arxiv.org/pdf/1704.04861.pdf for details.

This module can replace a ConvModule, with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. Note that the depthwise conv block contains norm/activation layers if norm_cfg and act_cfg are specified.

Parameters

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
padding (int or tuple[int]) – Same as nn.Conv2d. Default: 0.
dilation (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type='ReLU').
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is 'default', it will be the same as norm_cfg. Default: 'default'.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is 'default', it will be the same as act_cfg. Default: 'default'.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is 'default', it will be the same as norm_cfg. Default: 'default'.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is 'default', it will be the same as act_cfg. Default: 'default'.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
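
The saving from splitting a convolution into depthwise and pointwise parts can be checked with simple arithmetic. The sketch below counts convolution weights only (no bias, norm or activation parameters) and assumes a square kernel; `conv_weight_counts` is a hypothetical helper.

```python
def conv_weight_counts(in_channels: int, out_channels: int, kernel_size: int):
    """Return (standard, separable) weight counts for a square kernel."""
    # Standard conv: every output channel has a k x k filter per input channel.
    standard = kernel_size * kernel_size * in_channels * out_channels
    # Depthwise: one k x k filter per input channel.
    depthwise = kernel_size * kernel_size * in_channels
    # Pointwise: a 1x1 conv that mixes channels.
    pointwise = in_channels * out_channels
    return standard, depthwise + pointwise


standard, separable = conv_weight_counts(in_channels=64, out_channels=128, kernel_size=3)
print(standard, separable, round(standard / separable, 1))
```

For a 3x3 kernel the reduction approaches a factor of nine as the number of output channels grows.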

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Output tensor.
Return type: Tensor

class mmagic.models.archs.SimpleEncoderDecoder(encoder: dict, decoder: dict, init_cfg: Optional[dict] = None)[source]¶

Bases: mmengine.model.BaseModule

Simple encoder-decoder model from matting.

Parameters

encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
init_cfg (dict, optional) – Initialization config dict.

forward(*args, **kwargs) → torch.Tensor[source]¶

Forward function.

Returns: The output tensor of the decoder.
Return type: Tensor

class mmagic.models.archs.SoftMaskPatchDiscriminator(in_channels: int, base_channels: Optional[int] = 64, num_conv: Optional[int] = 3, norm_cfg: Optional[dict] = None, init_cfg: Optional[dict] = dict(type='normal', gain=0.02), with_spectral_norm: Optional[bool] = False)[source]¶

Bases: mmengine.model.BaseModule

A Soft Mask-Guided PatchGAN discriminator.

Parameters

in_channels (int) – Number of channels in input images.
base_channels (int, optional) – Number of channels at the first conv layer. Default: 64.
num_conv (int, optional) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict, optional) – Config dict to build norm layer. Default: None.
init_cfg (dict, optional) – Config dict for initialization. type: The name of our initialization method. Default: 'normal'. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
with_spectral_norm (bool, optional) – Whether to use spectral norm after the conv layers. Default: False.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor

init_weights() → None[source]¶: Initialize weights for the model.

class mmagic.models.archs.ResidualBlockNoBN(mid_channels: int = 64, res_scale: float = 1.0)[source]¶

Bases: torch.nn.Module

Residual block without BN.

It has a style of:

---Conv-ReLU-Conv-+-
 |________________|

Parameters

mid_channels (int) – Channel number of intermediate features. Default: 64.
res_scale (float) – Used to scale the residual before addition. Default: 1.0.
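
The role of `res_scale` can be shown with a few lines of plain Python. In this sketch `body` is a stand-in for the Conv-ReLU-Conv branch in the diagram above, and `residual_block` is a hypothetical helper working on a flat list of values.

```python
def residual_block(x, body, res_scale=1.0):
    """Return x + res_scale * body(x), element by element."""
    return [xi + res_scale * ri for xi, ri in zip(x, body(x))]


def toy_body(values):
    """Stand-in for the Conv-ReLU-Conv branch: simply doubles its input."""
    return [2.0 * v for v in values]


features = [1.0, -2.0]
print(residual_block(features, toy_body, res_scale=0.5))
print(residual_block(features, toy_body, res_scale=0.0))
```

A `res_scale` below 1 damps the contribution of the residual branch, and a value of 0 reduces the block to the identity.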

init_weights() → None[source]¶

Initialize weights for ResidualBlockNoBN.

Initialization methods like kaiming_init are for VGG-style modules. For modules with residual paths, using a smaller std is better for stability and performance. We empirically use 0.1. See more details in “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks”.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor

class mmagic.models.archs.TokenizerWrapper(from_pretrained: Optional[Union[str, os.PathLike]] = None, from_config: Optional[Union[str, os.PathLike]] = None, *args, **kwargs)[source]¶

Tokenizer wrapper for CLIPTokenizer. Only supports CLIPTokenizer currently. This wrapper is modified from https://github.com/huggingface/diffusers/blob/e51f19aee82c8dd874b715a09dbc521d88835d68/src/diffusers/loaders.py#L358.

Parameters

from_pretrained (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
from_config (Union[str, os.PathLike], optional) – The model id of a pretrained model or a path to a directory containing model weights and config. Defaults to None.
*args – If from_pretrained is passed, *args and **kwargs will be passed to from_pretrained function. Otherwise, *args and **kwargs will be used to initialize the model by self._module_cls(*args, **kwargs).
**kwargs – If from_pretrained is passed, *args and **kwargs will be passed to from_pretrained function. Otherwise, *args and **kwargs will be used to initialize the model by self._module_cls(*args, **kwargs).

__getattr__(name: str) → Any[source]¶

try_adding_tokens(tokens: Union[str, List[str]], *args, **kwargs)[source]¶

Attempt to add tokens to the tokenizer.

Parameters: tokens (Union[str, List[str]]) – The tokens to be added.

get_token_info(token: str) → dict[source]¶

Get the information of a token, including its start and end index in the current tokenizer.

Parameters

token (str) – The token to be queried.

Returns

The information of the token, including its start and end index in the current tokenizer.

Return type

dict

add_placeholder_token(placeholder_token: str, *args, num_vec_per_token: int = 1, **kwargs)[source]¶

Add placeholder tokens to the tokenizer.

Parameters

placeholder_token (str) – The placeholder token to be added.
num_vec_per_token (int, optional) – The number of vectors of the added placeholder token.
*args – The arguments for self.wrapped.add_tokens.
**kwargs – The arguments for self.wrapped.add_tokens.

replace_placeholder_tokens_in_text(text: Union[str, List[str]], vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0) → Union[str, List[str]][source]¶

Replace the keywords in text with placeholder tokens. This function will be called in self.__call__ and self.encode.

Parameters

text (Union[str, List[str]]) – The text to be processed.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.

Returns

The processed text.

Return type

Union[str, List[str]]
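
The interplay between `num_vec_per_token` and the text replacement can be sketched with plain strings. Several details here are assumptions not stated on this page: that a multi-vector placeholder is expanded to tokens with an `_i` suffix, and that `prop_tokens_to_load` keeps a leading fraction of them (at least one). Both functions are hypothetical, and `vector_shuffle` is left out for brevity.

```python
def expand_placeholder(placeholder_token: str, num_vec_per_token: int = 1) -> list:
    """Return the concrete tokens that stand behind one placeholder."""
    if num_vec_per_token == 1:
        return [placeholder_token]
    # Assumed naming scheme: <token>_0, <token>_1, ...
    return [f"{placeholder_token}_{i}" for i in range(num_vec_per_token)]


def replace_placeholders(text: str, token_map: dict,
                         prop_tokens_to_load: float = 1.0) -> str:
    """Replace each placeholder in text with (a leading share of) its tokens."""
    for placeholder, tokens in token_map.items():
        keep = max(1, int(len(tokens) * prop_tokens_to_load))  # assumed rule
        text = text.replace(placeholder, " ".join(tokens[:keep]))
    return text


token_map = {"<cat>": expand_placeholder("<cat>", num_vec_per_token=4)}
print(replace_placeholders("a photo of <cat>", token_map))
print(replace_placeholders("a photo of <cat>", token_map, prop_tokens_to_load=0.5))
```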

replace_text_with_placeholder_tokens(text: Union[str, List[str]]) → Union[str, List[str]][source]¶

Replace the placeholder tokens in text with the original keywords. This function will be called in self.decode.

Parameters: text (Union[str, List[str]]) – The text to be processed.
Returns: The processed text.
Return type: Union[str, List[str]]

__call__(text: Union[str, List[str]], *args, vector_shuffle: bool = False, prop_tokens_to_load: float = 1.0, **kwargs)[source]¶

The call function of the wrapper.

Parameters

text (Union[str, List[str]]) – The text to be tokenized.
vector_shuffle (bool, optional) – Whether to shuffle the vectors. Defaults to False.
prop_tokens_to_load (float, optional) – The proportion of tokens to be loaded. If 1.0, all tokens will be loaded. Defaults to 1.0.
*args – The arguments for self.wrapped.__call__.
**kwargs – The arguments for self.wrapped.__call__.

encode(text: Union[str, List[str]], *args, **kwargs)[source]¶

Encode the passed text to token index.

Parameters

text (Union[str, List[str]]) – The text to be encoded.
*args – The arguments for self.wrapped.__call__.
**kwargs – The arguments for self.wrapped.__call__.

decode(token_ids, return_raw: bool = False, *args, **kwargs) → Union[str, List[str]][source]¶

Decode the token index to text.

Parameters

token_ids – The token index to be decoded.
return_raw – Whether to keep the placeholder token in the text. Defaults to False.
*args – The arguments for self.wrapped.decode.
**kwargs – The arguments for self.wrapped.decode.

Returns

The decoded text.

Return type

Union[str, List[str]]

__repr__()[source]¶: The representation of the wrapper.

class mmagic.models.archs.PixelShufflePack(in_channels: int, out_channels: int, scale_factor: int, upsample_kernel: int)[source]¶

Bases: torch.nn.Module

Pixel Shuffle upsample layer.

Parameters

in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
scale_factor (int) – Upsample ratio.
upsample_kernel (int) – Kernel size of Conv layer to expand channels.

Returns

Upsampled feature map.

init_weights() → None[source]¶: Initialize weights for PixelShufflePack.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Forward function for PixelShufflePack.

Parameters: x (Tensor) – Input tensor with shape (n, c, h, w).
Returns: Forward results.
Return type: Tensor
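
The shapes involved can be traced with a short calculation. The sketch assumes the usual pixel-shuffle recipe: a convolution first expands the channels to `out_channels * scale_factor ** 2` while keeping the spatial size (which requires suitable padding, an assumption here), and the shuffle then trades those extra channels for resolution. `pixel_shuffle_pack_shapes` is a hypothetical helper.

```python
def pixel_shuffle_pack_shapes(in_shape, out_channels, scale_factor):
    """Return (shape after the expanding conv, shape after pixel shuffle)."""
    n, _, h, w = in_shape
    # The conv produces scale_factor**2 channels per requested output channel.
    conv_shape = (n, out_channels * scale_factor ** 2, h, w)
    # Pixel shuffle moves those channels into a scale_factor x scale_factor grid.
    out_shape = (n, out_channels, h * scale_factor, w * scale_factor)
    return conv_shape, out_shape


conv_shape, out_shape = pixel_shuffle_pack_shapes((1, 64, 32, 32), out_channels=64, scale_factor=2)
print(conv_shape, out_shape)
```

The total number of values is the same before and after the shuffle, since it only rearranges what the convolution produced.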

class mmagic.models.archs.VGG16(in_channels: int, batch_norm: Optional[bool] = False, aspp: Optional[bool] = False, dilations: Optional[List[int]] = None, init_cfg: Optional[dict] = None)[source]¶

Bases: mmengine.model.BaseModule

Customized VGG16 Encoder.

A 1x1 conv is added after the original VGG16 conv layers. The indices of max pooling layers are returned for unpooling layers in decoders.

Parameters

in_channels (int) – Number of input channels.
batch_norm (bool, optional) – Whether to use nn.BatchNorm2d. Default to False.
aspp (bool, optional) – Whether to use the ASPP module after the last conv layer. Default to False.
dilations (list[int], optional) – Atrous rates of ASPP module. Default to None.
init_cfg (dict, optional) – Initialization config dict.

_make_layer(inplanes: int, planes: int, convs_layers: int) → torch.nn.Module[source]¶

init_weights() → None[source]¶: Init weights for the model.

forward(x: torch.Tensor) → Dict[str, torch.Tensor][source]¶

Forward function for the VGG16 encoder.

Parameters: x (Tensor) – Input tensor with shape (N, C, H, W).
Returns: Dict containing output tensor and maxpooling indices.
Return type: dict
