`mmagic.models.utils`

Package Contents

Functions

`extract_around_bbox`(img, bbox, target_size[, ...])	Extract patches around the given bbox.
`extract_bbox_patch`(bbox, img[, channel_first])	Extract patch from a given bbox.
`flow_warp`(x, flow[, interpolation, padding_mode, ...])	Warp an image or a feature map with optical flow.
`build_module`(→ Any)	Build module from config or return the module itself.
`default_init_weights`(module[, scale])	Initialize network weights.
`generation_init_weights`(module[, init_type, init_gain])	Default initialization of network weights for image generation.
`get_module_device`(module)	Get the device of a module.
`get_valid_noise_size`(→ Optional[int])	Get the value of noise_size from input and generator, and check the consistency of these values.
`get_valid_num_batches`(→ int)	Try to get the valid batch size from inputs.
`make_layer`(block, num_blocks, **kwarg)	Make layers by stacking the same blocks.
`remove_tomesd`(model)	Removes a patch from a ToMe Diffusion module if it was already patched.
`set_requires_grad`(nets[, requires_grad])	Set requires_grad for all the networks.
`set_tomesd`(model[, ratio, max_downsample, sx, sy, ...])	Patches a stable diffusion model with ToMe. Apply this to the highest level stable diffusion object.
`set_xformers`(→ torch.nn.Module)	Set xformers' efficient Attention for attention modules.
`xformers_is_enable`(→ bool)	Check whether xformers is installed.
`label_sample_fn`(→ Union[torch.Tensor, None])	Sample random label with respect to num_batches, num_classes and device.
`noise_sample_fn`(→ torch.Tensor)	Sample noise with respect to the given num_batches, noise_size and device.
`get_unknown_tensor`(trimap[, unknown_value])	Get 1-channel unknown area tensor from the 3 or 1-channel trimap tensor.
`normalize_vecs`(→ torch.Tensor)	Normalize vectors by their lengths along the last dimension. If the input is a two-dimensional tensor, this function is the same as L2 normalization.

mmagic.models.utils.extract_around_bbox(img, bbox, target_size, channel_first=True)

Extract patches around the given bbox.

Parameters

img (torch.Tensor | numpy.array) – Image data to be extracted. If img is batched, the batch dimension must come first, e.g. (n, h, w, c) or (n, c, h, w).
bbox (np.ndarray | torch.Tensor) – Bboxes to be modified. May be batched or unbatched.
target_size (List[int]) – Target size of the final bbox.
channel_first (bool) – If True, the channel dimension of img comes before height and width, e.g. (c, h, w). Otherwise, each image (each sample in the batch) has shape (h, w, c). Default: True.

Returns

Extracted patches. The dimension of the output should be the same as img.

Return type

(torch.Tensor | np.ndarray)

mmagic.models.utils.extract_bbox_patch(bbox, img, channel_first=True)

Extract patch from a given bbox.

Parameters

bbox (torch.Tensor | numpy.array) – Bbox with (top, left, h, w). If img has a batch dimension, the bboxes must be stacked along the first dimension. The shape should be (4,) or (n, 4).
img (torch.Tensor | numpy.array) – Image data to be extracted. If img is batched, the batch dimension must come first, e.g. (n, h, w, c) or (n, c, h, w).
channel_first (bool) – If True, the channel dimension of img comes before height and width, e.g. (c, h, w). Otherwise, each image (each sample in the batch) has shape (h, w, c). Default: True.

Returns

Extracted patches. The dimension of the output should be the same as img.

Return type

(torch.Tensor | numpy.array)
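
The (top, left, h, w) convention can be illustrated with a minimal pure-Python sketch on a single-channel image stored as nested lists. `crop_patch` is a hypothetical helper written for this illustration; it is not the library's implementation, which operates on tensors or arrays and also handles batches and channel order.

```python
def crop_patch(img, bbox):
    """Crop an (h, w) patch from a 2-D image given bbox = (top, left, h, w)."""
    top, left, h, w = bbox
    # Keep rows top .. top + h - 1 and columns left .. left + w - 1.
    return [row[left:left + w] for row in img[top:top + h]]


# A 4x5 image whose pixel value encodes its (row, column) position.
img = [[10 * r + c for c in range(5)] for r in range(4)]
patch = crop_patch(img, (1, 2, 2, 3))
print(patch)  # [[12, 13, 14], [22, 23, 24]]
```

The patch starts at row 1, column 2 and spans 2 rows and 3 columns, matching the order of the bbox entries.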

mmagic.models.utils.flow_warp(x, flow, interpolation='bilinear', padding_mode='zeros', align_corners=True)

Warp an image or a feature map with optical flow.

Parameters

x (Tensor) – Tensor with size (n, c, h, w).
flow (Tensor) – Tensor with size (n, h, w, 2). The last dimension holds two channels: the relative offsets along width and height. Note that the values are not normalized to [-1, 1].
interpolation (str) – Interpolation mode: ‘nearest’ or ‘bilinear’. Default: ‘bilinear’.
padding_mode (str) – Padding mode: ‘zeros’ or ‘border’ or ‘reflection’. Default: ‘zeros’.
align_corners (bool) – Whether to align corners. Default: True.

Returns

Warped image or feature map.

Return type

Tensor
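
Because the flow holds pixel offsets rather than normalized coordinates, a warp of this kind typically adds the flow to a pixel grid and rescales the result to [-1, 1] before sampling. The sketch below shows that rescaling for a single coordinate in pure Python. It assumes the usual `align_corners=True` convention (pixel 0 maps to -1 and pixel size - 1 maps to 1), and `to_normalized` is a hypothetical helper, not part of mmagic.

```python
def to_normalized(pixel_coord, size):
    """Map a pixel coordinate in [0, size - 1] to [-1, 1] (align_corners=True)."""
    # max(..., 1) avoids a division by zero for a size-1 axis.
    return 2.0 * pixel_coord / max(size - 1, 1) - 1.0


width = 5
x, flow_x = 1, 2.0  # the output at column 1 samples the input at column 1 + 2.0
print(to_normalized(x + flow_x, width))  # 0.5
print(to_normalized(0, width), to_normalized(width - 1, width))  # -1.0 1.0
```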

mmagic.models.utils.build_module(module: Union[dict, torch.nn.Module], builder: mmengine.registry.Registry, *args, **kwargs) → Any

Build module from config or return the module itself.

Parameters

module (Union[dict, nn.Module]) – The module to build.
builder (Registry) – The registry to build module.
*args – Positional arguments passed to the build function.
**kwargs – Keyword arguments passed to the build function.

Returns

The built module.

Return type

Any
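
The "build from config or return the module itself" behaviour can be sketched with a toy registry in pure Python. `ToyRegistry`, `build_or_passthrough` and `Scale` are hypothetical names used only for this illustration; the real function takes an mmengine `Registry` and `torch.nn.Module` instances.

```python
class ToyRegistry:
    """Hypothetical stand-in for a registry: maps a type name to a class."""

    def __init__(self):
        self._classes = {}

    def register(self, cls):
        self._classes[cls.__name__] = cls
        return cls

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._classes[cfg.pop('type')]
        return cls(**cfg)


def build_or_passthrough(module, registry):
    """Build `module` if it is a config dict, otherwise return it unchanged."""
    if isinstance(module, dict):
        return registry.build(module)
    return module


registry = ToyRegistry()


@registry.register
class Scale:
    def __init__(self, factor=1.0):
        self.factor = factor


built = build_or_passthrough({'type': 'Scale', 'factor': 2.0}, registry)
same = build_or_passthrough(built, registry)
print(built.factor, same is built)  # 2.0 True
```

Accepting both forms lets callers pass either a config or an already constructed object through the same argument.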

mmagic.models.utils.default_init_weights(module, scale=1)

Initialize network weights.

Parameters

module (nn.Module) – Module to be initialized.
scale (float) – Scale initialized weights, especially for residual blocks. Default: 1.

mmagic.models.utils.generation_init_weights(module, init_type='normal', init_gain=0.02)

Default initialization of network weights for image generation.

By default, we use normal init, but xavier and kaiming might work better for some applications.

Parameters

module (nn.Module) – Module to be initialized.
init_type (str) – The name of an initialization method: normal | xavier | kaiming | orthogonal. Default: ‘normal’.
init_gain (float) – Scaling factor for normal, xavier and orthogonal. Default: 0.02.

mmagic.models.utils.get_module_device(module)

Get the device of a module.

Parameters: module (nn.Module) – A module that contains parameters.
Returns: The device of the module.
Return type: torch.device

mmagic.models.utils.get_valid_noise_size(noise_size: Optional[int], generator: Union[Dict, torch.nn.Module]) → Optional[int]

Get the value of noise_size from input, generator and check the consistency of these values. If no conflict is found, return that value.

Parameters

noise_size (Optional[int]) – noise_size passed to BaseGAN_refactor’s initialize function.
generator (ModelType) – The config or the model of generator.

Returns

The noise size fed to the generator.

Return type

int | None
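
A consistency check of this kind can be sketched as follows. The sketch assumes the generator is given as a config dict that may hold a `'noise_size'` key, and that a conflict is reported with a `ValueError`; both are assumptions made for illustration, and `resolve_noise_size` is a hypothetical name, not the library function.

```python
def resolve_noise_size(noise_size, generator_cfg):
    """Return the noise size agreed on by the argument and the generator config.

    `generator_cfg` is assumed to be a dict that may hold a 'noise_size' key.
    """
    cfg_noise_size = generator_cfg.get('noise_size')
    if noise_size is not None and cfg_noise_size is not None:
        # Both sources define a value, so they must agree.
        if noise_size != cfg_noise_size:
            raise ValueError(
                f'noise_size mismatch: {noise_size} vs {cfg_noise_size}')
    # Fall back to whichever value is defined (or None if neither is).
    return noise_size if noise_size is not None else cfg_noise_size


print(resolve_noise_size(None, {'type': 'ToyGenerator', 'noise_size': 128}))  # 128
print(resolve_noise_size(128, {'type': 'ToyGenerator'}))  # 128
print(resolve_noise_size(None, {'type': 'ToyGenerator'}))  # None
```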

mmagic.models.utils.get_valid_num_batches(batch_inputs: Optional[mmagic.utils.typing.ForwardInputs] = None, data_samples: List[mmagic.structures.DataSample] = None) → int

Try to get the valid batch size from inputs.

If some values in batch_inputs are Tensors and ‘num_batches’ is in batch_inputs, we check whether the value of ‘num_batches’ and the length of the first dimension of every tensor are the same. If they differ, an AssertionError is raised. If they all match, that value is returned.
If no value in batch_inputs is a Tensor, ‘num_batches’ must be contained in batch_inputs, and this value is returned.
If some values are Tensors and ‘num_batches’ is not contained in batch_inputs, we check whether all tensors have the same length along the first dimension. If the lengths differ, an AssertionError is raised. If all lengths match, that length is returned as the batch size.
If batch_inputs is a Tensor, the length of its first dimension is returned directly as the batch size.

Parameters: batch_inputs (ForwardInputs) – Inputs passed to forward().
Returns: The batch size of samples to generate.
Return type: int
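
The four cases described above can be sketched in pure Python. `FakeTensor` is a hypothetical stand-in for `torch.Tensor` (only its shape matters here) and `infer_num_batches` is an illustrative name; the sketch leaves out the `data_samples` argument and is not the library's implementation.

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only the shape matters here."""

    def __init__(self, *shape):
        self.shape = shape


def infer_num_batches(batch_inputs):
    # The input itself is a tensor: use its first dimension.
    if isinstance(batch_inputs, FakeTensor):
        return batch_inputs.shape[0]

    sizes = [v.shape[0] for v in batch_inputs.values()
             if isinstance(v, FakeTensor)]
    if 'num_batches' in batch_inputs:
        # An explicit value must agree with every tensor, if there are any.
        num_batches = batch_inputs['num_batches']
        assert all(s == num_batches for s in sizes), \
            'num_batches conflicts with tensor sizes'
        return num_batches

    # No explicit value: all tensors must share their first dimension.
    assert sizes, "'num_batches' is required when no tensor is given"
    assert all(s == sizes[0] for s in sizes), 'tensors disagree on batch size'
    return sizes[0]


print(infer_num_batches(FakeTensor(4, 3, 8, 8)))  # 4
print(infer_num_batches({'num_batches': 2}))  # 2
print(infer_num_batches({'noise': FakeTensor(3, 16), 'label': FakeTensor(3)}))  # 3
```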

mmagic.models.utils.make_layer(block, num_blocks, **kwarg)

Make layers by stacking the same blocks.

Parameters

block (nn.Module) – nn.Module class for the basic block.
num_blocks (int) – Number of blocks.

Returns

Stacked blocks in nn.Sequential.

Return type

nn.Sequential

mmagic.models.utils.remove_tomesd(model: torch.nn.Module)

Removes a patch from a ToMe Diffusion module if it was already patched.

Refer to: https://github.com/dbolya/tomesd/blob/main/tomesd/patch.py#L251

mmagic.models.utils.set_requires_grad(nets, requires_grad=False)

Set requires_grad for all the networks.

Parameters

nets (nn.Module | list[nn.Module]) – A list of networks or a single network.
requires_grad (bool) – Whether the networks require gradients or not. Default: False.

mmagic.models.utils.set_tomesd(model: torch.nn.Module, ratio: float = 0.5, max_downsample: int = 1, sx: int = 2, sy: int = 2, use_rand: bool = True, merge_attn: bool = True, merge_crossattn: bool = False, merge_mlp: bool = False)

Patches a stable diffusion model with ToMe. Apply this to the highest level stable diffusion object.

Refer to: https://github.com/dbolya/tomesd/blob/main/tomesd/patch.py#L173

Parameters

model (torch.nn.Module) – A top level Stable Diffusion module to patch in place.
ratio (float) – The ratio of tokens to merge, e.g. 0.4 would reduce the total number of tokens by 40%. The maximum value is 1 - (1 / (sx * sy)), so with the default strides the maximum ratio is 0.75 (usually <= 0.5 is recommended). Higher values result in more speed-up, but with more visual quality loss.
max_downsample (int) – Apply ToMe to layers with at most this amount of downsampling. E.g., 1 only applies to layers with no downsampling, while 8 applies to all layers. Should be chosen from [1, 2, 4, or 8]. 1 and 2 are recommended.
sx (int) – The stride for computing dst sets. A higher stride means you can merge more tokens; the default setting of (sx, sy) = (2, 2) works well in most cases. sx and sy do not need to divide the image size.
sy (int) – The stride for computing dst sets. A higher stride means you can merge more tokens; the default setting of (sx, sy) = (2, 2) works well in most cases. sx and sy do not need to divide the image size.
use_rand (bool) – Whether or not to allow random perturbations when computing dst sets. By default: True, but if you’re having weird artifacts you can try turning this off.
merge_attn (bool) – Whether or not to merge tokens for attention (recommended).
merge_crossattn (bool) – Whether or not to merge tokens for cross attention (not recommended).
merge_mlp (bool) – Whether or not to merge tokens for the mlp layers (particularly not recommended).

Returns

Model patched by ToMe.

Return type

torch.nn.Module
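
The bound on `ratio` stated above is simple arithmetic, sketched here in pure Python. Both helpers are hypothetical names for this illustration, and the truncation used in `tokens_after_merge` is an assumption about how a fractional token count would be rounded.

```python
def max_merge_ratio(sx=2, sy=2):
    """Upper bound on `ratio` for the given strides: 1 - 1 / (sx * sy)."""
    return 1.0 - 1.0 / (sx * sy)


def tokens_after_merge(num_tokens, ratio):
    """Token count left after merging (truncating rounding is an assumption)."""
    return num_tokens - int(num_tokens * ratio)


print(max_merge_ratio(2, 2))  # 0.75
print(tokens_after_merge(4096, 0.5))  # 2048
```

With the default strides, one token in every 2 x 2 block is kept as a destination, which is why at most 75% of the tokens can be merged.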

mmagic.models.utils.set_xformers(module: torch.nn.Module, prefix: str = '') → torch.nn.Module

Set xformers’ efficient Attention for attention modules.

Parameters

module (nn.Module) – The module to set xformers.
prefix (str) – The prefix of the module name.

Returns

The module with xformers’ efficient Attention.

Return type

nn.Module

mmagic.models.utils.xformers_is_enable(verbose: bool = False) → bool

Check whether xformers is installed.

Parameters: verbose (bool) – Whether to print the log.
Returns: Whether xformers is installed.
Return type: bool

mmagic.models.utils.label_sample_fn(label: Union[torch.Tensor, Callable, List[int], None] = None, *, num_batches: int = 1, num_classes: Optional[int] = None, device: Optional[str] = None) → Union[torch.Tensor, None]

Sample random label with respect to num_batches, num_classes and device.

Parameters

label (Union[Tensor, Callable, List[int], None], optional) – You can directly give a batch of labels through a torch.Tensor or offer a callable function to sample a batch of label data. Otherwise, None indicates that the default label sampler is used. Defaults to None.
num_batches (int, optional) – The batch size. Defaults to 1.
num_classes (Optional[int], optional) – The number of classes. Defaults to None.
device (Optional[str], optional) – The target device of the label. Defaults to None.

Returns

Sampled random label.

Return type

Union[Tensor, None]

mmagic.models.utils.noise_sample_fn(noise: Union[torch.Tensor, Callable, None] = None, *, num_batches: int = 1, noise_size: Union[int, Sequence[int], None] = None, device: Optional[str] = None) → torch.Tensor

Sample noise with respect to the given num_batches, noise_size and device.

Parameters

noise (torch.Tensor | callable | None) – You can directly give a batch of noise through a torch.Tensor or offer a callable function to sample a batch of noise data. Otherwise, None indicates that the default noise sampler is used. Defaults to None.
num_batches (int, optional) – The batch size. Defaults to 1.
noise_size (Union[int, Sequence[int], None], optional) – The size of random noise. Defaults to None.
device (Optional[str], optional) – The target device of the random noise. Defaults to None.

Returns

Sampled random noise.

Return type

Tensor

mmagic.models.utils.get_unknown_tensor(trimap, unknown_value=128 / 255)

Get 1-channel unknown area tensor from the 3 or 1-channel trimap tensor.

Parameters

trimap (Tensor) – Tensor with shape (N, 3, H, W) or (N, 1, H, W).
unknown_value (float) – Scalar value indicating the unknown region in trimap. If trimap is pre-processed using ‘rescale_to_zero_one’, then 0 is bg, 128/255 is unknown, and 1 is fg, and unknown_value should be set to 128 / 255. If trimap is pre-processed by FormatTrimap(to_onehot=False)(), then 0 is bg, 1 is unknown, and 2 is fg, and unknown_value should be set to 1. If trimap is pre-processed by FormatTrimap(to_onehot=True)(), then trimap has 3 channels and this value is not used.

Returns

Unknown area mask of shape (N, 1, H, W).

Return type

Tensor
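
The two single-channel conventions for `unknown_value` can be illustrated with a pure-Python sketch on one H x W map stored as nested lists. `unknown_mask` is a hypothetical helper for this illustration only, and the tolerance-based comparison is an assumption; the library function works on (N, 1, H, W) or (N, 3, H, W) tensors.

```python
def unknown_mask(trimap, unknown_value=128 / 255, tol=1e-6):
    """Return 1.0 where the trimap equals `unknown_value`, else 0.0."""
    return [[1.0 if abs(v - unknown_value) <= tol else 0.0 for v in row]
            for row in trimap]


# Trimap rescaled to [0, 1]: 0 = background, 128/255 = unknown, 1 = foreground.
rescaled = [[0.0, 128 / 255, 1.0],
            [128 / 255, 1.0, 0.0]]
print(unknown_mask(rescaled))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]

# Trimap with class indices: 0 = background, 1 = unknown, 2 = foreground.
indexed = [[0, 1, 2],
           [1, 2, 0]]
print(unknown_mask(indexed, unknown_value=1))  # [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```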

mmagic.models.utils.normalize_vecs(vectors: torch.Tensor) → torch.Tensor

Normalize vectors by their lengths along the last dimension. If the input is a two-dimensional tensor, this function is the same as L2 normalization.

Parameters: vectors (torch.Tensor) – Vectors to be normalized.
Returns: Vectors after normalization.
Return type: torch.Tensor
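
For the two-dimensional case, the normalization amounts to dividing each row by its Euclidean length. A minimal pure-Python sketch follows; `normalize_rows` is a hypothetical name for this illustration, it works on nested lists rather than tensors, and it does not handle zero-length vectors.

```python
import math


def normalize_rows(vectors):
    """Divide each row (the last dimension) by its Euclidean length."""
    normalized = []
    for vec in vectors:
        length = math.sqrt(sum(x * x for x in vec))
        normalized.append([x / length for x in vec])
    return normalized


print(normalize_rows([[3.0, 4.0], [0.0, 2.0]]))  # [[0.6, 0.8], [0.0, 1.0]]
```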
