`mmagic.datasets.transforms`¶

Package Contents¶

Classes¶

`AlbuCorruptFunction`	AlbuCorruptFunction augmentation.
`PairedAlbuTransForms`	PairedAlbuTransForms augmentation.
`Albumentations`	Albumentation augmentation.
`GenerateSeg`	Generate segmentation mask from alpha matte.
`GenerateSoftSeg`	Generate soft segmentation mask from input segmentation mask.
`MirrorSequence`	Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.
`TemporalReverse`	Reverse frame lists for temporal augmentation.
`BinarizeImage`	Binarize image.
`Clip`	Clip the pixels.
`ColorJitter`	An interface for torch color jitter so that it can be invoked in mmagic pipeline.
`RandomAffine`	Apply random affine to input images.
`RandomMaskDilation`	Randomly dilate binary masks.
`UnsharpMasking`	Apply unsharp masking to an image or a sequence of images.
`Flip`	Flip the input data with a probability.
`NumpyPad`	Numpy Padding.
`RandomRotation`	Rotate the image by a randomly-chosen angle, measured in degrees.
`RandomTransposeHW`	Randomly transpose images in H and W dimensions with a probability.
`Resize`	Resize data to a specific size for training or resize the images to fit the network input regulation for testing.
`CenterCropLongEdge`	Center crop the given image by the long edge.
`Crop`	Crop data to specific size for training.
`CropAroundCenter`	Randomly crop the images around unknown area in the center 1/4 images.
`CropAroundFg`	Crop around the whole foreground in the segmentation mask.
`CropAroundUnknown`	Crop around unknown area with a randomly selected scale.
`CropLike`	Crop/pad the image in the target_key according to the size of image in the reference_key.
`FixedCrop`	Crop paired data (at a specific position) to specific size for training.
`InstanceCrop`	Use maskrcnn to detect instances on image.
`ModCrop`	Mod crop images, used during testing.
`PairedRandomCrop`	Paired random crop.
`RandomCropLongEdge`	Random crop the given image by the long edge.
`RandomResizedCrop`	Crop data to random size and aspect ratio.
`CompositeFg`	Composite foreground with a random foreground.
`MergeFgAndBg`	Composite foreground image and background image with alpha.
`PerturbBg`	Randomly add gaussian noise or gamma change to background image.
`RandomJitter`	Randomly jitter the foreground in hsv space.
`RandomLoadResizeBg`	Randomly load a background image and resize it.
`PackInputs`	Pack data into DataSample for training, evaluation and testing.
`GenerateCoordinateAndCell`	Generate coordinate and cell. Generate coordinate from the desired size
`GenerateFacialHeatmap`	Generate heatmap from keypoint.
`GenerateFrameIndices`	Generate frame index for REDS datasets. It also performs temporal
`GenerateFrameIndiceswithPadding`	Generate frame index with padding for REDS dataset and Vid4 dataset
`GenerateSegmentIndices`	Generate frame indices for a segment. It also performs temporal
`GetMaskedImage`	Get masked image.
`GetSpatialDiscountMask`	Get spatial discounting mask constant.
`LoadImageFromFile`	Load a single image or image frames from corresponding paths. Required
`LoadMask`	Load Mask for multiple types.
`LoadPairedImageFromFile`	Load a pair of images from file.
`MATLABLikeResize`	Resize the input image using MATLAB-like downsampling.
`Normalize`	Normalize images with the given mean and std value.
`RescaleToZeroOne`	Transform the images into a range between 0 and 1.
`DegradationsWithShuffle`	Apply random degradations to input, with degradations being shuffled.
`RandomBlur`	Apply random blur to the input.
`RandomJPEGCompression`	Apply random JPEG compression to the input.
`RandomNoise`	Apply random noise to the input.
`RandomResize`	Randomly resize the input.
`RandomVideoCompression`	Apply random video compression to the input.
`RandomDownSampling`	Generate LQ image from GT (and crop), which will randomly pick a scale.
`FormatTrimap`	Convert trimap (tensor) to one-hot representation.
`GenerateTrimap`	Using random erode/dilate to generate trimap from alpha matte.
`GenerateTrimapWithDistTransform`	Generate trimap with distance transform function.
`TransformTrimap`	Transform trimap into two-channel and six-channel.
`CopyValues`	Copy the value of source keys to destination keys.
`SetValues`	Set value to destination keys.

class mmagic.datasets.transforms.AlbuCorruptFunction(keys: List[str], config: List[dict], p: float = 1.0)[source]¶

Bases: mmcv.transforms.BaseTransform

AlbuCorruptFunction augmentation.

Apply the same AlbuCorruptFunction augmentation to the input images.

transform(results)[source]¶

Process the input results according to self.augs.

Parameters

results (dict) – Contains the processed data through the transform pipeline.

Returns

the processed data.

Return type

results

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.PairedAlbuTransForms(size: int, lq_key: str = 'img', gt_key: str = 'gt', scope: str = 'geometric', crop: str = 'random', p: float = 0.5)[source]¶

Bases: mmcv.transforms.BaseTransform

PairedAlbuTransForms augmentation.

Apply the same AlbuTransforms augmentation to paired images.

transform(results)[source]¶

Process the input results according to self.pipeline.

Parameters

results (dict) – Contains the processed data through the transform pipeline.

Returns

the processed data.

Return type

results

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Albumentations(keys: List[str], transforms: List[dict])[source]¶

Bases: mmcv.transforms.BaseTransform

Albumentation augmentation.

Adds custom transformations from Albumentations library. Please, visit https://github.com/albumentations-team/albumentations and https://albumentations.ai/docs/getting_started/transforms_and_targets to get more information.

An example of transforms is as follows:

albu_transforms = [
    dict(
        type='Resize',
        height=100,
        width=100,
    ),
    dict(
        type='RandomFog',
        p=0.5,
    ),
    dict(
        type='RandomRain',
        p=0.5
    ),
    dict(
        type='RandomSnow',
        p=0.5,
    ),
]
pipeline = [
    dict(
        type='LoadImageFromFile',
        key='img',
        color_type='color',
        channel_order='rgb',
        imdecode_backend='cv2'),
    dict(
        type='Albumentations',
        keys=['img'],
        transforms=albu_transforms),
    dict(type='PackInputs')
]

Parameters

keys (list[str]) – A list specifying the keys whose values are modified.
transforms (list[dict]) – A list of albu transformations.

albu_builder(cfg: dict) → albumentations[source]¶

Import a module from albumentations.

It inherits some of build_from_cfg() logic.

Parameters: cfg (dict) – Config dict. It should at least contain the key “type”.
Returns: The constructed object.
Return type: obj

_apply_albu(imgs)[source]¶

transform(results)[source]¶: Transform function of Albumentations.

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateSeg(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]¶

Bases: mmcv.transforms.BaseTransform

Generate segmentation mask from alpha matte.

Parameters

kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (15, 30).
num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).
hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

static _crop_hole(img, start_point, hole_size)[source]¶

Create an all-zero rectangular hole in the image.

Parameters

img (np.ndarray) – Source image.
start_point (tuple[int]) – The top-left point of the rectangle.
hole_size (tuple[int]) – The height and width of the rectangle hole.

Returns

The cropped image.

Return type

np.ndarray
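The hole-cutting step can be illustrated with plain NumPy slicing. This is a minimal sketch, not the library's implementation: the name `crop_hole_sketch` is hypothetical, and it assumes that `start_point` is ordered as (top, left) and that the hole lies fully inside the image.

```python
import numpy as np

def crop_hole_sketch(img, start_point, hole_size):
    """Zero out a rectangle of size hole_size = (h, w).

    Assumptions of this sketch: start_point is (top, left) and the
    rectangle fits inside img.
    """
    top, left = start_point
    hole_h, hole_w = hole_size
    out = img.copy()  # leave the caller's array untouched
    out[top:top + hole_h, left:left + hole_w] = 0
    return out

mask = np.ones((6, 8), dtype=np.uint8)
holed = crop_hole_sketch(mask, start_point=(1, 2), hole_size=(3, 4))
```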

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateSoftSeg(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]¶

Bases: mmcv.transforms.BaseTransform

Generate soft segmentation mask from input segmentation mask.

Required key is “seg”, added key is “soft_seg”.

Parameters

fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.
border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.
erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.
dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (3, 7).
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.MirrorSequence(keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.

Given a sequence with N frames (x1, …, xN), extend the sequence to (x1, …, xN, xN, …, x1).

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters: keys (list[str]) – The frame lists to be extended.
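The mirroring rule above can be sketched on a plain Python list. The helper name `mirror_sequence` is hypothetical and only illustrates the (x1, …, xN, xN, …, x1) ordering; it is not the transform itself.

```python
def mirror_sequence(frames):
    """Return (x1, ..., xN, xN, ..., x1) for an input list (x1, ..., xN)."""
    return frames + frames[::-1]

clip = ["f1", "f2", "f3"]
extended = mirror_sequence(clip)
print(extended)  # ['f1', 'f2', 'f3', 'f3', 'f2', 'f1']
```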

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.TemporalReverse(keys, reverse_ratio=0.5)[source]¶

Bases: mmcv.transforms.BaseTransform

Reverse frame lists for temporal augmentation.

Required keys are the keys in attributes “lq” and “gt”, added or modified keys are “lq”, “gt” and “reverse”.

Parameters

keys (list[str]) – The frame lists to be reversed.
reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.BinarizeImage(keys, binary_thr, a_min=0, a_max=1, dtype=np.uint8)[source]¶

Bases: mmcv.transforms.BaseTransform

Binarize image.

Parameters

keys (Sequence[str]) – The images to be binarized.
binary_thr (float) – Threshold for binarization.
a_min (int) – Lower limit of the pixel values.
a_max (int) – Upper limit of the pixel values.
dtype (np.dtype) – Set the data type of the output. Default: np.uint8
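A minimal sketch of how these parameters could interact, assuming that pixels strictly above `binary_thr` map to `a_max` and all others to `a_min`. The function name `binarize_sketch` and the strict comparison are assumptions for illustration, not the library's confirmed behavior.

```python
import numpy as np

def binarize_sketch(img, binary_thr, a_min=0, a_max=1, dtype=np.uint8):
    """Map pixels above the threshold to a_max and the rest to a_min."""
    above = img > binary_thr  # assumption: strict comparison
    return np.where(above, a_max, a_min).astype(dtype)

img = np.array([[0.1, 0.5], [0.6, 0.9]])
out = binarize_sketch(img, binary_thr=0.5)
```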

_binarize(img)[source]¶

Binarize image.

Parameters: img (np.ndarray) – Input image.
Returns: Output image.
Return type: img (np.ndarray)

transform(results)[source]¶

The transform function of BinarizeImage.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Clip(keys, a_min=0, a_max=255)[source]¶

Bases: mmcv.transforms.BaseTransform

Clip the pixels.

Modified keys are the attributes specified in “keys”.

Parameters

keys (list[str]) – The keys whose values are clipped.
a_min (int) – Lower limit of the pixel values.
a_max (int) – Upper limit of the pixel values.
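The clipping itself can be sketched with `numpy.clip`. The helper name `clip_sketch` is hypothetical; it covers clipping only and accepts either a single array or a list of arrays, mirroring the `Union[List, np.ndarray]` input described for `_clip`.

```python
import numpy as np

def clip_sketch(input_, a_min=0, a_max=255):
    """Clip one array, or every array in a list, to [a_min, a_max]."""
    if isinstance(input_, list):
        return [np.clip(v, a_min, a_max) for v in input_]
    return np.clip(input_, a_min, a_max)

frame = np.array([-20.0, 12.5, 300.0])
clipped = clip_sketch(frame)
clipped_list = clip_sketch([frame, frame * 2])
```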

_clip(input_)[source]¶

Clip the pixels.

Parameters: input_ (Union[List, np.ndarray]) – Pixels to clip.
Returns: Clipped pixels.
Return type: Union[List, np.ndarray]

transform(results)[source]¶

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict in which the values of the specified keys are rounded and clipped.

Return type

dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.ColorJitter(keys, channel_order='rgb', **kwargs)[source]¶

Bases: mmcv.transforms.BaseTransform

An interface for torch color jitter so that it can be invoked in mmagic pipeline.

Randomly change the brightness, contrast and saturation of an image. Modified keys are the attributes specified in “keys”.

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters

keys (list[str]) – The images to be color-jittered.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘rgb’.

Notes

**kwargs follows the argument list of torchvision.transforms.ColorJitter.

brightness (float or tuple of float (min, max)): How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of float (min, max)): How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of float (min, max)): How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of float (min, max)): How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
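As an illustration, a pipeline entry might look like the config below. The specific values are hypothetical, and `factor_range` is a helper written for this sketch that reproduces the interval rule stated above for brightness, contrast and saturation.

```python
# Hypothetical pipeline entry; extra keyword arguments follow
# torchvision.transforms.ColorJitter.
color_jitter_cfg = dict(
    type='ColorJitter',
    keys=['img'],
    channel_order='rgb',
    brightness=0.2,        # factor drawn from [0.8, 1.2]
    contrast=(0.9, 1.1),   # explicit [min, max]
    saturation=0.1,        # factor drawn from [0.9, 1.1]
    hue=0.05,              # factor drawn from [-0.05, 0.05]
)

def factor_range(value):
    """Sampling interval for brightness, contrast or saturation."""
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (max(0, 1 - value), 1 + value)

brightness_range = factor_range(color_jitter_cfg['brightness'])
```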

_color_jitter(image, this_seed)[source]¶

Color Jitter Function.

Parameters

image (np.ndarray) – Image.
this_seed (int) – Seed of torch.

Returns

The output image.

Return type

image (np.ndarray)

transform(results: Dict) → Dict[source]¶

The transform function of ColorJitter.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Apply random affine to input images.

This class is adapted from https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015. Note that random flip is added in https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70; see the explanation of flip_ratio below. Required keys are the keys in attribute “keys”; modified keys are the keys in attribute “keys”.

Parameters

keys (Sequence[str]) – The images to which the affine transformation is applied.
degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.
scale (tuple, optional) – Scaling factor interval, e.g. (a, b); the scale is then randomly sampled from the range a <= scale <= b. Default: None.
shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, a x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.
flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.

static _get_params(degrees, translate, scale_ranges, shears, flip_ratio, img_size)[source]¶

Get parameters for affine transformation.

Returns: Params to be passed to the affine transformation.
Return type: paras (tuple)

static _get_inverse_affine_matrix(center, angle, translate, scale, shear, flip)[source]¶

Helper method to compute inverse matrix for affine transformation.

As explained in PIL.Image.rotate, we need to compute the INVERSE of the affine transformation matrix M = T * C * RSS * C^-1, where

T is the translation matrix: [1, 0, tx | 0, 1, ty | 0, 0, 1];

C is the translation matrix that keeps the center: [1, 0, cx | 0, 1, cy | 0, 0, 1];

RSS is the rotation matrix with scale and shear.

It differs from the original function in torchvision in two ways: 1. The order is changed to flip -> scale -> rotation -> shear. 2. x and y have different scale factors.

RSS(shear, a, scale, f) = [cos(a + shear)*scale_x*f, -sin(a + shear)*scale_y, 0 | sin(a)*scale_x*f, cos(a)*scale_y, 0 | 0, 0, 1]

Thus, the inverse is M^-1 = C * RSS^-1 * C^-1 * T^-1.
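The factorization of the inverse can be checked numerically. The sketch below builds T, C and RSS from the formulas above with hypothetical values; treating `f` as a flip sign of +1 or -1 and passing angles in radians are assumptions of this sketch.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous translation matrix."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])

def rss(shear, a, scale_x, scale_y, f):
    """Rotation with scale and shear; angles in radians, f is +1 or -1."""
    return np.array([
        [np.cos(a + shear) * scale_x * f, -np.sin(a + shear) * scale_y, 0.0],
        [np.sin(a) * scale_x * f, np.cos(a) * scale_y, 0.0],
        [0.0, 0.0, 1.0],
    ])

T = translation(5.0, -3.0)    # hypothetical shift
C = translation(64.0, 48.0)   # hypothetical image center
R = rss(shear=np.deg2rad(10), a=np.deg2rad(30), scale_x=1.2, scale_y=0.8, f=-1)

M = T @ C @ R @ np.linalg.inv(C)
M_inv = C @ np.linalg.inv(R) @ np.linalg.inv(C) @ np.linalg.inv(T)
print(np.allclose(M @ M_inv, np.eye(3)))  # True
```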

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly dilate binary masks.

Parameters

keys (Sequence[str]) – The masks to be dilated.
binary_thr (float) – Threshold for obtaining binary mask. Default: 0.
kernel_min (int) – Min size of dilation kernel. Default: 9.
kernel_max (int) – Max size of dilation kernel. Default: 49.

_random_dilate(img)[source]¶

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.UnsharpMasking(kernel_size, sigma, weight, threshold, keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Apply unsharp masking to an image or a sequence of images.

Parameters

kernel_size (int) – The kernel_size of the Gaussian kernel.
sigma (float) – The standard deviation of the Gaussian.
weight (float) – The weight of the “details” in the final output.
threshold (float) – Pixel differences larger than this value are regarded as “details”.
keys (list[str]) – The keys whose values are processed.

Added keys are “xxx_unsharp”, where “xxx” are the attributes specified in “keys”.

_unsharp_masking(imgs)[source]¶: Unsharp masking function.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Flip(keys, flip_ratio=0.5, direction='horizontal')[source]¶

Bases: mmcv.transforms.BaseTransform

Flip the input data with a probability.

Reverse the order of elements in the given data with a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attributes “keys”, added or modified keys are “flip”, “flip_direction” and the keys in attributes “keys”. It also supports flipping a list of images with the same flip.

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters

keys (Union[str, List[str]]) – The images to be flipped.
flip_ratio (float) – The probability to flip the images. Default: 0.5.
direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.

_directions = ['horizontal', 'vertical']¶

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.NumpyPad(keys, padding, **kwargs)[source]¶

Bases: mmcv.transforms.BaseTransform

Numpy Padding.

In this augmentation, numpy padding is adopted to customize padding augmentation. Please carefully read the numpy manual in: https://numpy.org/doc/stable/reference/generated/numpy.pad.html

To pad only a single dimension, set padding like this:

padding = ((2, 2), (0, 0), (0, 0))

In this case, for an input with three dimensions, only the first dimension is padded.
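The effect of this padding value can be seen by calling `numpy.pad` directly. The array shape is hypothetical, and `mode='constant'` is chosen here only for illustration.

```python
import numpy as np

img = np.zeros((4, 6, 3), dtype=np.uint8)  # hypothetical H x W x C image
padding = ((2, 2), (0, 0), (0, 0))         # pad only the first dimension
padded = np.pad(img, padding, mode='constant')
print(padded.shape)  # (8, 6, 3)
```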

Parameters

keys (Union[str, List[str]]) – The images to be padded.
padding (int | tuple(int)) – See the pad_width argument of numpy.pad.

transform(results)[source]¶

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__() → str[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomRotation(keys, degrees)[source]¶

Bases: mmcv.transforms.BaseTransform

Rotate the image by a randomly-chosen angle, measured in degrees.

Parameters

keys (list[str]) – The images to be rotated.
degrees (tuple[float] | tuple[int] | float | int) – If it is a tuple, it represents a range (min, max). If it is a float or int, the range is constructed as (-degrees, degrees).
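The handling of `degrees` can be sketched as follows. The helper names are hypothetical, and rejecting a negative scalar as well as sampling uniformly within the range are assumptions of this sketch.

```python
import random

def degrees_to_range(degrees):
    """Normalize degrees to a (min, max) tuple."""
    if isinstance(degrees, (int, float)):
        if degrees < 0:
            # Assumption: a scalar must be non-negative.
            raise ValueError("a scalar degrees value must be non-negative")
        return (-degrees, degrees)
    low, high = degrees
    return (low, high)

def sample_angle(degrees, rng=random):
    """Draw one rotation angle from the normalized range."""
    low, high = degrees_to_range(degrees)
    return rng.uniform(low, high)

angle = sample_angle(15)
```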

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomTransposeHW(keys, transpose_ratio=0.5)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly transpose images in H and W dimensions with a probability.

(TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) When used with horizontal/vertical flips, it serves as a way of rotation augmentation. It also supports randomly transposing a list of images.

Required keys are the keys in attributes “keys”, added or modified keys are “transpose” and the keys in attributes “keys”.

Parameters

keys (list[str]) – The images to be transposed.
transpose_ratio (float) – The probability to transpose the images. Default: 0.5.
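The identity stated above (a transpose equals a horizontal flip followed by an anti-clockwise rotation by 90 degrees) can be checked on a small array. The 3 x 4 array is hypothetical; for an H x W x C image the transpose would swap only the first two axes.

```python
import numpy as np

img = np.arange(12).reshape(3, 4)            # hypothetical H x W image
transposed = img.T                           # swap H and W
flip_then_rotate = np.rot90(np.fliplr(img))  # horizontal flip, then 90 degrees anti-clockwise
print(np.array_equal(transposed, flip_then_rotate))  # True
```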

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Resize(keys: Union[str, List[str]] = 'img', scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear', backend=None, output_keys=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Resize data to a specific size for training or resize the images to fit the network input regulation for testing.

When resizing images to fit the network input regulation: a network may apply several downsampling operations followed by upsampling, so the input height and width must be divisible by the network's overall downsample factor. For example, if the network downsamples the input 5 times with stride 2, the downsample factor is 2^5 = 32, and the height and width must be divisible by 32.

Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.

Required Keys:

Required keys are the keys in attribute “keys”

Modified Keys:

Modified the keys in attribute “keys” or save as new key ([OUT_KEY])

Added Keys:

[OUT_KEY]_shape
keep_ratio
scale_factor
interpolation

All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.

Parameters

keys (str | list[str]) – The image(s) to be resized.
scale (float | tuple[int]) – If scale is tuple[int], it is the target spatial size (h, w). Otherwise, the target spatial size is the input size scaled by this factor. Note that when it is used, size_factor and max_size are ignored. Default: None.
keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, images are resized to the given size. Default: False. Note that it is used together with scale.
size_factor (int) – Let the output shape be a multiple of size_factor. Default: None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.
max_size (int) – The maximum size of the longest side of the output. Default: None. Note that it is used together with size_factor.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
output_keys (list[str] | None) – The keys under which the resized images are stored. Default: None. If it is not None, its length should be equal to that of keys.
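The divisibility requirement behind `size_factor` can be sketched as a small size computation. The function name `size_for_factor` is hypothetical, and rounding down to the nearest multiple (rather than up or to nearest) is an assumption of this sketch.

```python
def size_for_factor(h, w, size_factor, max_size=None):
    """Largest (h, w) not exceeding the input that is a multiple of size_factor."""
    new_h = h - (h % size_factor)
    new_w = w - (w % size_factor)
    if max_size is not None:
        # Cap both sides at the largest multiple that fits within max_size.
        cap = max_size - (max_size % size_factor)
        new_h = min(new_h, cap)
        new_w = min(new_w, cap)
    return new_h, new_w

downsample_factor = 2 ** 5  # five stride-2 downsampling steps
print(size_for_factor(250, 333, downsample_factor))  # (224, 320)
```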

_resize(img)[source]¶

Resize function.

Parameters: img (np.ndarray) – Image.
Returns: Resized image.
Return type: img (np.ndarray)

transform(results: Dict) → Dict[source]¶

Transform function to resize images.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CenterCropLongEdge(keys='img')[source]¶

Bases: mmcv.transforms.BaseTransform

Center crop the given image by the long edge.

Parameters: keys (list[str]) – The images to be cropped.

transform(results)[source]¶

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Crop(keys, crop_size, random_crop=True, is_pad_zeros=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Crop data to specific size for training.

Parameters

keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
random_crop (bool) – If set to True, the image is randomly cropped. Otherwise, a center crop is applied. Default: True.
is_pad_zeros (bool, optional) – Whether to pad the image with 0 if crop_size is greater than image size. Default: False.
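The difference between the random and the center crop can be sketched with NumPy slicing. This is an illustration under stated assumptions: the name `crop_sketch`, the [left, top, width, height] layout of the returned box, and the shrinking of an oversized crop to the image size are all hypothetical, and zero padding (`is_pad_zeros`) is not covered.

```python
import random
import numpy as np

def crop_sketch(img, crop_size, random_crop=True, rng=random):
    """Return the cropped image and a [left, top, width, height] box."""
    crop_h, crop_w = crop_size
    h, w = img.shape[:2]
    # Assumption: never crop beyond the image.
    crop_h, crop_w = min(h, crop_h), min(w, crop_w)
    if random_crop:
        top = rng.randint(0, h - crop_h)   # inclusive bounds
        left = rng.randint(0, w - crop_w)
    else:
        top = (h - crop_h) // 2
        left = (w - crop_w) // 2
    crop_bbox = [left, top, crop_w, crop_h]
    return img[top:top + crop_h, left:left + crop_w], crop_bbox

img = np.arange(48).reshape(6, 8)
center_patch, center_bbox = crop_sketch(img, (4, 4), random_crop=False)
random_patch, random_bbox = crop_sketch(img, (4, 4), random_crop=True)
```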

_crop(data)[source]¶

Crop the data.

Parameters: data (Union[List, np.ndarray]) – Input data to crop.
Returns: cropped data and corresponding crop box.
Return type: tuple

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CropAroundCenter(crop_size)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly crop the images around unknown area in the center 1/4 images.

This cropping strategy is adopted in GCA matting. The unknown area is the same as semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf

It retains the center 1/4 images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”, added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.

Parameters: crop_size (int | tuple) – Desired output size. If int, square crop is applied.

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CropAroundFg(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Crop around the whole foreground in the segmentation mask.

Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.

Parameters

keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.
bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Default to (0.1, 0.4).
test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground will be extended to a square. Default to False.

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

class mmagic.datasets.transforms.CropAroundUnknown(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')[source]¶

Bases: mmcv.transforms.BaseTransform

Crop around unknown area with a randomly selected scale.

Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.

Parameters

keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.
crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.
unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Default to ‘alpha’.
interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. Or if given as a str all the keys will be resized with the same method. Default to ‘bilinear’.

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CropLike(target_key, reference_key=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Crop/pad the image in the target_key according to the size of image in the reference_key.

Parameters

target_key (str) – The key of the image to be cropped.
reference_key (str | None) – The key of the reference image, whose size is used. Default: None.

transform(results)[source]¶

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Require self.target_key and self.reference_key.

Returns

A dict containing the processed data and information. self.target_key is modified.

Return type

dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.FixedCrop(keys, crop_size, crop_pos=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Crop paired data (at a specific position) to specific size for training.

Parameters

keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
crop_pos (Tuple[int]) – Specific position (x, y). If set to None, the position used to crop the paired data batch is initialized randomly. Default: None.

_crop(data, x_offset, y_offset, crop_w, crop_h)[source]¶

Crop the data.

Parameters

data (Union[List, np.ndarray]) – Input data to crop.
x_offset (int) – The offset of x axis.
y_offset (int) – The offset of y axis.
crop_w (int) – The width of crop bbox.
crop_h (int) – The height of crop bbox.

Returns

cropped data and corresponding crop box.

Return type

tuple

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.InstanceCrop(config_file, from_pretrained=None, key='img', box_num_upbound=-1, finesize=256)[source]¶

Bases: mmcv.transforms.BaseTransform

Use maskrcnn to detect instances on image.

Mask R-CNN is used to detect the instances on the image, and pred_bbox is used to segment the instances on the image.

Parameters

config_file (str) – config file name relative to detectron2’s “configs/”
key (str) – Unused
box_num_upbound (int) – The upper limit on the number of instances in the figure

transform(results: dict) → dict[source]¶

The transform function of InstanceCrop.

Parameters

results (dict) – A dict containing the necessary information and data for conversion.

Returns

A dict containing the processed data and information.

Return type

dict

predict_bbox(image)[source]¶

class mmagic.datasets.transforms.ModCrop(key='gt')[source]¶

Bases: mmcv.transforms.BaseTransform

Mod crop images, used during testing.

Required keys are “scale” and “KEY”, added or modified keys are “KEY”.

Parameters: key (str) – The key of image. Default: ‘gt’
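
A minimal sketch of the mod-crop idea, assuming it trims the bottom and right edges so that height and width become multiples of the scale. `mod_crop` is a hypothetical helper operating on nested lists rather than the arrays the transform receives.

```python
def mod_crop(img, scale):
    """Trim bottom/right edges so H and W are multiples of `scale`."""
    h, w = len(img), len(img[0])
    h_keep = h - h % scale
    w_keep = w - w % scale
    return [row[:w_keep] for row in img[:h_keep]]


img = [[0] * 10 for _ in range(7)]   # H = 7, W = 10
out = mod_crop(img, scale=4)
print(len(out), len(out[0]))         # 4 8
```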

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.PairedRandomCrop(gt_patch_size, lq_key='img', gt_key='gt')[source]¶

Bases: mmcv.transforms.BaseTransform

Paired random crop.

It crops a pair of img and gt images with corresponding locations. It also supports accepting img list and gt list. Required keys are “scale”, “lq_key”, and “gt_key”, added or modified keys are “lq_key” and “gt_key”.

Parameters

gt_patch_size (int) – cropped gt patch size.
lq_key (str) – Key of LQ img. Default: ‘img’.
gt_key (str) – Key of GT img. Default: ‘gt’.
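
The "corresponding locations" can be sketched as follows: a patch is drawn at a random position in the LQ image, and the GT patch is taken at the same position multiplied by the scale. This is an illustrative sketch on nested lists under the assumption that the GT image is exactly `scale` times the LQ image in each dimension; `paired_random_crop` is a hypothetical helper.

```python
import random


def paired_random_crop(lq, gt, gt_patch_size, scale):
    """Crop aligned LQ/GT patches; the GT patch is `scale` times larger."""
    lq_patch = gt_patch_size // scale
    h_lq, w_lq = len(lq), len(lq[0])
    if len(gt) != h_lq * scale or len(gt[0]) != w_lq * scale:
        raise ValueError('GT size must be scale x LQ size.')
    if h_lq < lq_patch or w_lq < lq_patch:
        raise ValueError('LQ image is smaller than the requested patch.')
    # Random top-left corner in LQ coordinates.
    top = random.randint(0, h_lq - lq_patch)
    left = random.randint(0, w_lq - lq_patch)
    lq_crop = [r[left:left + lq_patch] for r in lq[top:top + lq_patch]]
    # The same corner expressed in GT coordinates.
    gt_top, gt_left = top * scale, left * scale
    gt_crop = [r[gt_left:gt_left + gt_patch_size]
               for r in gt[gt_top:gt_top + gt_patch_size]]
    return lq_crop, gt_crop


# Each "pixel" stores its own (row, col) so the alignment is visible.
lq = [[(y, x) for x in range(8)] for y in range(8)]
gt = [[(y, x) for x in range(16)] for y in range(16)]
lq_crop, gt_crop = paired_random_crop(lq, gt, gt_patch_size=8, scale=2)
print(lq_crop[0][0], gt_crop[0][0])
```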

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomCropLongEdge(keys='img')[source]¶

Bases: mmcv.transforms.BaseTransform

Random crop the given image by the long edge.

Parameters: keys (list[str]) – The images to be cropped.
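
One way to read "crop by the long edge" is that the longer side is cut down to the length of the shorter side at a random offset, producing a square. The sketch below assumes that reading; `random_crop_long_edge` is a hypothetical helper on nested lists.

```python
import random


def random_crop_long_edge(img):
    """Return a square crop whose side equals the shorter image edge."""
    h, w = len(img), len(img[0])
    side = min(h, w)
    # Only the longer edge has room for a random offset.
    top = random.randint(0, h - side)
    left = random.randint(0, w - side)
    return [row[left:left + side] for row in img[top:top + side]]


img = [[x for x in range(10)] for _ in range(6)]   # H = 6, W = 10
square = random_crop_long_edge(img)
print(len(square), len(square[0]))                 # 6 6
```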

transform(results)[source]¶

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomResizedCrop(keys, crop_size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation='bilinear')[source]¶

Bases: mmcv.transforms.BaseTransform

Crop data to random size and aspect ratio.

A crop of a random proportion of the original image and a random aspect ratio of the original aspect ratio is made. The cropped image is finally resized to a given size specified by ‘crop_size’. Modified keys are the attributes specified in “keys”.

This code is partially adopted from torchvision.transforms.RandomResizedCrop: [https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#RandomResizedCrop].

Parameters

keys (list[str]) – The images to be resized and random-cropped.
crop_size (int | tuple[int]) – Target spatial size (h, w).
scale (tuple[float], optional) – Range of the proportion of the original image to be cropped. Default: (0.08, 1.0).
ratio (tuple[float], optional) – Range of aspect ratio of the crop. Default: (3. / 4., 4. / 3.).
interpolation (str, optional) – Algorithm used for interpolation. It must be one of the following: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.

get_params(data)[source]¶

Get parameters for a random sized crop.

Parameters: data (np.ndarray) – Image of type numpy array to be cropped.
Returns: A tuple containing the coordinates of the top left corner and the chosen crop size.
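
The parameter sampling can be sketched as below. It follows the sampling scheme used by torchvision's RandomResizedCrop (random area fraction, log-uniform aspect ratio, a bounded number of attempts, then a central fallback). The function name, the `max_tries` argument and the return order are assumptions for this example and may differ from the MMagic implementation.

```python
import math
import random


def get_crop_params(height, width, scale=(0.08, 1.0),
                    ratio=(3 / 4, 4 / 3), max_tries=10):
    """Return (top, left, crop_h, crop_w) for a random-sized crop."""
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_tries):
        target_area = random.uniform(*scale) * area
        aspect = math.exp(random.uniform(*log_ratio))   # w / h
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = random.randint(0, height - crop_h)
            left = random.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: central crop with the aspect ratio clamped into range.
    in_ratio = width / height
    if in_ratio < ratio[0]:
        crop_w = width
        crop_h = int(round(crop_w / ratio[0]))
    elif in_ratio > ratio[1]:
        crop_h = height
        crop_w = int(round(crop_h * ratio[1]))
    else:
        crop_h, crop_w = height, width
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


top, left, crop_h, crop_w = get_crop_params(48, 64)
print(top, left, crop_h, crop_w)
```

Sampling the aspect ratio uniformly in log space makes a ratio of 3/4 as likely as its inverse 4/3.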

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CompositeFg(fg_dirs, alpha_dirs, interpolation='nearest')[source]¶

Bases: mmcv.transforms.BaseTransform

Composite foreground with a random foreground.

This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:

\[ \begin{aligned} fg_{new} &= \alpha_1 * fg_1 + (1 - \alpha_1) * fg_2 \\ \alpha_{new} &= 1 - (1 - \alpha_1) * (1 - \alpha_2) \end{aligned} \]

where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].
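
The two formulas can be checked on a single pixel with the sketch below. It assumes alpha values already normalized to [0, 1] and foreground colours given as channel tuples; `composite_pixel` is a hypothetical helper.

```python
def composite_pixel(fg1, alpha1, fg2, alpha2):
    """Apply the fg_new / alpha_new formulas to one pixel."""
    fg_new = tuple(alpha1 * c1 + (1 - alpha1) * c2
                   for c1, c2 in zip(fg1, fg2))
    alpha_new = 1 - (1 - alpha1) * (1 - alpha2)
    return fg_new, alpha_new


fg_new, alpha_new = composite_pixel(
    (200.0, 100.0, 0.0), 0.5,    # current sample
    (0.0, 100.0, 200.0), 0.5)    # randomly loaded sample
print(fg_new, alpha_new)         # (100.0, 100.0, 100.0) 0.75
```

Because both factors (1 - alpha_1) and (1 - alpha_2) lie in [0, 1], their product does too, which is why alpha_new stays in [0, 1].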

Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.

Parameters

fg_dirs (str | list[str]) – Path of directories to load foreground images from.
alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.
interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images. Default: ‘nearest’.

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

_get_file_list(fg_dirs, alpha_dirs)[source]¶

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.MergeFgAndBg[source]¶

Bases: mmcv.transforms.BaseTransform

Composite foreground image and background image with alpha.

Required keys are “alpha”, “fg” and “bg”, added key is “merged”.

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__() → str[source]¶: Return repr(self).

class mmagic.datasets.transforms.PerturbBg(gamma_ratio=0.6)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly add gaussian noise or gamma change to background image.

Required key is “bg”, added key is “noisy_bg”.

Parameters: gamma_ratio (float, optional) – The probability to use gamma correction instead of gaussian noise. Defaults to 0.6.

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomJitter(hue_range=40)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly jitter the foreground in hsv space.

The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.

Parameters: hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RandomLoadResizeBg(bg_dir, flag='color', channel_order='bgr')[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly load a background image and resize it.

Required key is “fg”, added key is “bg”.

Parameters

bg_dir (str) – Path of directory to load background images from.
flag (str) – Loading flag for images. Default: ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
kwargs (dict) – Args for file client.

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.PackInputs(keys: Tuple[List[str], str] = ['merged', 'img'], meta_keys: Tuple[List[str], str] = [], data_keys: Tuple[List[str], str] = [])[source]¶

Bases: mmcv.transforms.base.BaseTransform

Pack data into DataSample for training, evaluation and testing.

MMagic follows the design of data structure from MMEngine. Data from the loader will be packed into the data field of DataSample. For more details on DataSample, refer to the MMEngine documentation: https://mmengine.readthedocs.io/en/latest/advanced_tutorials/data_element.html

Parameters

keys (Tuple[List[str], str, None]) – The keys to be saved in the returned inputs, which are used as the input of models. Default: [‘merged’, ‘img’].
data_keys (Tuple[List[str], str, None]) – The keys to be saved in data_field of the data_samples.
meta_keys (Tuple[List[str], str, None]) – The meta keys to be saved in metainfo of the data_samples. All the other data will be packed into the data of the data_samples.

transform(results: dict) → dict[source]¶

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

A dict containing:

‘inputs’ (obj:dict): The forward data of models. According to different tasks, the inputs may contain images, videos, labels, text, etc.
‘data_samples’ (obj:DataSample): The annotation info of the sample.

Return type

dict

__repr__() → str[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateCoordinateAndCell(sample_quantity=None, scale=None, target_size=None, reshape_gt=True)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Generate coordinate and cell. Generate coordinate from the desired size of SR image.

Train or val:

1. Generate coordinate from GT.
2. Reshape GT image to (HgWg, 3) and transpose to (3, HgWg), where Hg and Wg represent the height and width of GT.

Test:

1. Generate coordinate from LQ and scale or target_size.
2. Then generate cell from coordinate.

Parameters

sample_quantity (int | None) – The quantity of samples in coordinates. To ensure that the GT tensors in a batch have the same dimensions. Default: None.
scale (float) – Scale of upsampling. Default: None.
target_size (tuple[int]) – Size of target image. Default: None.
reshape_gt (bool) – Whether to reshape gt to (-1, 3). Default: True. If sample_quantity is not None, reshape_gt = True.

The priority of getting ‘size of target image’ is:

1. results[‘gt’].shape[-2:]
2. results[‘lq’].shape[-2:] * scale
3. target_size
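
To make "coordinate" and "cell" concrete, the sketch below assumes the convention used by LIIF-style models: coordinates are pixel centres normalized to [-1, 1], and each cell stores the pixel extent (2 / H, 2 / W) in the same units. `make_coord` and `make_cell` are hypothetical helpers on plain lists and may not match the MMagic implementation in detail.

```python
def make_coord(shape):
    """Pixel-centre coordinates in [-1, 1] for an (H, W) grid, row-major."""
    axes = []
    for n in shape:
        r = 1.0 / n   # half a pixel in normalized units
        axes.append([-1 + r + 2 * r * i for i in range(n)])
    return [(y, x) for y in axes[0] for x in axes[1]]


def make_cell(shape, num_coords):
    """One (cell_h, cell_w) pair per coordinate."""
    h, w = shape
    return [(2.0 / h, 2.0 / w)] * num_coords


target_size = (2, 4)                      # (H, W) of the target image
coord = make_coord(target_size)
cell = make_cell(target_size, len(coord))
print(coord[0], cell[0])                  # (-0.5, -0.75) (1.0, 0.5)
```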

transform(results)[source]¶

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Requires one of the following in results:

1. ‘lq’
2. ‘gt’
3. None (the premise is self.target_size and len(self.target_size))

Returns

A dict containing the processed data and information. Reshape ‘gt’ to (-1, 3) and transpose to (3, -1) if ‘gt’ in results. Add ‘coord’ and ‘cell’.

Return type

dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFacialHeatmap(image_key, ori_size, target_size, sigma=1.0, use_cache=True)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Generate heatmap from keypoint.

Parameters

image_key (str) – Key of facial image in dict.
ori_size (int | Tuple[int]) – Original image size of keypoint.
target_size (int | Tuple[int]) – Target size of heatmap.
sigma (float) – Sigma parameter of heatmap. Default: 1.0
use_cache (bool) – If True, load all heatmap at once. Default: True.

transform(results)[source]¶

transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Require keypoint.

Returns

A dict containing the processed data and information. Adds ‘heatmap’.

Return type

dict

generate_heatmap_from_img(image)[source]¶

Generate heatmap from img.

Parameters: image (np.ndarray) – Face image.

Returns: Heatmap of the face image.
Return type: heatmap (np.ndarray)

_face_alignment_detector(image)[source]¶

Generate face landmark by face_alignment.

Parameters: image (np.ndarray) – Face image.
Returns: Location of landmark.
Return type: landmark (Tuple[float])

_generate_one_heatmap(keypoint)[source]¶

Generate One Heatmap.

Parameters: keypoint (Tuple[float]) – Location of a landmark.

Returns: A heatmap of the landmark.
Return type: heatmap (np.ndarray)
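
A heatmap for a single landmark is commonly an unnormalized Gaussian centred on the keypoint. The sketch below assumes that form, a keypoint given as (x, y), and a target size given as (w, h); `one_heatmap` is a hypothetical helper that returns nested lists instead of an array.

```python
import math


def one_heatmap(keypoint, target_size, sigma=1.0):
    """H x W grid of exp(-d^2 / (2 * sigma^2)) around keypoint (x, y)."""
    w, h = target_size
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
             for x in range(w)]
            for y in range(h)]


heatmap = one_heatmap((2, 1), target_size=(5, 4))
print(heatmap[1][2])   # peak value at the keypoint: 1.0
```

A larger sigma spreads the response over more pixels around the landmark.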

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFrameIndices(interval_list, frames_per_clip=99)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame indices for the REDS dataset. It also performs temporal augmentation with a random interval.

Required Keys:

img_path
gt_path
key
num_input_frames

Modified Keys:

img_path
gt_path

Added Keys:

interval
reverse

Parameters

interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.
frames_per_clip (int) – Number of frames per clip. Default: 99 for REDS dataset.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFrameIndiceswithPadding(padding, filename_tmpl='{:08d}')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame index with padding for REDS dataset and Vid4 dataset during testing.

Required Keys:

img_path
gt_path
key
num_input_frames
sequence_length

Modified Keys:

img_path
gt_path

Parameters

padding (str) – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.

Examples: current_idx = 0, num_input_frames = 5. The generated frame indices under different padding modes:

replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]
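
The following sketch reproduces the four documented index lists. The handling of the lower boundary is derived from those examples. The handling of the upper boundary is an assumption that mirrors it symmetrically. `padded_frame_indices` is a hypothetical helper, not the transform itself.

```python
MODES = ('replicate', 'reflection', 'reflection_circle', 'circle')


def padded_frame_indices(current_idx, num_input_frames,
                         sequence_length, padding):
    """Indices of the frames around current_idx, padded at the borders."""
    if padding not in MODES:
        raise ValueError(f'Unknown padding mode: {padding}')
    max_idx = sequence_length - 1
    num_pad = num_input_frames // 2
    indices = []
    for i in range(current_idx - num_pad, current_idx + num_pad + 1):
        if i < 0:                       # before the first frame
            if padding == 'replicate':
                idx = 0
            elif padding == 'reflection':
                idx = -i
            elif padding == 'reflection_circle':
                idx = current_idx + num_pad - i
            else:                       # circle
                idx = num_input_frames + i
        elif i > max_idx:               # after the last frame (assumed)
            if padding == 'replicate':
                idx = max_idx
            elif padding == 'reflection':
                idx = 2 * max_idx - i
            elif padding == 'reflection_circle':
                idx = (current_idx - num_pad) - (i - max_idx)
            else:                       # circle
                idx = i - num_input_frames
        else:
            idx = i
        indices.append(idx)
    return indices


for mode in MODES:
    print(mode, padded_frame_indices(0, 5, 100, mode))
```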

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateSegmentIndices(interval_list, start_idx=0, filename_tmpl='{:08d}.png')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame indices for a segment. It also performs temporal augmentation with random interval.

Required Keys:

img_path
gt_path
key
num_input_frames
sequence_length

Modified Keys:

img_path
gt_path

Added Keys:

interval
reverse

Parameters

interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.
start_idx (int) – The index corresponds to the first frame in the sequence. Default: 0.
filename_tmpl (str) – Template for file name. Default: ‘{:08d}.png’.
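
A sketch of the sampling described above: pick an interval, pick a random start so the whole segment fits in the sequence, then format the file names. How the start is drawn and the error raised for short sequences are assumptions for this example; `segment_indices` is a hypothetical helper.

```python
import random


def segment_indices(sequence_length, num_input_frames, interval_list,
                    start_idx=0, filename_tmpl='{:08d}.png'):
    """Return (file names, interval) for a randomly placed segment."""
    interval = random.choice(interval_list)
    span = num_input_frames * interval
    if span > sequence_length:
        raise ValueError('The sequence is too short for this interval.')
    first = random.randint(0, sequence_length - span)
    indices = [first + k * interval + start_idx
               for k in range(num_input_frames)]
    return [filename_tmpl.format(i) for i in indices], interval


names, interval = segment_indices(
    sequence_length=100, num_input_frames=5, interval_list=[1, 2, 3])
print(interval, names)
```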

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GetMaskedImage(img_key='gt', mask_key='mask', out_key='img', zero_value=127.5)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Get masked image.

Parameters

img_key (str) – Key for clean image. Default: ‘gt’.
mask_key (str) – Key for mask image. The mask shape should be (h, w, 1), where ‘1’ indicates holes and ‘0’ indicates valid regions. Default: ‘mask’.
out_key (str) – Key for output image. Default: ‘img’.
zero_value (float) – Pixel value of masked area. Default: 127.5.
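
Given the mask convention above (1 for holes, 0 for valid regions), the masked image can be sketched per pixel as a blend between the clean value and zero_value. `masked_pixel` is a hypothetical helper; the exact arithmetic used by the transform is an assumption.

```python
def masked_pixel(value, mask, zero_value=127.5):
    """Keep `value` where mask == 0, write `zero_value` where mask == 1."""
    return value * (1.0 - mask) + mask * zero_value


row = [10.0, 20.0, 30.0]
mask = [0, 1, 0]
masked = [masked_pixel(v, m) for v, m in zip(row, mask)]
print(masked)   # [10.0, 127.5, 30.0]
```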

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GetSpatialDiscountMask(gamma=0.99, beta=1.5)[source]¶

Bases: mmcv.transforms.BaseTransform

Get spatial discounting mask constant.

Spatial discounting mask is first introduced in: Generative Image Inpainting with Contextual Attention.

Parameters

gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.
beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.

spatial_discount_mask(mask_width, mask_height)[source]¶

Generate spatial discounting mask constant.

Parameters

mask_width (int) – The width of bbox hole.
mask_height (int) – The height of bbox hole.

Returns

Spatial discounting mask.

Return type

np.ndarray
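
One plausible formulation of the spatial discounting mask weights each pixel in the hole by gamma raised to its distance from the nearest hole border, scaled by beta, so border pixels get weight 1 and the centre gets the smallest weight. The sketch below assumes that formulation and returns nested lists. It is illustrative only and may differ from the library code.

```python
def spatial_discount_mask(mask_width, mask_height, gamma=0.99, beta=1.5):
    """Weights that decay from the hole border toward its centre."""
    mask = []
    for i in range(mask_height):
        dist_y = min(i, mask_height - 1 - i)   # distance to top/bottom
        row = []
        for j in range(mask_width):
            dist_x = min(j, mask_width - 1 - j)   # distance to left/right
            row.append(max(gamma ** (dist_y * beta),
                           gamma ** (dist_x * beta)))
        mask.append(row)
    return mask


discount = spatial_discount_mask(5, 5)
print(discount[0][0], discount[2][2])
```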

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.LoadImageFromFile(key: str, color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)[source]¶

Bases: mmcv.transforms.BaseTransform

Load a single image or image frames from corresponding paths.

Required Keys:

[KEY]_path

New Keys:

[KEY]
ori_[KEY]_shape
ori_[KEY]

Parameters

key (str) – Keys in results to find corresponding path.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to y channel. Only ‘bgr2ycbcr’ and ‘rgb2ycbcr’ are supported. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the prefix of uri corresponding backend. Defaults to None.

transform(results: dict) → dict[source]¶

Functions to load image or frames.

Parameters: results (dict) – Result dict from mmcv.BaseDataset.
Returns: The dict contains loaded image and meta information.
Return type: dict

_load_image(filename)[source]¶

Load an image from file.

Parameters: filename (str) – Path of image file.
Returns: Image.
Return type: np.ndarray

_convert(img: numpy.ndarray)[source]¶

Convert an image to the required format.

Parameters: img (np.ndarray) – The original image.
Returns: The converted image.
Return type: np.ndarray

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Load Mask for multiple types.

For different types of mask, users need to provide the corresponding config dict.

Example config for bbox:

config = dict(img_shape=(256, 256), max_bbox_shape=128)

Example config for irregular:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    max_angle=4.,
    length_range=(10, 100),
    brush_width=(10, 40),
    area_ratio_range=(0.15, 0.5))

Example config for ff:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    mean_angle=1.2,
    angle_range=0.4,
    brush_width=(12, 40))

Example config for set:

config = dict(
    mask_list_file='xxx/xxx/ooxx.txt',
    prefix='/xxx/xxx/ooxx/',
    io_backend='local',
    color_type='unchanged',
    file_client_kwargs=dict()
)

The mask_list_file contains the list of mask file name like this:
    test1.jpeg
    test2.jpeg
    ...
    ...

The prefix gives the data path.

Parameters

mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’]. Default: ‘bbox’.

bbox: square bounding box masks.
irregular: irregular holes.
ff: free-form holes from DeepFillv2.
set: randomly get a mask from a mask set.
file: get mask from ‘mask_path’ in results.

mask_config (dict) – Params for creating masks. Each type of mask needs different configs. Default: None.

_init_info()[source]¶

_get_random_mask_from_set()[source]¶

_get_mask_from_file(path)[source]¶

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.LoadPairedImageFromFile(key: str, domain_a: str = 'A', domain_b: str = 'B', color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)[source]¶

Bases: LoadImageFromFile

Load a pair of images from file.

Each sample contains a pair of images, which are concatenated in the w dimension (a|b). This is a special loading class for generation paired dataset. It loads a pair of images as the common loader does and crops it into two images with the same shape in different domains.

Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_{domain_a}”, “img_{domain_b}”, “img_{domain_a}_path”, “img_{domain_b}_path”, “img_{domain_a}_ori_shape”, “img_{domain_b}_ori_shape”, “ori_img_{domain_a}” and “ori_img_{domain_b}”.
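
The split of the concatenated image (a|b) into its two halves can be sketched as follows. The sketch assumes an even width and an exact split at the midpoint; `split_pair` is a hypothetical helper on nested lists.

```python
def split_pair(pair):
    """Split an image concatenated along the width into (img_a, img_b)."""
    w = len(pair[0])
    if w % 2 != 0:
        raise ValueError('The width of the paired image must be even.')
    half = w // 2
    img_a = [row[:half] for row in pair]
    img_b = [row[half:] for row in pair]
    return img_a, img_b


pair = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
img_a, img_b = split_pair(pair)
print(img_a, img_b)   # [[1, 2], [5, 6]] [[3, 4], [7, 8]]
```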

Parameters

key (str) – Keys in results to find corresponding path.
domain_a (str, Optional) – One of the paired image domains. Defaults to ‘A’.
domain_b (str, Optional) – The other of the paired image domains. Defaults to ‘B’.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to y channel. Only ‘bgr2ycbcr’ and ‘rgb2ycbcr’ are supported. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the prefix of uri corresponding backend. Defaults to None.
io_backend (str, optional) – io backend where images are stored. Defaults to None.

transform(results: dict) → dict[source]¶

Functions to load paired images.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

class mmagic.datasets.transforms.MATLABLikeResize(keys, scale=None, output_shape=None, kernel='bicubic', kernel_width=4.0)[source]¶

Bases: mmcv.transforms.BaseTransform

Resize the input image using MATLAB-like downsampling.

Currently support bicubic interpolation only. Note that the output of this function is slightly different from the official MATLAB function.

Required keys are the keys in attribute “keys”. Added or modified keys are “scale” and “output_shape”, and the keys in attribute “keys”.

Parameters

keys (list[str]) – A list of keys whose values are modified.
scale (float | None, optional) – The scale factor of the resize operation. If None, it will be determined by output_shape. Default: None.
output_shape (tuple(int) | None, optional) – The size of the output image. If None, it will be determined by scale. Note that if scale is provided, output_shape will not be used. Default: None.
kernel (str, optional) – The kernel for the resize operation. Currently support ‘bicubic’ only. Default: ‘bicubic’.
kernel_width (float) – The kernel width. Currently support 4.0 only. Default: 4.0.

_resize(img)[source]¶

Resize an image to the required size.

Parameters: img (np.ndarray) – The original image.
Returns: The resized image.
Return type: output (np.ndarray)

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.Normalize(keys, mean, std, to_rgb=False, save_original=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Normalize images with the given mean and std value.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.

Parameters

keys (Sequence[str]) – The images to be normalized.
mean (np.ndarray) – Mean values of different channels.
std (np.ndarray) – Std values of different channels.
to_rgb (bool) – Whether to convert channels from BGR to RGB. Default: False.
save_original (bool) – Whether to save original images. Default: False.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.RescaleToZeroOne(keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Transform the images into a range between 0 and 1.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.

Parameters: keys (Sequence[str]) – The images to be transformed.

transform(results)[source]¶

transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.DegradationsWithShuffle(degradations, keys, shuffle_idx=None)[source]¶

Apply random degradations to input, with degradations being shuffled.

Degradation groups are supported. The order of degradations within the same group is preserved. For example, if we have degradations = [a, b, [c, d]] and shuffle_idx = None, then the possible orders are

[a, b, [c, d]]
[a, [c, d], b]
[b, a, [c, d]]
[b, [c, d], a]
[[c, d], a, b]
[[c, d], b, a]

Modified keys are the attributes specified in “keys”.

Parameters

degradations (list[dict]) – The list of degradations.
keys (list[str]) – A list specifying the keys whose values are modified.
shuffle_idx (list | None, optional) – The degradations corresponding to these indices are shuffled. If None, all degradations are shuffled. Default: None.
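
The grouping rule can be illustrated with plain lists: a group is one element of the top-level list, so shuffling only reorders top-level entries and never the contents of a group. The sketch below enumerates the six orders listed above and shows one assumed way to restrict shuffling to `shuffle_idx`. `shuffle_degradations` is a hypothetical helper, and the letters stand in for degradation configs.

```python
import random
from itertools import permutations


def shuffle_degradations(degradations, shuffle_idx=None):
    """Shuffle the entries at shuffle_idx; other entries stay in place."""
    if shuffle_idx is None:
        idx = list(range(len(degradations)))
    else:
        idx = list(shuffle_idx)
    picked = [degradations[i] for i in idx]
    random.shuffle(picked)
    out = list(degradations)
    for i, item in zip(idx, picked):
        out[i] = item
    return out


degradations = ['a', 'b', ['c', 'd']]
orders = [list(p) for p in permutations(degradations)]
print(len(orders))   # 6
print(shuffle_degradations(degradations, shuffle_idx=[0, 1]))
```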

_build_degradations(degradations)[source]¶

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomBlur(params, keys)[source]¶

Apply random blur to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

get_kernel(num_kernels: int)[source]¶

This is the function to create kernel.

Parameters: num_kernels (int) – the number of kernels
Returns: The created kernels.

_apply_random_blur(imgs)[source]¶

This is the function to apply blur operation on images.

Parameters: imgs (Tensor) – images
Returns: Images applied blur
Return type: Tensor

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomJPEGCompression(params, keys, color_type='color', bgr2rgb=False)[source]¶

Apply random JPEG compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
bgr2rgb (bool) – Whether to change the channel order. Default: False.

_apply_random_compression(imgs)[source]¶

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomNoise(params, keys)[source]¶

Apply random noise to the input.

Currently support Gaussian noise and Poisson noise.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_apply_gaussian_noise(imgs)[source]¶

This is the function used to apply gaussian noise on images.

Parameters: imgs (Tensor) – images
Returns: images applied gaussian noise
Return type: Tensor

_apply_poisson_noise(imgs)[source]¶

_apply_random_noise(imgs)[source]¶

This is the function used to apply random noise on images.

Parameters: imgs (Tensor) – training images
Returns: Images with random noise applied.
Return type: Tensor

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomResize(params, keys)[source]¶

Randomly resize the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_random_resize(imgs)[source]¶

This is the function used to randomly resize images for training augmentation.

Parameters: imgs (Tensor) – training images.
Returns: Randomly resized images
Return type: Tensor

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomVideoCompression(params, keys)[source]¶

Apply random video compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_apply_random_compression(imgs)[source]¶

This is the function to apply random compression on images.

Parameters: imgs (Tensor) – training images
Returns: Randomly compressed images
Return type: Tensor

__call__(results)[source]¶: Call this transform.

__repr__()[source]¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomDownSampling(scale_min=1.0, scale_max=4.0, patch_size=None, interpolation='bicubic', backend='pillow')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate LQ image from GT (and crop), which will randomly pick a scale.

Parameters

scale_min (float) – The minimum of upsampling scale, inclusive. Default: 1.0.
scale_max (float) – The maximum of upsampling scale, exclusive. Default: 4.0.
patch_size (int) – The cropped lr patch size. Default: None, means no crop.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming” for ‘pillow’ backend. Default: “bicubic”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: “pillow”.
Scale will be picked in the range of [scale_min, scale_max).

transform(results)[source]¶

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. ‘gt’ is required.

Returns

A dict containing the processed data and information: ‘gt’ is modified, and ‘lq’ and ‘scale’ are added.

Return type

dict
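
The scale selection described for RandomDownSampling can be sketched as follows. This is a simplified illustration rather than the library code: the helper `pick_lq_size` is hypothetical, and deriving the LQ size by dividing the GT size by the scale and flooring is an assumption.

```python
import math
import random


def pick_lq_size(gt_h, gt_w, scale_min=1.0, scale_max=4.0, rng=random):
    """Pick a random scale and the matching LQ size (hypothetical helper)."""
    # random.uniform samples from [scale_min, scale_max]; the upper end is
    # reached only through floating-point rounding, so this approximates
    # the documented half-open range.
    scale = rng.uniform(scale_min, scale_max)
    # Assumption: the LQ size is the GT size divided by the scale, floored.
    lq_h = math.floor(gt_h / scale + 1e-9)
    lq_w = math.floor(gt_w / scale + 1e-9)
    return scale, (lq_h, lq_w)


scale, (lq_h, lq_w) = pick_lq_size(256, 384, rng=random.Random(0))
```

With the default range, a 256 x 384 GT image yields an LQ image between 64 x 96 and 256 x 384 pixels.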

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.FormatTrimap(to_onehot=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Convert trimap (tensor) to one-hot representation.

It transforms the trimap labels from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap is converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”; added or modified keys are “trimap” and “format_trimap_to_onehot”.

Parameters: to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.
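
The label mapping and the optional one-hot encoding can be illustrated with a short NumPy sketch. It mirrors the behavior described above but is not the MMagic code, which operates on tensors; the function name `format_trimap` is hypothetical.

```python
import numpy as np


def format_trimap(trimap, to_onehot=False):
    """Relabel a (H, W) trimap and optionally one-hot encode it (sketch)."""
    labels = np.zeros_like(trimap, dtype=np.int64)  # 0 stays 0
    labels[trimap == 128] = 1
    labels[trimap == 255] = 2
    if not to_onehot:
        return labels
    # np.eye(3)[labels] has shape (H, W, 3); move the class axis to the front.
    return np.eye(3, dtype=np.int64)[labels].transpose(2, 0, 1)


trimap = np.array([[0, 128], [255, 128]], dtype=np.uint8)
labels = format_trimap(trimap)
onehot = format_trimap(trimap, to_onehot=True)
```

Here `labels` has shape (H, W) with values in {0, 1, 2}, and `onehot` has shape (3, H, W) with exactly one active channel per pixel.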

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateTrimap(kernel_size, iterations=1, random=True)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate a trimap from the alpha matte using random erosion/dilation.

Required key is “alpha”, added key is “trimap”.

Parameters

kernel_size (int | tuple[int]) – The range of the random kernel_size of erode/dilate; an int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, then it will be interpreted as (erode kernel_size, dilate kernel_size). Note that the erosion and dilation kernels have the same height and width.
iterations (int | tuple[int], optional) – The range of the random iterations of erode/dilate; an int indicates a fixed number of iterations. If random is set to False and iterations is a tuple of length 2, then it will be interpreted as (erode iterations, dilate iterations). Defaults to 1.
random (bool, optional) – Whether to use random kernel_size and iterations when generating the trimap. See kernel_size and iterations for more information. Defaults to True.
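
The way kernel_size and iterations are interpreted depends on both their type and the random flag. The sketch below spells out that decision logic for a single argument. The helper `resolve_pair` is hypothetical, and treating the random range as half-open, with erosion and dilation drawn independently, is an assumption that the description above does not confirm.

```python
import random


def resolve_pair(value, use_random, rng=random):
    """Return (erode_value, dilate_value) for kernel_size or iterations."""
    if isinstance(value, int):
        # An int is a fixed value shared by erosion and dilation.
        return value, value
    low, high = value
    if use_random:
        # Assumption: erosion and dilation are drawn independently from
        # the half-open range [low, high).
        return rng.randrange(low, high), rng.randrange(low, high)
    # random=False: the tuple is read as (erode, dilate).
    return low, high


fixed = resolve_pair(5, use_random=True)
explicit = resolve_pair((3, 7), use_random=False)
sampled = resolve_pair((1, 30), use_random=True, rng=random.Random(0))
```

In this sketch `fixed` is `(5, 5)` and `explicit` is `(3, 7)`, while `sampled` holds two values drawn from the given range.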

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.GenerateTrimapWithDistTransform(dist_thr=20, random=True)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate trimap with distance transform function.

Parameters

dist_thr (int, optional) – Distance threshold. Areas with alpha values in (0, 255) are considered the initial unknown area. Areas whose distance to the unknown area is smaller than the distance threshold are also considered unknown. Defaults to 20.
random (bool, optional) – If True, use random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.
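
To make the distance-threshold rule concrete, here is a brute-force NumPy sketch for small images. It is an illustration only: the function name `trimap_from_alpha` is hypothetical, a real implementation would normally use a dedicated distance transform routine, and reading "smaller than" as a strict comparison is an assumption.

```python
import numpy as np


def trimap_from_alpha(alpha, dist_thr):
    """Build a trimap from a (H, W) uint8 alpha matte (brute-force sketch)."""
    unknown = (alpha > 0) & (alpha < 255)
    trimap = np.where(alpha == 255, 255, 0).astype(np.uint8)
    if unknown.any():
        ys, xs = np.nonzero(unknown)
        grid_y, grid_x = np.indices(alpha.shape)
        # Euclidean distance from every pixel to its nearest unknown pixel.
        dist = np.sqrt(
            (grid_y[..., None] - ys) ** 2 + (grid_x[..., None] - xs) ** 2
        ).min(axis=-1)
        # Assumption: "smaller than" is implemented as a strict comparison.
        unknown = dist < dist_thr
    trimap[unknown] = 128
    return trimap


alpha = np.zeros((5, 5), dtype=np.uint8)
alpha[:, 3:] = 255
alpha[:, 2] = 100  # a one-pixel-wide semi-transparent band
trimap = trimap_from_alpha(alpha, dist_thr=2)
```

With a threshold of 2, the unknown band grows by one pixel on each side, so every row of `trimap` becomes `[0, 128, 128, 128, 255]`.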

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.TransformTrimap[source]¶

Bases: mmcv.transforms.BaseTransform

Transform the trimap into two-channel and six-channel representations.

This class will generate a two-channel trimap composed of definite foreground and background masks and encode it into a six-channel trimap using Gaussian blurs of the generated two-channel trimap at three different scales. The transformed trimap has 6 channels.

Required key is “trimap”; added keys are “transformed_trimap” and “two_channel_trimap”.

Adopted from the following repository: https://github.com/MarcoForte/FBA_Matting/blob/master/networks/transforms.py.

transform(results: dict) → dict[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.CopyValues(src_keys, dst_keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Copy the value of source keys to destination keys.

# TODO Change to dict(dst=src)

It does the following: results[dst_key] = results[src_key] for (src_key, dst_key) in zip(src_keys, dst_keys).

Added keys are the keys in the attribute “dst_keys”.

Required Keys:

[SRC_KEYS]

Added Keys:

[DST_KEYS]

Parameters

src_keys (list[str]) – The source keys.
dst_keys (list[str]) – The destination keys.
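
The copy rule above can be written as a small standalone function. This is a sketch, not the transform itself: the name `copy_values` and the example keys are hypothetical, and using a deep copy is an assumption made so that the two keys do not share mutable data.

```python
import copy


def copy_values(results, src_keys, dst_keys):
    """Copy results[src_key] into results[dst_key] for each key pair."""
    for src_key, dst_key in zip(src_keys, dst_keys):
        # Assumption: a deep copy, so later in-place edits of the
        # destination do not leak back into the source.
        results[dst_key] = copy.deepcopy(results[src_key])
    return results


results = copy_values({'gt': [1, 2, 3]}, src_keys=['gt'], dst_keys=['img'])
results['img'].append(4)  # modifies only the copy
```

After the append, `results['gt']` still holds the original three items.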

transform(results)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict with a key added/modified.
Return type: dict

__repr__()[source]¶: Return repr(self).

class mmagic.datasets.transforms.SetValues(dictionary)[source]¶

Bases: mmcv.transforms.BaseTransform

Set value to destination keys.

It does the following: results[key] = value

Added keys are the keys in the dictionary.

Required Keys:

None

Added or Modified Keys:

keys in the dictionary

Parameters: dictionary (dict) – The dictionary to update.
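
As a minimal sketch of the behavior described above, the function below writes each entry of a dictionary into the results dict. The name `set_values` and the example keys are hypothetical and exist only for illustration.

```python
def set_values(results, dictionary):
    """Write every key/value pair of `dictionary` into `results`."""
    for key, value in dictionary.items():
        results[key] = value  # adds the key or overwrites an existing one
    return results


results = set_values(
    {'img': 'placeholder', 'scale': 2},
    {'scale': 4, 'flag': True},
)
```

Existing keys such as `scale` are overwritten, new keys such as `flag` are added, and untouched keys are preserved.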

transform(results: Dict)[source]¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict with a key added/modified.
Return type: dict

__repr__()[source]¶: Return repr(self).
