`mmagic.datasets.transforms`

Package Contents

Classes

`AlbuCorruptFunction`	AlbuCorruptFunction augmentation.
`PairedAlbuTransForms`	PairedAlbuTransForms augmentation.
`Albumentations`	Albumentation augmentation.
`GenerateSeg`	Generate segmentation mask from alpha matte.
`GenerateSoftSeg`	Generate soft segmentation mask from input segmentation mask.
`MirrorSequence`	Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.
`TemporalReverse`	Reverse frame lists for temporal augmentation.
`BinarizeImage`	Binarize image.
`Clip`	Clip the pixels.
`ColorJitter`	An interface for torch color jitter so that it can be invoked in mmagic pipeline.
`RandomAffine`	Apply random affine to input images.
`RandomMaskDilation`	Randomly dilate binary masks.
`UnsharpMasking`	Apply unsharp masking to an image or a sequence of images.
`Flip`	Flip the input data with a probability.
`NumpyPad`	Numpy Padding.
`RandomRotation`	Rotate the image by a randomly-chosen angle, measured in degree.
`RandomTransposeHW`	Randomly transpose images in H and W dimensions with a probability.
`Resize`	Resize data to a specific size for training or resize the images to fit the network input regulation for testing.
`CenterCropLongEdge`	Center crop the given image by the long edge.
`Crop`	Crop data to specific size for training.
`CropAroundCenter`	Randomly crop the images around unknown area in the center 1/4 images.
`CropAroundFg`	Crop around the whole foreground in the segmentation mask.
`CropAroundUnknown`	Crop around unknown area with a randomly selected scale.
`CropLike`	Crop/pad the image in the target_key according to the size of image in the reference_key.
`FixedCrop`	Crop paired data (at a specific position) to specific size for training.
`InstanceCrop`	Use maskrcnn to detect instances on image.
`ModCrop`	Mod crop images, used during testing.
`PairedRandomCrop`	Paired random crop.
`RandomCropLongEdge`	Random crop the given image by the long edge.
`RandomResizedCrop`	Crop data to random size and aspect ratio.
`CompositeFg`	Composite foreground with a random foreground.
`MergeFgAndBg`	Composite foreground image and background image with alpha.
`PerturbBg`	Randomly add gaussian noise or gamma change to background image.
`RandomJitter`	Randomly jitter the foreground in hsv space.
`RandomLoadResizeBg`	Randomly load a background image and resize it.
`PackInputs`	Pack data into DataSample for training, evaluation and testing.
`GenerateCoordinateAndCell`	Generate coordinate and cell. Generate coordinate from the desired size
`GenerateFacialHeatmap`	Generate heatmap from keypoint.
`GenerateFrameIndices`	Generate frame index for REDS datasets. It also performs temporal
`GenerateFrameIndiceswithPadding`	Generate frame index with padding for REDS dataset and Vid4 dataset
`GenerateSegmentIndices`	Generate frame indices for a segment. It also performs temporal
`GetMaskedImage`	Get masked image.
`GetSpatialDiscountMask`	Get spatial discounting mask constant.
`LoadImageFromFile`	Load a single image or image frames from corresponding paths. Required
`LoadMask`	Load Mask for multiple types.
`LoadPairedImageFromFile`	Load a pair of images from file.
`MATLABLikeResize`	Resize the input image using MATLAB-like downsampling.
`Normalize`	Normalize images with the given mean and std value.
`RescaleToZeroOne`	Transform the images into a range between 0 and 1.
`DegradationsWithShuffle`	Apply random degradations to input, with degradations being shuffled.
`RandomBlur`	Apply random blur to the input.
`RandomJPEGCompression`	Apply random JPEG compression to the input.
`RandomNoise`	Apply random noise to the input.
`RandomResize`	Randomly resize the input.
`RandomVideoCompression`	Apply random video compression to the input.
`RandomDownSampling`	Generate LQ image from GT (and crop), which will randomly pick a scale.
`FormatTrimap`	Convert trimap (tensor) to one-hot representation.
`GenerateTrimap`	Using random erode/dilate to generate trimap from alpha matte.
`GenerateTrimapWithDistTransform`	Generate trimap with distance transform function.
`TransformTrimap`	Transform trimap into two-channel and six-channel.
`CopyValues`	Copy the value of source keys to destination keys.
`SetValues`	Set value to destination keys.

class mmagic.datasets.transforms.AlbuCorruptFunction(keys: List[str], config: List[dict], p: float = 1.0)[source]

Bases: mmcv.transforms.BaseTransform

AlbuCorruptFunction augmentation.

Apply the same AlbuCorruptFunction augmentation to the input images.

transform(results)

Process the input results according to self.augs.

Parameters

results (dict) – Contains the data processed through the transform pipeline.

Returns

The processed data.

Return type

dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.PairedAlbuTransForms(size: int, lq_key: str = 'img', gt_key: str = 'gt', scope: str = 'geometric', crop: str = 'random', p: float = 0.5)[source]

Bases: mmcv.transforms.BaseTransform

PairedAlbuTransForms augmentation.

Apply the same AlbuTransforms augmentation to paired images.

transform(results)

Process the input results according to self.pipeline.

Parameters

results (dict) – Contains the data processed through the transform pipeline.

Returns

The processed data.

Return type

dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.Albumentations(keys: List[str], transforms: List[dict])[source]

Bases: mmcv.transforms.BaseTransform

Albumentation augmentation.

Adds custom transformations from the Albumentations library. Please visit https://github.com/albumentations-team/albumentations and https://albumentations.ai/docs/getting_started/transforms_and_targets for more information.

An example of transforms is as follows:

albu_transforms = [
    dict(
        type='Resize',
        height=100,
        width=100,
    ),
    dict(
        type='RandomFog',
        p=0.5,
    ),
    dict(
        type='RandomRain',
        p=0.5
    ),
    dict(
        type='RandomSnow',
        p=0.5,
    ),
]
pipeline = [
    dict(
        type='LoadImageFromFile',
        key='img',
        color_type='color',
        channel_order='rgb',
        imdecode_backend='cv2'),
    dict(
        type='Albumentations',
        keys=['img'],
        transforms=albu_transforms),
    dict(type='PackInputs')
]

Parameters

keys (list[str]) – A list specifying the keys whose values are modified.
transforms (list[dict]) – A list of albu transformations.

albu_builder(cfg: dict) → albumentations

Import a module from albumentations.

It inherits some of build_from_cfg() logic.

Parameters: cfg (dict) – Config dict. It should at least contain the key “type”.
Returns: The constructed object.
Return type: obj

_apply_albu(imgs)

transform(results): Transform function of Albumentations.

__repr__(): Return repr(self).

class mmagic.datasets.transforms.GenerateSeg(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Bases: mmcv.transforms.BaseTransform

Generate segmentation mask from alpha matte.

Parameters

kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (15, 30).
num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).
hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

static _crop_hole(img, start_point, hole_size)

Create an all-zero rectangle hole in the image.

Parameters

img (np.ndarray) – Source image.
start_point (tuple[int]) – The top-left point of the rectangle.
hole_size (tuple[int]) – The height and width of the rectangle hole.

Returns

The cropped image.

Return type

np.ndarray

transform(results: dict) → dict

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.GenerateSoftSeg(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Bases: mmcv.transforms.BaseTransform

Generate soft segmentation mask from input segmentation mask.

Required key is “seg”, added key is “soft_seg”.

Parameters

fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.
border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.
erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.
dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (3, 7).
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

transform(results: dict) → dict

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.MirrorSequence(keys)[source]

Bases: mmcv.transforms.BaseTransform

Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.

Given a sequence with N frames (x1, …, xN), extend the sequence to (x1, …, xN, xN, …, x1).

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters: keys (list[str]) – The frame lists to be extended.
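The mirroring rule described above can be illustrated with a minimal, library-free sketch. The helper name `mirror_sequence` and the string frame names are hypothetical stand-ins for decoded images; this is not the class's actual implementation.

```python
def mirror_sequence(frames):
    """Return the frames followed by the same frames in reverse order."""
    frames = list(frames)
    return frames + frames[::-1]


# Hypothetical frame names standing in for decoded images.
frames = ["x1", "x2", "x3"]
mirrored = mirror_sequence(frames)
print(mirrored)  # ['x1', 'x2', 'x3', 'x3', 'x2', 'x1']
```

An N-frame sequence therefore becomes a 2N-frame sequence whose last frame equals its first.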

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.TemporalReverse(keys, reverse_ratio=0.5)[source]

Bases: mmcv.transforms.BaseTransform

Reverse frame lists for temporal augmentation.

Required keys are the keys in attributes “lq” and “gt”, added or modified keys are “lq”, “gt” and “reverse”.

Parameters

keys (list[str]) – The frame lists to be reversed.
reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.
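As a rough illustration of the behaviour described above, the following sketch reverses all listed frame lists together with probability `reverse_ratio` and records the decision under “reverse”. The function name `temporal_reverse` and the `rng` argument are assumptions made for this example, not part of mmagic.

```python
import random


def temporal_reverse(results, keys, reverse_ratio=0.5, rng=random):
    """Reverse every listed frame list together with probability reverse_ratio."""
    reverse = rng.random() < reverse_ratio
    if reverse:
        for key in keys:
            results[key] = results[key][::-1]
    results["reverse"] = reverse
    return results


sample = {"lq": ["l1", "l2", "l3"], "gt": ["g1", "g2", "g3"]}
# reverse_ratio=1.0 always reverses, reverse_ratio=0.0 never does.
always = temporal_reverse(dict(sample), keys=["lq", "gt"], reverse_ratio=1.0)
never = temporal_reverse(dict(sample), keys=["lq", "gt"], reverse_ratio=0.0)
```

Making one random decision for all keys keeps the low-quality and ground-truth sequences aligned frame by frame.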

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.BinarizeImage(keys, binary_thr, a_min=0, a_max=1, dtype=np.uint8)[source]

Bases: mmcv.transforms.BaseTransform

Binarize image.

Parameters

keys (Sequence[str]) – The images to be binarized.
binary_thr (float) – Threshold for binarization.
a_min (int) – Lower limits of pixel value.
a_max (int) – Upper limits of pixel value.
dtype (np.dtype) – Set the data type of the output. Default: np.uint8

_binarize(img)

Binarize image.

Parameters: img (np.ndarray) – Input image.
Returns: Output image.
Return type: img (np.ndarray)

transform(results)

The transform function of BinarizeImage.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.Clip(keys, a_min=0, a_max=255)[source]

Bases: mmcv.transforms.BaseTransform

Clip the pixels.

Modified keys are the attributes specified in “keys”.

Parameters

keys (list[str]) – The keys whose values are clipped.
a_min (int) – Lower limits of pixel value.
a_max (int) – Upper limits of pixel value.

_clip(input_)

Clip the pixels.

Parameters: input_ (Union[List, np.ndarray]) – Pixels to clip.
Returns: Clipped pixels.
Return type: Union[List, np.ndarray]

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation.

Returns

A dict in which the values of the specified keys are rounded and clipped.

Return type

dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.ColorJitter(keys, channel_order='rgb', **kwargs)[source]

Bases: mmcv.transforms.BaseTransform

An interface for torch color jitter so that it can be invoked in mmagic pipeline.

Randomly change the brightness, contrast and saturation of an image. Modified keys are the attributes specified in “keys”.

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters

keys (list[str]) – The images to be jittered.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘rgb’.

Note

**kwargs follows the args list of torchvision.transforms.ColorJitter.

brightness (float or tuple of float (min, max)): How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
contrast (float or tuple of float (min, max)): How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
saturation (float or tuple of float (min, max)): How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
hue (float or tuple of float (min, max)): How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
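The interval rules listed above can be sketched without torchvision. The helper `factor_range` and its `center` and `clip_at_zero` arguments are hypothetical names introduced only for this illustration; they are not torchvision or mmagic API.

```python
import random


def factor_range(value, center=1.0, clip_at_zero=True):
    """Turn a float or a (min, max) pair into the interval to sample from."""
    if isinstance(value, (tuple, list)):
        low, high = value
    else:
        low, high = center - value, center + value
        if clip_at_zero:
            low = max(0.0, low)
    return float(low), float(high)


brightness_range = factor_range(0.5)        # [max(0, 1 - 0.5), 1 + 0.5]
contrast_range = factor_range((0.75, 1.25))  # explicit (min, max) is used as given
saturation_range = factor_range(1.5)        # lower bound is clipped at 0
hue_range = factor_range(0.25, center=0.0, clip_at_zero=False)  # [-hue, hue]

# One draw, as would happen once per transform call.
brightness_factor = random.uniform(*brightness_range)
```

Brightness, contrast and saturation are centered on 1 (no change), while hue is centered on 0, which is why hue is the only interval that may contain negative values.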

_color_jitter(image, this_seed)

Color Jitter Function.

Parameters

image (np.ndarray) – Image.
this_seed (int) – Seed of torch.

Returns

The output image.

Return type

image (np.ndarray)

transform(results: Dict) → Dict

The transform function of ColorJitter.

Parameters: results (dict) – The result dict.
Returns: The result dict.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)[source]

Bases: mmcv.transforms.BaseTransform

Apply random affine to input images.

This class is adopted from https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015. It should be noted that in https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70 random flip is added. See the explanation of flip_ratio below. Required keys are the keys in attribute “keys”, modified keys are keys in attribute “keys”.

Parameters

keys (Sequence[str]) – The images to be affined.
degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.
scale (tuple, optional) – Scaling factor interval, e.g (a, b), then scale is randomly sampled from the range a <= scale <= b. Default: None.
shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, a x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.
flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.
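The sampling ranges for translate, scale and flip_ratio described above can be sketched as follows. This is an illustrative sketch only: the helper names, the `(height, width)` ordering of `img_size`, and the use of `random.uniform` (which may return the interval endpoints) are assumptions, not the class's actual code.

```python
import random


def sample_translation(translate, img_size, rng=random):
    """Sample (dx, dy) for translate=(a, b) and img_size=(height, width)."""
    a, b = translate
    height, width = img_size
    dx = rng.uniform(-width * a, width * a)
    dy = rng.uniform(-height * b, height * b)
    return dx, dy


def sample_scale(scale, rng=random):
    """Sample a scaling factor from the interval scale=(a, b)."""
    return rng.uniform(scale[0], scale[1])


def sample_flips(flip_ratio, rng=random):
    """Decide the horizontal and vertical flips independently."""
    return rng.random() < flip_ratio, rng.random() < flip_ratio


dx, dy = sample_translation((0.1, 0.2), img_size=(100, 64))
s = sample_scale((0.5, 2.0))
flip_h, flip_v = sample_flips(0.5)
```

Because the two flip decisions are drawn separately, an image may be flipped in neither, one, or both directions.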

static _get_params(degrees, translate, scale_ranges, shears, flip_ratio, img_size)

Get parameters for affine transformation.

Returns: Params to be passed to the affine transformation.
Return type: paras (tuple)

static _get_inverse_affine_matrix(center, angle, translate, scale, shear, flip)

Helper method to compute inverse matrix for affine transformation.

As explained in PIL.Image.rotate, we need to compute the INVERSE of the affine transformation matrix M = T * C * RSS * C^-1, where T is the translation matrix:

    [1, 0, tx | 0, 1, ty | 0, 0, 1];

C is the translation matrix that keeps the center:

    [1, 0, cx | 0, 1, cy | 0, 0, 1];

and RSS is the rotation with scale and shear matrix.

It differs from the original function in torchvision in two ways:

1. The order is changed to flip -> scale -> rotation -> shear.
2. x and y have different scale factors.

RSS(shear, a, scale, f) =

    [ cos(a + shear)*scale_x*f   -sin(a + shear)*scale_y   0 ]
    [ sin(a)*scale_x*f            cos(a)*scale_y           0 ]
    [ 0                           0                        1 ]

Thus, the inverse is M^-1 = C * RSS^-1 * C^-1 * T^-1.

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)[source]

Bases: mmcv.transforms.BaseTransform

Randomly dilate binary masks.

Parameters

keys (Sequence[str]) – The masks to be dilated.
binary_thr (float) – Threshold for obtaining binary mask. Default: 0.
kernel_min (int) – Min size of dilation kernel. Default: 9.
kernel_max (int) – Max size of dilation kernel. Default: 49.

_random_dilate(img)

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.UnsharpMasking(kernel_size, sigma, weight, threshold, keys)[source]

Bases: mmcv.transforms.BaseTransform

Apply unsharp masking to an image or a sequence of images.

Parameters

kernel_size (int) – The kernel_size of the Gaussian kernel.
sigma (float) – The standard deviation of the Gaussian.
weight (float) – The weight of the “details” in the final output.
threshold (float) – Pixel differences larger than this value are regarded as “details”.
keys (list[str]) – The keys whose values are processed.

Added keys are “xxx_unsharp”, where “xxx” are the attributes specified in “keys”.

_unsharp_masking(imgs): Unsharp masking function.

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.Flip(keys, flip_ratio=0.5, direction='horizontal')[source]

Bases: mmcv.transforms.BaseTransform

Flip the input data with a probability.

Reverse the order of elements in the given data with a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attributes “keys”, added or modified keys are “flip”, “flip_direction” and the keys in attributes “keys”. It also supports flipping a list of images with the same flip.

Required Keys:

[KEYS]

Modified Keys:

[KEYS]

Parameters

keys (Union[str, List[str]]) – The images to be flipped.
flip_ratio (float) – The probability to flip the images. Default: 0.5.
direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.

_directions = ['horizontal', 'vertical']

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.NumpyPad(keys, padding, **kwargs)[source]

Bases: mmcv.transforms.BaseTransform

Numpy Padding.

In this augmentation, numpy padding is adopted to customize padding augmentation. Please carefully read the numpy manual in: https://numpy.org/doc/stable/reference/generated/numpy.pad.html

If you want to pad only a single dimension, set padding like this:

padding = ((2, 2), (0, 0), (0, 0))

In this case, for an input with three dimensions, only the first dimension will be padded.

Parameters

keys (Union[str, List[str]]) – The images to be padded.
padding (int | tuple(int)) – Please refer to the args pad_width in numpy.pad.
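To make the effect of a per-axis padding tuple concrete, the sketch below computes the output shape that results from `(before, after)` pad widths on each axis, which is how numpy.pad interprets a nested pad_width. The helper `padded_shape` is hypothetical and only does the shape arithmetic; it does not pad any data.

```python
def padded_shape(shape, pad_width):
    """Shape after padding each axis with its (before, after) entry."""
    return tuple(before + size + after
                 for size, (before, after) in zip(shape, pad_width))


padding = ((2, 2), (0, 0), (0, 0))
# A hypothetical (H, W, C) image: only the first axis grows, by 2 + 2.
print(padded_shape((32, 48, 3), padding))  # (36, 48, 3)
```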

transform(results)

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__() → str: Return repr(self).

class mmagic.datasets.transforms.RandomRotation(keys, degrees)[source]

Bases: mmcv.transforms.BaseTransform

Rotate the image by a randomly-chosen angle, measured in degree.

Parameters

keys (list[str]) – The images to be rotated.
degrees (tuple[float] | tuple[int] | float | int) – If it is a tuple, it represents a range (min, max). If it is a float or int, the range is constructed as (-degrees, degrees).
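The two accepted forms of degrees can be illustrated with a small sketch. The helper `degree_range` is a hypothetical name for this example, and drawing the angle with `random.uniform` is an assumption about how the range is sampled.

```python
import random


def degree_range(degrees):
    """Normalize degrees to a (min, max) tuple."""
    if isinstance(degrees, (int, float)):
        return (-degrees, degrees)
    low, high = degrees
    return (low, high)


print(degree_range(30))        # (-30, 30)
print(degree_range((10, 45)))  # (10, 45)

# One random angle drawn from the normalized range.
angle = random.uniform(*degree_range(30))
```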

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.RandomTransposeHW(keys, transpose_ratio=0.5)[source]

Bases: mmcv.transforms.BaseTransform

Randomly transpose images in H and W dimensions with a probability.

(TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) When used with horizontal/vertical flips, it serves as a way of rotation augmentation. It also supports randomly transposing a list of images.

Required keys are the keys in attributes “keys”, added or modified keys are “transpose” and the keys in attributes “keys”.

Parameters

keys (list[str]) – The images to be transposed.
transpose_ratio (float) – The probability to transpose the images. Default: 0.5.
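The identity stated above (TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) can be checked on a tiny nested-list "image". The helper functions below are written only for this check and are not mmagic functions.

```python
def transpose_hw(img):
    """Swap the H and W axes of a nested-list image."""
    return [list(row) for row in zip(*img)]


def hflip(img):
    """Flip horizontally (reverse every row)."""
    return [row[::-1] for row in img]


def rot90_ccw(img):
    """Rotate 90 degrees anti-clockwise."""
    width = len(img[0])
    return [[row[width - 1 - i] for row in img] for i in range(width)]


img = [[1, 2, 3],
       [4, 5, 6]]
print(transpose_hw(img))      # [[1, 4], [2, 5], [3, 6]]
print(rot90_ccw(hflip(img)))  # [[1, 4], [2, 5], [3, 6]]
```

Combined with horizontal and vertical flips, the transpose therefore covers the rotations by multiples of 90 degrees.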

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.Resize(keys: Union[str, List[str]] = 'img', scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear', backend=None, output_keys=None)[source]

Bases: mmcv.transforms.BaseTransform

Resize data to a specific size for training or resize the images to fit the network input regulation for testing.

When used to fit the network input regulation, the typical case is a network that applies several downsampling and then upsampling operations, so the input height and width should be divisible by the downsample factor of the network. For example, if the network downsamples the input 5 times with stride 2, the downsample factor is 2^5 = 32 and the height and width should be divisible by 32.

Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.

Required Keys:

Required keys are the keys in attribute “keys”

Modified Keys:

Modified the keys in attribute “keys” or save as new key ([OUT_KEY])

Added Keys:

[OUT_KEY]_shape
keep_ratio
scale_factor
interpolation

All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.

Parameters

keys (str | list[str]) – The image(s) to be resized.
scale (float | tuple[int]) – If scale is tuple[int], target spatial size (h, w). Otherwise, target spatial size is scaled by input size. Note that when it is used, size_factor and max_size are useless. Default: None.
keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: False. Note that it is used together with scale.
size_factor (int) – Let the output shape be a multiple of size_factor. Default: None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.
max_size (int) – The maximum size of the longest side of the output. Default: None. Note that it is used together with size_factor.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
output_keys (list[str] | None) – The resized images. Default: None. Note that if it is not None, its length should be equal to that of keys.
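The interplay of size_factor and max_size can be sketched with plain arithmetic. The sketch assumes that sizes are rounded down to the nearest multiple of size_factor and that max_size caps both sides after the same rounding; the helper name `fit_to_size_factor` and both assumptions are illustrative and may differ from the class's actual behaviour.

```python
def fit_to_size_factor(height, width, size_factor, max_size=None):
    """Shrink (height, width) to multiples of size_factor, optionally capped."""
    new_h = height - height % size_factor
    new_w = width - width % size_factor
    if max_size is not None:
        # Largest multiple of size_factor that does not exceed max_size.
        limit = max_size - max_size % size_factor
        new_h = min(new_h, limit)
        new_w = min(new_w, limit)
    return new_h, new_w


downsample_factor = 2 ** 5  # five stride-2 downsamples
print(fit_to_size_factor(500, 333, downsample_factor))                # (480, 320)
print(fit_to_size_factor(500, 333, downsample_factor, max_size=400))  # (384, 320)
```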

_resize(img)

Resize function.

Parameters: img (np.ndarray) – Image.
Returns: Resized image.
Return type: img (np.ndarray)

transform(results: Dict) → Dict

Transform function to resize images.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.CenterCropLongEdge(keys='img')[source]

Bases: mmcv.transforms.BaseTransform

Center crop the given image by the long edge.

Parameters: keys (list[str]) – The images to be cropped.
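One plausible reading of "center crop by the long edge" is a square crop whose side equals the short edge, centered along the long edge. The sketch below computes such a crop box; the helper name and the `(top, left, side)` return layout are assumptions made for this illustration.

```python
def center_crop_long_edge_box(height, width):
    """Return (top, left, side) of a centered square crop."""
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return top, left, side


print(center_crop_long_edge_box(480, 640))  # (0, 80, 480)
print(center_crop_long_edge_box(300, 100))  # (100, 0, 100)
```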

transform(results)

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.Crop(keys, crop_size, random_crop=True, is_pad_zeros=False)[source]

Bases: mmcv.transforms.BaseTransform

Crop data to specific size for training.

Parameters

keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
random_crop (bool) – If set to True, the image is cropped at a random position. Otherwise, it works as a center crop. Default: True.
is_pad_zeros (bool, optional) – Whether to pad the image with 0 if crop_size is greater than image size. Default: False.
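The difference between the random and center modes comes down to how the crop offsets are chosen. The sketch below shows that choice under stated assumptions: the helper name `crop_offsets`, the `(x, y, crop_w, crop_h)` return layout, and clamping the crop to the image size are illustrative, and zero padding (is_pad_zeros) is not modelled.

```python
import random


def crop_offsets(data_size, crop_size, random_crop=True, rng=random):
    """Pick the crop box (x, y, crop_w, crop_h) for an image of data_size=(h, w)."""
    data_h, data_w = data_size
    crop_h = min(data_h, crop_size[0])
    crop_w = min(data_w, crop_size[1])
    if random_crop:
        # randint is inclusive on both ends, so the box always fits.
        y = rng.randint(0, data_h - crop_h)
        x = rng.randint(0, data_w - crop_w)
    else:
        y = (data_h - crop_h) // 2
        x = (data_w - crop_w) // 2
    return x, y, crop_w, crop_h


center_box = crop_offsets((100, 200), (64, 64), random_crop=False)
random_box = crop_offsets((100, 200), (64, 64), random_crop=True)
print(center_box)  # (68, 18, 64, 64)
```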

_crop(data)

Crop the data.

Parameters: data (Union[List, np.ndarray]) – Input data to crop.
Returns: Cropped data and corresponding crop box.
Return type: tuple

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.CropAroundCenter(crop_size)[source]

Bases: mmcv.transforms.BaseTransform

Randomly crop the images around unknown area in the center 1/4 images.

This cropping strategy is adopted in GCA matting. The unknown area is the same as semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf

It retains the center 1/4 images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”, added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.

Parameters: crop_size (int | tuple) – Desired output size. If int, square crop is applied.

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.CropAroundFg(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)[source]

Bases: mmcv.transforms.BaseTransform

Crop around the whole foreground in the segmentation mask.

Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.

Parameters

keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.
bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Default to (0.1, 0.4).
test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground will be extended to a square. Default to False.

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

class mmagic.datasets.transforms.CropAroundUnknown(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')[source]

Bases: mmcv.transforms.BaseTransform

Crop around unknown area with a randomly selected scale.

Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.

Parameters

keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.
crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.
unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Default to ‘alpha’.
interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. Or if given as a str all the keys will be resized with the same method. Default to ‘bilinear’.

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.CropLike(target_key, reference_key=None)[source]

Bases: mmcv.transforms.BaseTransform

Crop/pad the image in the target_key according to the size of image in the reference_key.

Parameters

target_key (str) – The key of the image to be cropped.
reference_key (str | None) – The key of the reference image, whose size is used. Default: None.

transform(results)

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Requires self.target_key and self.reference_key.

Returns

A dict containing the processed data and information. Modifies self.target_key.

Return type

dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.FixedCrop(keys, crop_size, crop_pos=None)[source]

Bases: mmcv.transforms.BaseTransform

Crop paired data (at a specific position) to specific size for training.

Parameters

keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
crop_pos (Tuple[int]) – Specific position (x, y). If set to None, random initialize the position to crop paired data batch. Default: None.

_crop(data, x_offset, y_offset, crop_w, crop_h)

Crop the data.

Parameters

data (Union[List, np.ndarray]) – Input data to crop.
x_offset (int) – The offset of x axis.
y_offset (int) – The offset of y axis.
crop_w (int) – The width of crop bbox.
crop_h (int) – The height of crop bbox.

Returns

Cropped data and corresponding crop box.

Return type

tuple

transform(results)

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__(): Return repr(self).

class mmagic.datasets.transforms.InstanceCrop(config_file, from_pretrained=None, key='img', box_num_upbound=- 1, finesize=256)[源代码]¶

Bases: mmcv.transforms.BaseTransform

Use maskrcnn to detect instances on image.

Mask R-CNN is used to detect the instance on the image pred_bbox is used to segment the instance on the image

参数

config_file (str) – config file name relative to detectron2’s “configs/”
key (str) – Unused
box_num_upbound (int) – The upper limit on the number of instances in the figure

transform(results: dict) → dict¶

The transform function of InstanceCrop.

Parameters

results (dict) – A dict containing the necessary information and data for conversion.

Returns

A dict containing the processed data and information.

Return type

dict

predict_bbox(image)¶

class mmagic.datasets.transforms.ModCrop(key='gt')[source]¶

Bases: mmcv.transforms.BaseTransform

Mod crop images, used during testing.

Required keys are “scale” and “KEY”, added or modified keys are “KEY”.

Parameters: key (str) – The key of image. Default: ‘gt’.
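As a rough illustration of what mod cropping usually means, the sketch below trims an image so that its height and width become multiples of scale. The helper name mod_crop and the nested-list image are assumptions made for illustration; this is not the MMagic implementation.

```python
def mod_crop(img, scale):
    """Trim rows/columns so height and width are divisible by ``scale``.

    ``img`` is a hypothetical H x W image stored as a list of rows.
    """
    h, w = len(img), len(img[0])
    new_h = h - h % scale  # largest multiple of scale not exceeding h
    new_w = w - w % scale  # largest multiple of scale not exceeding w
    return [row[:new_w] for row in img[:new_h]]


img = [[(r, c) for c in range(7)] for r in range(10)]
cropped = mod_crop(img, scale=4)
print(len(cropped), len(cropped[0]))  # 8 4
```

Trimming from the bottom and right keeps the top-left pixel aligned between the cropped image and any downsampled counterpart.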

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.PairedRandomCrop(gt_patch_size, lq_key='img', gt_key='gt')[source]¶

Bases: mmcv.transforms.BaseTransform

Paired random crop.

It crops a pair of img and gt images with corresponding locations. It also supports accepting img list and gt list. Required keys are “scale”, “lq_key”, and “gt_key”, added or modified keys are “lq_key” and “gt_key”.

Parameters

gt_patch_size (int) – Cropped GT patch size.
lq_key (str) – Key of LQ img. Default: ‘img’.
gt_key (str) – Key of GT img. Default: ‘gt’.
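To make "corresponding locations" concrete, the sketch below picks a random LQ patch and derives the matching GT patch by multiplying the offsets by scale. The function name paired_crop_boxes and the (top, left, size) box layout are assumptions for illustration, not the library's internals.

```python
import random


def paired_crop_boxes(lq_h, lq_w, gt_patch_size, scale, rng=random):
    """Return ((top, left, size) for LQ, (top, left, size) for GT).

    Hypothetical helper: the GT box is the LQ box scaled by ``scale`` so
    that both patches cover the same image content.
    """
    lq_patch_size = gt_patch_size // scale
    if lq_h < lq_patch_size or lq_w < lq_patch_size:
        raise ValueError('LQ image is smaller than the LQ patch size.')
    top = rng.randint(0, lq_h - lq_patch_size)
    left = rng.randint(0, lq_w - lq_patch_size)
    lq_box = (top, left, lq_patch_size)
    gt_box = (top * scale, left * scale, gt_patch_size)
    return lq_box, gt_box


lq_box, gt_box = paired_crop_boxes(lq_h=64, lq_w=48, gt_patch_size=128, scale=4)
print(lq_box, gt_box)
```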

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.RandomCropLongEdge(keys='img')[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly crop the given image by the long edge.

Parameters: keys (list[str]) – The images to be cropped.

transform(results)¶

Call function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.RandomResizedCrop(keys, crop_size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation='bilinear')[source]¶

Bases: mmcv.transforms.BaseTransform

Crop data to random size and aspect ratio.

A crop covering a random proportion of the original image, with a random aspect ratio, is made. The cropped image is finally resized to the size specified by ‘crop_size’. Modified keys are the attributes specified in “keys”.

This code is partially adopted from torchvision.transforms.RandomResizedCrop: [https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#RandomResizedCrop].

Parameters

keys (list[str]) – The images to be resized and random-cropped.
crop_size (int | tuple[int]) – Target spatial size (h, w).
scale (tuple[float], optional) – Range of the proportion of the original image to be cropped. Default: (0.08, 1.0).
ratio (tuple[float], optional) – Range of aspect ratio of the crop. Default: (3. / 4., 4. / 3.).
interpolation (str, optional) – Algorithm used for interpolation. Must be one of the following: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.

get_params(data)¶

Get parameters for a random sized crop.

Parameters: data (np.ndarray) – Image of type numpy array to be cropped.
Returns: A tuple containing the coordinates of the top left corner and the chosen crop size.
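Since this class is described as partially adopted from torchvision, a sketch of the torchvision-style sampling procedure may help: draw an area fraction and a log-uniform aspect ratio, retry a few times, then fall back to a central crop. The function name get_crop_params, the max_tries argument, and taking height/width instead of an array are assumptions for illustration.

```python
import math
import random


def get_crop_params(height, width, scale=(0.08, 1.0),
                    ratio=(3.0 / 4.0, 4.0 / 3.0), max_tries=10):
    """Return (top, left, crop_h, crop_w) for a random-sized crop."""
    area = height * width
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(max_tries):
        target_area = area * random.uniform(scale[0], scale[1])
        # Sampling in log space treats ratios r and 1/r symmetrically.
        aspect = math.exp(random.uniform(log_ratio[0], log_ratio[1]))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = random.randint(0, height - crop_h)
            left = random.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: a central crop whose aspect ratio is clamped to ``ratio``.
    in_ratio = width / height
    if in_ratio < min(ratio):
        crop_w = width
        crop_h = int(round(crop_w / min(ratio)))
    elif in_ratio > max(ratio):
        crop_h = height
        crop_w = int(round(crop_h * max(ratio)))
    else:
        crop_w, crop_h = width, height
    return (height - crop_h) // 2, (width - crop_w) // 2, crop_h, crop_w


print(get_crop_params(60, 100))
```

The returned box would then be cut out and resized to crop_size.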

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.CompositeFg(fg_dirs, alpha_dirs, interpolation='nearest')[source]¶

Bases: mmcv.transforms.BaseTransform

Composite foreground with a random foreground.

This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:

\[ \begin{aligned} fg_{new} &= \alpha_1 * fg_1 + (1 - \alpha_1) * fg_2 \\ \alpha_{new} &= 1 - (1 - \alpha_1) * (1 - \alpha_2) \end{aligned} \]

where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].
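A small worked example of the two composition formulas, applied to single pixel values. The helper name composite is hypothetical, and alphas are assumed to be floats already scaled to [0, 1].

```python
def composite(fg1, alpha1, fg2, alpha2):
    """Apply the two composition formulas to scalar pixel values.

    Alphas are assumed to be floats in [0, 1].
    """
    fg_new = alpha1 * fg1 + (1 - alpha1) * fg2
    alpha_new = 1 - (1 - alpha1) * (1 - alpha2)
    return fg_new, alpha_new


print(composite(fg1=200.0, alpha1=0.5, fg2=100.0, alpha2=0.5))  # (150.0, 0.75)
```

Because (1 - alpha_1) and (1 - alpha_2) both lie in [0, 1], so does their product, which is why alpha_new stays in [0, 1].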

Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.

Parameters

fg_dirs (str | list[str]) – Path of directories to load foreground images from.
alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.
interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images. Default: ‘nearest’.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

_get_file_list(fg_dirs, alpha_dirs)¶

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.MergeFgAndBg[source]¶

Bases: mmcv.transforms.BaseTransform

Composite foreground image and background image with alpha.

Required keys are “alpha”, “fg” and “bg”, added key is “merged”.
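A minimal sketch of alpha compositing as it is commonly done for matting: merged = alpha * fg + (1 - alpha) * bg. The helper name merge_fg_and_bg, the flat single-channel lists, and the assumption that alpha is stored in the 0..255 range are illustrative choices, not confirmed details of this class.

```python
def merge_fg_and_bg(fg, bg, alpha):
    """Blend per pixel: merged = alpha * fg + (1 - alpha) * bg.

    ``alpha`` is assumed to be stored as 0..255 and is rescaled to [0, 1].
    Inputs are flat lists of single-channel pixel values.
    """
    merged = []
    for f, b, a in zip(fg, bg, alpha):
        a = a / 255.0
        merged.append(a * f + (1.0 - a) * b)
    return merged


print(merge_fg_and_bg(fg=[200, 200], bg=[50, 50], alpha=[255, 0]))  # [200.0, 50.0]
```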

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__() → str¶: Return repr(self).

class mmagic.datasets.transforms.PerturbBg(gamma_ratio=0.6)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly add Gaussian noise or gamma change to background image.

Required key is “bg”, added key is “noisy_bg”.

Parameters: gamma_ratio (float, optional) – The probability of using gamma correction instead of Gaussian noise. Defaults to 0.6.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.RandomJitter(hue_range=40)[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly jitter the foreground in HSV space.

The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.

Parameters: hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.RandomLoadResizeBg(bg_dir, flag='color', channel_order='bgr')[source]¶

Bases: mmcv.transforms.BaseTransform

Randomly load a background image and resize it.

Required key is “fg”, added key is “bg”.

Parameters

bg_dir (str) – Path of directory to load background images from.
flag (str) – Loading flag for images. Default: ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
kwargs (dict) – Args for file client.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.PackInputs(keys: Tuple[List[str], str] = ['merged', 'img'], meta_keys: Tuple[List[str], str] = [], data_keys: Tuple[List[str], str] = [])[source]¶

Bases: mmcv.transforms.base.BaseTransform

Pack data into DataSample for training, evaluation and testing.

MMagic follows the design of data structures from MMEngine. Data from the loader will be packed into the data field of DataSample. For more details on DataSample, refer to the MMEngine documentation: https://mmengine.readthedocs.io/en/latest/advanced_tutorials/data_element.html

Parameters

keys (Tuple[List[str], str, None]) – The keys to be saved in the returned inputs, which are used as the input of models. The docstring lists the default as [‘img’, ‘noise’, ‘merged’], whereas the signature above gives [‘merged’, ‘img’].
data_keys (Tuple[List[str], str, None]) – The keys to be saved in data_field of the data_samples.
meta_keys (Tuple[List[str], str, None]) – The meta keys to be saved in metainfo of the data_samples. All the other data will be packed into the data of the data_samples.

transform(results: dict) → dict¶

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

A dict containing:

‘inputs’ (obj:dict): The forward data of models. Depending on the task, the inputs may contain images, videos, labels, text, etc.
‘data_samples’ (obj:DataSample): The annotation info of the sample.

Return type

dict

__repr__() → str¶: Return repr(self).

class mmagic.datasets.transforms.GenerateCoordinateAndCell(sample_quantity=None, scale=None, target_size=None, reshape_gt=True)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Generate coordinate and cell. Generate coordinate from the desired size of SR image.

Train or val:

1. Generate coordinate from GT.
2. Reshape GT image to (HgWg, 3) and transpose to (3, HgWg), where Hg and Wg represent the height and width of GT.

Test:

Generate coordinate from LQ and scale or target_size.

Then generate cell from coordinate.

Parameters

sample_quantity (int | None) – The quantity of samples in coordinates, used to ensure that the GT tensors in a batch have the same dimensions. Default: None.
scale (float) – Scale of upsampling. Default: None.
target_size (tuple[int]) – Size of target image. Default: None.
reshape_gt (bool) – Whether to reshape gt to (-1, 3). Default: True. If sample_quantity is not None, reshape_gt = True.

The priority of getting ‘size of target image’ is:

1. results[‘gt’].shape[-2:]
2. results[‘lq’].shape[-2:] * scale
3. target_size
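The priority order can be sketched as a simple fallback chain. The function resolve_target_size is hypothetical, shapes are passed directly as (C, H, W) tuples rather than tensors, and rounding the scaled LQ size to integers is an assumption.

```python
def resolve_target_size(results, scale=None, target_size=None):
    """Pick the SR image size using the documented priority order.

    ``results`` maps keys to (C, H, W) shape tuples in this sketch.
    """
    if 'gt' in results:
        return tuple(results['gt'][-2:])
    if 'lq' in results and scale is not None:
        h, w = results['lq'][-2:]
        return (round(h * scale), round(w * scale))
    if target_size is not None:
        return tuple(target_size)
    raise ValueError('Need gt, lq with scale, or target_size.')


print(resolve_target_size({'gt': (3, 96, 128), 'lq': (3, 24, 32)}, scale=4))  # (96, 128)
print(resolve_target_size({'lq': (3, 24, 32)}, scale=2.5))  # (60, 80)
```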

transform(results)¶

Call function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Requires one of the following in results:

1. ‘lq’
2. ‘gt’
3. None, provided self.target_size is set (a condition on len(self.target_size) also applies)

Returns

A dict containing the processed data and information. Reshapes ‘gt’ to (-1, 3) and transposes it to (3, -1) if ‘gt’ is in results. Adds ‘coord’ and ‘cell’.

Return type

dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFacialHeatmap(image_key, ori_size, target_size, sigma=1.0, use_cache=True)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Generate heatmap from keypoint.

Parameters

image_key (str) – Key of facial image in dict.
ori_size (int | Tuple[int]) – Original image size of keypoint.
target_size (int | Tuple[int]) – Target size of heatmap.
sigma (float) – Sigma parameter of heatmap. Default: 1.0.
use_cache (bool) – If True, load all heatmaps at once. Default: True.

transform(results)¶

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. Requires keypoint.

Returns

A dict containing the processed data and information. Adds ‘heatmap’.

Return type

dict

generate_heatmap_from_img(image)¶

Generate heatmap from img.

Parameters: image (np.ndarray) – Face image.
Returns: heatmap (np.ndarray) – Heatmap of the face image.

_face_alignment_detector(image)¶

Generate face landmarks with face_alignment.

Parameters: image (np.ndarray) – Face image.
Returns: Location of landmark.
Return type: landmark (Tuple[float])

_generate_one_heatmap(keypoint)¶

Generate one heatmap.

Parameters: keypoint (Tuple[float]) – Location of a landmark.
Returns: heatmap (np.ndarray) – A heatmap of the landmark.
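A keypoint heatmap of this kind is typically an unnormalised 2D Gaussian centred on the landmark, with sigma controlling its spread. The sketch below assumes that form; the function name one_heatmap, the (x, y) keypoint order, and the nested-list output are illustrative assumptions.

```python
import math


def one_heatmap(keypoint, width, height, sigma=1.0):
    """Return a height x width Gaussian heatmap centred on ``keypoint``.

    ``keypoint`` is (x, y) in heatmap coordinates; the unnormalised
    Gaussian peaks at 1.0 on the keypoint itself.
    """
    kx, ky = keypoint
    return [[math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


heatmap = one_heatmap(keypoint=(2, 1), width=5, height=4, sigma=1.0)
print(heatmap[1][2])  # 1.0
```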

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFrameIndices(interval_list, frames_per_clip=99)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame index for REDS datasets. It also performs temporal augmentation with random interval.

Required Keys:

img_path
gt_path
key
num_input_frames

Modified Keys:

img_path
gt_path

Added Keys:

interval
reverse

Parameters

interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.
frames_per_clip (int) – Number of frames per clip. Default: 99 for REDS dataset.

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GenerateFrameIndiceswithPadding(padding, filename_tmpl='{:08d}')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame index with padding for REDS dataset and Vid4 dataset during testing.

Required Keys:

img_path
gt_path
key
num_input_frames
sequence_length

Modified Keys:

img_path
gt_path

Parameters

padding (str) – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.

Example: with current_idx = 0 and num_input_frames = 5, the generated frame indices under the different padding modes are:

replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]
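The four padding modes can be reproduced with a short index-generation sketch. The function padded_frame_indices and its max_frame_idx argument are hypothetical names; the left-boundary rules are chosen to match the example indices, and the right-boundary rules are assumed to mirror them.

```python
def padded_frame_indices(current_idx, max_frame_idx, num_input_frames,
                         padding='reflection'):
    """Return the indices of the frames centred on ``current_idx``.

    ``max_frame_idx`` is the last valid frame index. The right-boundary
    rules are assumed to mirror the left-boundary ones.
    """
    modes = ('replicate', 'reflection', 'reflection_circle', 'circle')
    if padding not in modes:
        raise ValueError(f'Unknown padding mode: {padding}')
    num_pad = num_input_frames // 2
    indices = []
    for i in range(current_idx - num_pad, current_idx + num_pad + 1):
        if i < 0:  # window runs past the first frame
            idx = {'replicate': 0,
                   'reflection': -i,
                   'reflection_circle': current_idx + num_pad - i,
                   'circle': num_input_frames + i}[padding]
        elif i > max_frame_idx:  # window runs past the last frame
            idx = {'replicate': max_frame_idx,
                   'reflection': 2 * max_frame_idx - i,
                   'reflection_circle':
                       (current_idx - num_pad) - (i - max_frame_idx),
                   'circle': i - num_input_frames}[padding]
        else:
            idx = i
        indices.append(idx)
    return indices


for mode in ('replicate', 'reflection', 'reflection_circle', 'circle'):
    print(mode, padded_frame_indices(0, 99, 5, mode))
```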

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GenerateSegmentIndices(interval_list, start_idx=0, filename_tmpl='{:08d}.png')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate frame indices for a segment. It also performs temporal augmentation with random interval.

Required Keys:

img_path
gt_path
key
num_input_frames
sequence_length

Modified Keys:

img_path
gt_path

Added Keys:

interval
reverse

Parameters

interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.
start_idx (int) – The index corresponds to the first frame in the sequence. Default: 0.
filename_tmpl (str) – Template for file name. Default: ‘{:08d}.png’.
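One plausible way to realise "pick a random interval and sample frame indices with it" is sketched below: choose an interval, choose a start frame that leaves room for the whole segment, then format the file names. The function segment_frame_indices and its exact bounds are assumptions for illustration, not the class's confirmed logic.

```python
import random


def segment_frame_indices(sequence_length, num_input_frames, interval_list,
                          start_idx=0, filename_tmpl='{:08d}.png'):
    """Sample a segment of evenly spaced frames and format their names."""
    interval = random.choice(interval_list)
    span = num_input_frames * interval
    if span > sequence_length:
        raise ValueError('The sequence is too short for this interval.')
    first = random.randint(0, sequence_length - span)
    # start_idx shifts indices for datasets whose first frame is not 0.
    indices = [first + k * interval + start_idx
               for k in range(num_input_frames)]
    return interval, [filename_tmpl.format(i) for i in indices]


interval, names = segment_frame_indices(100, 5, [1, 2, 3])
print(interval, names)
```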

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GetMaskedImage(img_key='gt', mask_key='mask', out_key='img', zero_value=127.5)[source]¶

Bases: mmcv.transforms.base.BaseTransform

Get masked image.

Parameters

img_key (str) – Key for clean image. Default: ‘gt’.
mask_key (str) – Key for mask image. The mask shape should be (h, w, 1), where ‘1’ indicates holes and ‘0’ indicates valid regions. Default: ‘mask’.
out_key (str) – Key for output image. Default: ‘img’.
zero_value (float) – Pixel value of masked area. Default: 127.5.
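Given the mask convention above (1 for holes, 0 for valid regions), masking can be sketched as a per-pixel blend between the clean image and zero_value. The helper get_masked_image and the single-channel nested-list layout are assumptions for illustration.

```python
def get_masked_image(img, mask, zero_value=127.5):
    """Fill holes (mask == 1) with ``zero_value``; keep valid pixels.

    ``img`` and ``mask`` are H x W nested lists of single-channel values.
    """
    return [[pixel * (1.0 - m) + m * zero_value
             for pixel, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(img, mask)]


img = [[10.0, 20.0], [30.0, 40.0]]
mask = [[0, 1], [1, 0]]
print(get_masked_image(img, mask))  # [[10.0, 127.5], [127.5, 40.0]]
```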

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GetSpatialDiscountMask(gamma=0.99, beta=1.5)[source]¶

Bases: mmcv.transforms.BaseTransform

Get spatial discounting mask constant.

The spatial discounting mask was first introduced in: Generative Image Inpainting with Contextual Attention.

Parameters

gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.
beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.

spatial_discount_mask(mask_width, mask_height)¶

Generate spatial discounting mask constant.

Parameters

mask_width (int) – The width of bbox hole.
mask_height (int) – The height of bbox hole.

Returns

Spatial discounting mask.

Return type

np.ndarray
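As an illustration of the idea, the sketch below assigns each pixel in the hole a weight that decays with its distance from the nearest hole border. The specific rule gamma ** (beta * d) is an assumption about how gamma and beta combine, and the function here is a standalone stand-in rather than the method itself.

```python
def spatial_discount_mask(mask_width, mask_height, gamma=0.99, beta=1.5):
    """Return a mask_height x mask_width grid of discount weights.

    Assumed rule: weight = gamma ** (beta * d), where d is the distance in
    pixels from the nearest border of the hole.
    """
    mask = []
    for y in range(mask_height):
        row = []
        for x in range(mask_width):
            d = min(y, mask_height - 1 - y, x, mask_width - 1 - x)
            row.append(gamma ** (beta * d))
        mask.append(row)
    return mask


mask = spatial_discount_mask(mask_width=5, mask_height=5)
print(mask[0][0], mask[2][2])
```

Pixels on the hole border keep full weight, while pixels deep inside the hole, which are harder to predict, contribute less to the loss.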

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.LoadImageFromFile(key: str, color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)[source]¶

Bases: mmcv.transforms.BaseTransform

Load a single image or image frames from corresponding paths.

Required Keys:

[KEY]_path

New Keys:

[KEY]
ori_[KEY]_shape
ori_[KEY]

Parameters

key (str) – Keys in results to find corresponding path.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to y channel. Only supports ‘rgb2ycbcr’ and ‘bgr2ycbcr’. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the prefix of uri corresponding backend. Defaults to None.

transform(results: dict) → dict¶

Functions to load image or frames.

Parameters: results (dict) – Result dict from mmcv.BaseDataset.
Returns: The dict contains loaded image and meta information.
Return type: dict

_load_image(filename)¶

Load an image from file.

Parameters: filename (str) – Path of image file.
Returns: Image.
Return type: np.ndarray

_convert(img: numpy.ndarray)¶

Convert an image to the required format.

Parameters: img (np.ndarray) – The original image.
Returns: The converted image.
Return type: np.ndarray

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)[source]¶

Bases: mmcv.transforms.BaseTransform

Load Mask for multiple types.

For different types of mask, users need to provide the corresponding config dict.

Example config for bbox:

config = dict(img_shape=(256, 256), max_bbox_shape=128)

Example config for irregular:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    max_angle=4.,
    length_range=(10, 100),
    brush_width=(10, 40),
    area_ratio_range=(0.15, 0.5))

Example config for ff:

config = dict(
    img_shape=(256, 256),
    num_vertices=(4, 12),
    mean_angle=1.2,
    angle_range=0.4,
    brush_width=(12, 40))

Example config for set:

config = dict(
    mask_list_file='xxx/xxx/ooxx.txt',
    prefix='/xxx/xxx/ooxx/',
    io_backend='local',
    color_type='unchanged',
    file_client_kwargs=dict()
)

The mask_list_file contains a list of mask file names, like this:
    test1.jpeg
    test2.jpeg
    ...
    ...

The prefix gives the data path.

Parameters

mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’]. Default: ‘bbox’.

bbox: square bounding box masks.
irregular: irregular holes.
ff: free-form holes from DeepFillv2.
set: randomly get a mask from a mask set.
file: get mask from ‘mask_path’ in results.

mask_config (dict) – Params for creating masks. Each type of mask needs different configs. Default: None.

_init_info()¶

_get_random_mask_from_set()¶

_get_mask_from_file(path)¶

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.LoadPairedImageFromFile(key: str, domain_a: str = 'A', domain_b: str = 'B', color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)[source]¶

Bases: LoadImageFromFile

Load a pair of images from file.

Each sample contains a pair of images, which are concatenated in the w dimension (a|b). This is a special loading class for generation paired dataset. It loads a pair of images as the common loader does and crops it into two images with the same shape in different domains.

Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_{domain_a}”, “img_{domain_b}”, “img_{domain_a}_path”, “img_{domain_b}_path”, “img_{domain_a}_ori_shape”, “img_{domain_b}_ori_shape”, “ori_img_{domain_a}” and “ori_img_{domain_b}”.

Parameters

key (str) – Keys in results to find corresponding path.
domain_a (str, Optional) – One of the paired image domains. Defaults to ‘A’.
domain_b (str, Optional) – The other of the paired image domains. Defaults to ‘B’.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to y channel. Only supports ‘rgb2ycbcr’ and ‘bgr2ycbcr’. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the prefix of uri corresponding backend. Defaults to None.
io_backend (str, optional) – IO backend where images are stored. Defaults to None.

transform(results: dict) → dict¶

Functions to load paired images.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

class mmagic.datasets.transforms.MATLABLikeResize(keys, scale=None, output_shape=None, kernel='bicubic', kernel_width=4.0)[source]¶

Bases: mmcv.transforms.BaseTransform

Resize the input image using MATLAB-like downsampling.

Currently supports bicubic interpolation only. Note that the output of this function is slightly different from the official MATLAB function.

Required keys are the keys in attribute “keys”. Added or modified keys are “scale” and “output_shape”, and the keys in attribute “keys”.

Parameters

keys (list[str]) – A list of keys whose values are modified.
scale (float | None, optional) – The scale factor of the resize operation. If None, it will be determined by output_shape. Default: None.
output_shape (tuple(int) | None, optional) – The size of the output image. If None, it will be determined by scale. Note that if scale is provided, output_shape will not be used. Default: None.
kernel (str, optional) – The kernel for the resize operation. Currently supports ‘bicubic’ only. Default: ‘bicubic’.
kernel_width (float) – The kernel width. Currently supports 4.0 only. Default: 4.0.

_resize(img)¶

Resize an image to the required size.

Parameters: img (np.ndarray) – The original image.
Returns: The resized image.
Return type: np.ndarray

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.Normalize(keys, mean, std, to_rgb=False, save_original=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Normalize images with the given mean and std value.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.

Parameters

keys (Sequence[str]) – The images to be normalized.
mean (np.ndarray) – Mean values of different channels.
std (np.ndarray) – Std values of different channels.
to_rgb (bool) – Whether to convert channels from BGR to RGB. Default: False.
save_original (bool) – Whether to save original images. Default: False.

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.RescaleToZeroOne(keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Transform the images into a range between 0 and 1.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.

Parameters: keys (Sequence[str]) – The images to be transformed.

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.DegradationsWithShuffle(degradations, keys, shuffle_idx=None)[source]¶

Apply random degradations to input, with degradations being shuffled.

Degradation groups are supported. The order of degradations within the same group is preserved. For example, if we have degradations = [a, b, [c, d]] and shuffle_idx = None, then the possible orders are

[a, b, [c, d]]
[a, [c, d], b]
[b, a, [c, d]]
[b, [c, d], a]
[[c, d], a, b]
[[c, d], b, a]
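The six orders listed above are exactly the permutations of the three top-level items, with the group [c, d] moved as a single unit. A short enumeration makes this explicit; the string placeholders stand in for real degradation configs.

```python
from itertools import permutations

degradations = ['a', 'b', ['c', 'd']]  # the nested list is one group

# Each top-level item is permuted as a unit, so 'c' always precedes 'd'.
orders = [list(order) for order in permutations(degradations)]
for order in orders:
    print(order)
```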

Modified keys are the attributes specified in “keys”.

Parameters

degradations (list[dict]) – The list of degradations.
keys (list[str]) – A list specifying the keys whose values are modified.
shuffle_idx (list | None, optional) – The degradations corresponding to these indices are shuffled. If None, all degradations are shuffled. Default: None.

_build_degradations(degradations)¶

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomBlur(params, keys)[source]¶

Apply random blur to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

get_kernel(num_kernels: int)¶

This is the function to create kernel.

Parameters: num_kernels (int) – The number of kernels.

_apply_random_blur(imgs)¶

This is the function to apply blur operation on images.

Parameters: imgs (Tensor) – Images.
Returns: Blurred images.
Return type: Tensor

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomJPEGCompression(params, keys, color_type='color', bgr2rgb=False)[source]¶

Apply random JPEG compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
bgr2rgb (bool) – Whether to change the channel order. Default: False.

_apply_random_compression(imgs)¶

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomNoise(params, keys)[source]¶

Apply random noise to the input.

Currently supports Gaussian noise and Poisson noise.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_apply_gaussian_noise(imgs)¶

This is the function used to apply Gaussian noise on images.

Parameters: imgs (Tensor) – Images.
Returns: Images with Gaussian noise applied.
Return type: Tensor

_apply_poisson_noise(imgs)¶

_apply_random_noise(imgs)¶

This is the function used to apply random noise on images.

Parameters: imgs (Tensor) – Training images.

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomResize(params, keys)[source]¶

Randomly resize the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_random_resize(imgs)¶

This is the function used to randomly resize images for training augmentation.

Parameters: imgs (Tensor) – Training images.
Returns: Randomly resized images.
Return type: Tensor

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomVideoCompression(params, keys)[source]¶

Apply random video compression to the input.

Modified keys are the attributes specified in “keys”.

Parameters

params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.

_apply_random_compression(imgs)¶

This is the function to apply random compression on images.

Parameters: imgs (Tensor) – Training images.
Returns: Randomly compressed images.
Return type: Tensor

__call__(results)¶: Call this transform.

__repr__()¶: Print the basic information of the transform.

class mmagic.datasets.transforms.RandomDownSampling(scale_min=1.0, scale_max=4.0, patch_size=None, interpolation='bicubic', backend='pillow')[source]¶

Bases: mmcv.transforms.BaseTransform

Generate LQ image from GT (and crop), which will randomly pick a scale.

Parameters

scale_min (float) – The minimum of upsampling scale, inclusive. Default: 1.0.
scale_max (float) – The maximum of upsampling scale, exclusive. Default: 4.0.
patch_size (int) – The cropped LR patch size. Default: None, which means no crop.
interpolation (str) – Interpolation method. Accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for the ‘cv2’ backend, and “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming” for the ‘pillow’ backend. Default: “bicubic”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: “pillow”.

Scale will be picked in the range of [scale_min, scale_max).
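To show how a randomly picked scale can translate into image sizes when no patch crop is requested, the sketch below draws a scale and derives an integer LQ size, then trims the GT size to match. The function sample_lq_size, the floor/round choices, and the small epsilon are assumptions for illustration.

```python
import math
import random


def sample_lq_size(gt_h, gt_w, scale_min=1.0, scale_max=4.0):
    """Pick a scale and derive the matching LQ size and trimmed GT size."""
    scale = random.uniform(scale_min, scale_max)
    # The epsilon guards against float error when gt_h / scale is integral.
    lq_h = math.floor(gt_h / scale + 1e-9)
    lq_w = math.floor(gt_w / scale + 1e-9)
    # Trim GT so that it is (approximately) scale times the LQ size.
    trimmed_gt = (round(lq_h * scale), round(lq_w * scale))
    return scale, (lq_h, lq_w), trimmed_gt


scale, lq_size, gt_size = sample_lq_size(gt_h=120, gt_w=160)
print(scale, lq_size, gt_size)
```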

transform(results)¶

Transform function.

Parameters

results (dict) – A dict containing the necessary information and data for augmentation. ‘gt’ is required.

Returns

A dict containing the processed data and information. Modifies ‘gt’ and adds ‘lq’ and ‘scale’ to keys.

Return type

dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.FormatTrimap(to_onehot=False)[source]¶

Bases: mmcv.transforms.BaseTransform

Convert trimap (tensor) to one-hot representation.

It transforms the trimap label from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap will be converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”, added or modified keys are “trimap” and “format_trimap_to_onehot”.

Parameters: to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.
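The label mapping and the optional one-hot layout can be illustrated on a tiny trimap. The helper format_trimap and the nested-list representation are stand-ins for the tensor operations and are assumptions for illustration.

```python
LABELS = {0: 0, 128: 1, 255: 2}


def format_trimap(trimap, to_onehot=False):
    """Map trimap values to class indices, optionally as one-hot planes.

    ``trimap`` is an H x W nested list; the one-hot result has shape
    (3, H, W), one plane per class.
    """
    indexed = [[LABELS[v] for v in row] for row in trimap]
    if not to_onehot:
        return indexed
    return [[[1 if v == cls else 0 for v in row] for row in indexed]
            for cls in range(3)]


trimap = [[0, 128], [255, 128]]
print(format_trimap(trimap))  # [[0, 1], [2, 1]]
```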

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).

class mmagic.datasets.transforms.GenerateTrimap(kernel_size, iterations=1, random=True)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate a trimap from an alpha matte using random erosion/dilation.

Required key is “alpha”, added key is “trimap”.

Parameters

kernel_size (int | tuple[int]) – The range of the random kernel_size for erode/dilate; an int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, it is interpreted as (erode kernel_size, dilate kernel_size). Note that the erosion and dilation kernels have the same height and width.
iterations (int | tuple[int], optional) – The range of the random number of iterations for erode/dilate; an int indicates a fixed number of iterations. If random is set to False and iterations is a tuple of length 2, it is interpreted as (erode iterations, dilate iterations). Defaults to 1.
random (bool, optional) – Whether to use a random kernel_size and iterations when generating the trimap. See kernel_size and iterations for more information. Defaults to True.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).
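
The erode/dilate idea behind GenerateTrimap can be sketched with NumPy alone. The sketch makes several assumptions: it uses a square kernel and fixed kernel sizes (no random sampling, single iteration), and `dilate`, `erode`, and `make_trimap` are hypothetical helpers standing in for whatever morphology routines the library calls.

```python
import numpy as np


def dilate(mask, k):
    """Binary dilation with a k x k square kernel (NumPy-only stand-in)."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode(mask, k):
    """Binary erosion, expressed as the complement of dilating the complement."""
    return ~dilate(~mask, k)


def make_trimap(alpha, erode_k, dilate_k):
    """Hypothetical helper: 255 = foreground, 0 = background, 128 = unknown."""
    sure_fg = erode(alpha >= 255, erode_k)   # shrink the opaque region
    maybe_fg = dilate(alpha > 0, dilate_k)   # grow the non-transparent region
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    trimap[sure_fg] = 255
    trimap[~maybe_fg] = 0
    return trimap


alpha = np.zeros((7, 7), dtype=np.uint8)
alpha[2:5, 2:5] = 255
trimap = make_trimap(alpha, erode_k=3, dilate_k=3)
```

Larger kernels or more iterations widen the unknown band between the eroded foreground and the dilated boundary.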

class mmagic.datasets.transforms.GenerateTrimapWithDistTransform(dist_thr=20, random=True)[source]¶

Bases: mmcv.transforms.BaseTransform

Generate trimap with distance transform function.

Parameters

dist_thr (int, optional) – Distance threshold. Areas with alpha values strictly between 0 and 255 are considered the initial unknown area. Areas whose distance to the unknown area is smaller than the distance threshold are then also considered unknown. Defaults to 20.
random (bool, optional) – If True, use a random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).
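
A rough sketch of the distance-based rule follows. It replaces a real distance transform with a brute-force NumPy computation that is only practical for small arrays, uses a fixed threshold instead of a random one, and treats the boundary as inclusive (`<=`), which is an assumption. `trimap_from_distance` is a hypothetical name.

```python
import numpy as np


def trimap_from_distance(alpha, dist_thr):
    """Hypothetical helper: grow the unknown area by a distance threshold."""
    # Initial unknown area: alpha strictly between 0 and 255.
    unknown = (alpha > 0) & (alpha < 255)
    trimap = np.where(alpha == 255, 255, 0).astype(np.uint8)
    ys, xs = np.nonzero(unknown)
    if ys.size:
        yy, xx = np.indices(alpha.shape)
        # Squared distance from every pixel to every unknown pixel.
        d2 = (yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2
        dist = np.sqrt(d2.min(axis=-1))
        trimap[dist <= dist_thr] = 128
    return trimap


alpha = np.zeros((1, 7), dtype=np.uint8)
alpha[0, :3] = 255
alpha[0, 3] = 100
trimap = trimap_from_distance(alpha, dist_thr=1)
```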

class mmagic.datasets.transforms.TransformTrimap[source]¶

Bases: mmcv.transforms.BaseTransform

Transform a trimap into two-channel and six-channel representations.

This class will generate a two-channel trimap composed of definite foreground and background masks and encode it into a six-channel trimap using Gaussian blurs of the generated two-channel trimap at three different scales. The transformed trimap has 6 channels.

Required key is “trimap”, added keys are “transformed_trimap” and “two_channel_trimap”.

Adopted from the following repository: https://github.com/MarcoForte/FBA_Matting/blob/master/networks/transforms.py.

transform(results: dict) → dict¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict containing the processed data and information.
Return type: dict

__repr__()¶: Return repr(self).
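
One way to picture the two-channel and six-channel encodings is the sketch below. It is an illustration under stated assumptions, not the transform's implementation: the Gaussian falloff is computed from a brute-force distance to each mask, the reference length 320 and the three scale factors (0.02, 0.08, 0.16) are assumed values, the masks are stored as booleans, and `distance_to_mask` and `transform_trimap` are hypothetical names.

```python
import numpy as np


def distance_to_mask(mask):
    """Brute-force Euclidean distance from every pixel to the nearest True pixel."""
    ys, xs = np.nonzero(mask)
    yy, xx = np.indices(mask.shape)
    d2 = (yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2
    return np.sqrt(d2.min(axis=-1))


def transform_trimap(trimap, length=320, sigmas=(0.02, 0.08, 0.16)):
    """Hypothetical helper returning (two_channel, six_channel) encodings."""
    # Channel 0: definite background, channel 1: definite foreground.
    two_channel = np.stack([trimap == 0, trimap == 255], axis=-1)
    six_channel = np.zeros(trimap.shape + (6,), dtype=np.float32)
    for k in range(2):
        if two_channel[..., k].any():
            dist = distance_to_mask(two_channel[..., k])
            for i, sigma in enumerate(sigmas):
                # Gaussian falloff at one of three (assumed) scales.
                six_channel[..., 3 * k + i] = np.exp(
                    -dist ** 2 / (2 * (sigma * length) ** 2))
    return two_channel, six_channel


trimap = np.array([[0, 128, 255]], dtype=np.uint8)
two_channel, six_channel = transform_trimap(trimap)
```

Each group of three channels equals 1 inside its mask and decays with distance from it, with larger scale factors giving a slower decay.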

class mmagic.datasets.transforms.CopyValues(src_keys, dst_keys)[source]¶

Bases: mmcv.transforms.BaseTransform

Copy the value of source keys to destination keys.

# TODO Change to dict(dst=src)

It does the following: results[dst_key] = results[src_key] for (src_key, dst_key) in zip(src_keys, dst_keys).

Added keys are the keys in the attribute “dst_keys”.

Required Keys:

[SRC_KEYS]

Added Keys:

[DST_KEYS]

Parameters

src_keys (list[str]) – The source keys.
dst_keys (list[str]) – The destination keys.

transform(results)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict with a key added/modified.
Return type: dict

__repr__()¶: Return repr(self).
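
The key-copying behaviour described for CopyValues amounts to a loop over paired keys. The sketch below is a plain-Python illustration with a hypothetical `copy_values` function and a hypothetical “gt_copy” key; it uses a deep copy so that later in-place edits to one key do not leak into the other, which is a design choice of this sketch rather than something stated above.

```python
from copy import deepcopy


def copy_values(results, src_keys, dst_keys):
    """Hypothetical helper: results[dst_key] = copy of results[src_key]."""
    if len(src_keys) != len(dst_keys):
        raise ValueError("src_keys and dst_keys must have the same length")
    for src_key, dst_key in zip(src_keys, dst_keys):
        results[dst_key] = deepcopy(results[src_key])
    return results


results = copy_values({"gt": [1, 2, 3]}, ["gt"], ["gt_copy"])
# Mutating the copy leaves the source untouched.
results["gt_copy"].append(4)
```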

class mmagic.datasets.transforms.SetValues(dictionary)[source]¶

Bases: mmcv.transforms.BaseTransform

Set values for the destination keys.

It does the following: results[key] = value

Added keys are the keys in the dictionary.

Required Keys:

None

Added or Modified Keys:

keys in the dictionary

Parameters: dictionary (dict) – The dictionary to update.

transform(results: Dict)¶

Transform function.

Parameters: results (dict) – A dict containing the necessary information and data for augmentation.
Returns: A dict with a key added/modified.
Return type: dict

__repr__()¶: Return repr(self).
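
The effect of SetValues on the results dict can be shown with a few lines of plain Python. `set_values` is a hypothetical stand-in, the example keys are made up, and copying the values before assignment is an assumption of this sketch.

```python
from copy import deepcopy


def set_values(results, dictionary):
    """Hypothetical helper: results[key] = value for each item in dictionary."""
    # Existing keys are overwritten; new keys are added.
    results.update(deepcopy(dictionary))
    return results


results = set_values({"gt": "img", "scale": 2}, {"scale": 4, "flag": True})
```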
