mmagic.datasets.transforms
Package Contents
Classes
- AlbuCorruptFunction augmentation.
- PairedAlbuTransForms augmentation.
- Albumentation augmentation.
- Generate segmentation mask from alpha matte.
- Generate soft segmentation mask from input segmentation mask.
- Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.
- Reverse frame lists for temporal augmentation.
- Binarize image.
- Clip the pixels.
- An interface for torch color jitter so that it can be invoked in mmagic pipeline.
- Apply random affine to input images.
- Randomly dilate binary masks.
- Apply unsharp masking to an image or a sequence of images.
- Flip the input data with a probability.
- Numpy Padding.
- Rotate the image by a randomly-chosen angle, measured in degrees.
- Randomly transpose images in H and W dimensions with a probability.
- Resize data to a specific size for training or resize the images to fit the network input regulation for testing.
- Center crop the given image by the long edge.
- Crop data to specific size for training.
- Randomly crop the images around the unknown area in the central 1/4 of the images.
- Crop around the whole foreground in the segmentation mask.
- Crop around unknown area with a randomly selected scale.
- Crop/pad the image in the target_key according to the size of image in the reference_key.
- Crop paired data (at a specific position) to specific size for training.
- Use Mask R-CNN to detect instances on the image.
- Mod crop images, used during testing.
- Paired random crop.
- Randomly crop the given image by the long edge.
- Crop data to random size and aspect ratio.
- Composite foreground with a random foreground.
- Composite foreground image and background image with alpha.
- Randomly add Gaussian noise or gamma change to background image.
- Randomly jitter the foreground in HSV space.
- Randomly load a background image and resize it.
- Pack data into DataSample for training, evaluation and testing.
- Generate coordinate and cell. Generate coordinate from the desired size of SR image.
- Generate heatmap from keypoint.
- Generate frame index for REDS datasets. It also performs temporal
- Generate frame index with padding for REDS dataset and Vid4 dataset
- Generate frame indices for a segment. It also performs temporal
- Get masked image.
- Get spatial discounting mask constant.
- Load a single image or image frames from corresponding paths. Required
- Load Mask for multiple types.
- Load a pair of images from file.
- Resize the input image using MATLAB-like downsampling.
- Normalize images with the given mean and std value.
- Transform the images into a range between 0 and 1.
- Apply random degradations to input, with degradations being shuffled.
- Apply random blur to the input.
- Apply random JPEG compression to the input.
- Apply random noise to the input.
- Randomly resize the input.
- Apply random video compression to the input.
- Generate LQ image from GT (and crop), which will randomly pick a scale.
- Convert trimap (tensor) to one-hot representation.
- Use random erosion/dilation to generate a trimap from the alpha matte.
- Generate trimap with distance transform function.
- Transform trimap into two-channel and six-channel.
- Copy the value of source keys to destination keys.
- Set value to destination keys.
- class mmagic.datasets.transforms.AlbuCorruptFunction(keys: List[str], config: List[dict], p: float = 1.0)
Bases:
mmcv.transforms.BaseTransform
AlbuCorruptFunction augmentation.
Apply the same AlbuCorruptFunction augmentation to the input images.
- class mmagic.datasets.transforms.PairedAlbuTransForms(size: int, lq_key: str = 'img', gt_key: str = 'gt', scope: str = 'geometric', crop: str = 'random', p: float = 0.5)
Bases:
mmcv.transforms.BaseTransform
PairedAlbuTransForms augmentation.
Apply the same AlbuTransforms augmentation to paired images.
- class mmagic.datasets.transforms.Albumentations(keys: List[str], transforms: List[dict])
Bases:
mmcv.transforms.BaseTransform
Albumentation augmentation.
Adds custom transformations from the Albumentations library. Please visit https://github.com/albumentations-team/albumentations and https://albumentations.ai/docs/getting_started/transforms_and_targets for more information.
An example of transforms is as follows:
    albu_transforms = [
        dict(type='Resize', height=100, width=100),
        dict(type='RandomFog', p=0.5),
        dict(type='RandomRain', p=0.5),
        dict(type='RandomSnow', p=0.5),
    ]
    pipeline = [
        dict(
            type='LoadImageFromFile',
            key='img',
            color_type='color',
            channel_order='rgb',
            imdecode_backend='cv2'),
        dict(type='Albumentations', keys=['img'], transforms=albu_transforms),
        dict(type='PackInputs')
    ]
- Parameters
keys (list[str]) – A list specifying the keys whose values are modified.
transforms (list[dict]) – A list of albu transformations.
- class mmagic.datasets.transforms.GenerateSeg(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])
Bases:
mmcv.transforms.BaseTransform
Generate segmentation mask from alpha matte.
- Parameters
kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.
erode_iter_range (tuple, optional) – Range of the number of erosion iterations. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Range of the number of dilation iterations. Defaults to (15, 30).
num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).
hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].
- static _crop_hole(img, start_point, hole_size)
Create an all-zero rectangle hole in the image.
- Parameters
img (np.ndarray) – Source image.
start_point (tuple[int]) – The top-left point of the rectangle.
hole_size (tuple[int]) – The height and width of the rectangle hole.
- Returns
The cropped image.
- Return type
np.ndarray
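The hole-cutting step above amounts to zeroing a rectangular slice of the mask. The sketch below is a hypothetical re-implementation (crop_hole is not the library function), and it assumes start_point is (top, left) and hole_size is (height, width); the docstring does not state the coordinate order.

```python
import numpy as np


def crop_hole(img, start_point, hole_size):
    """Zero out a rectangle in ``img`` (modified in place and returned).

    Assumption: start_point = (top, left), hole_size = (height, width).
    """
    top, left = start_point
    bottom, right = top + hole_size[0], left + hole_size[1]
    height, width = img.shape[:2]
    # Refuse holes that would fall outside the image.
    if top < 0 or left < 0 or bottom > height or right > width:
        raise ValueError('hole exceeds the image boundary')
    img[top:bottom, left:right] = 0
    return img


seg = np.full((6, 6), 255, dtype=np.uint8)
out = crop_hole(seg, (1, 2), (2, 3))  # 2 x 3 hole starting at row 1, column 2
```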
- class mmagic.datasets.transforms.GenerateSoftSeg(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])
Bases:
mmcv.transforms.BaseTransform
Generate soft segmentation mask from input segmentation mask.
Required key is “seg”, added key is “soft_seg”.
- Parameters
fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.
border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.
erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.
dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.
erode_iter_range (tuple, optional) – Range of the number of erosion iterations. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Range of the number of dilation iterations. Defaults to (3, 7).
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].
- class mmagic.datasets.transforms.MirrorSequence(keys)
Bases:
mmcv.transforms.BaseTransform
Extend short sequences (e.g. Vimeo-90K) by mirroring the sequences.
Given a sequence with N frames (x1, …, xN), extend the sequence to (x1, …, xN, xN, …, x1).
Required Keys:
[KEYS]
Modified Keys:
[KEYS]
- Parameters
keys (list[str]) – The frame lists to be extended.
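The mirroring rule (x1, …, xN) → (x1, …, xN, xN, …, x1) can be sketched in plain Python. The helper names mirror_sequence and mirror_keys are hypothetical and only illustrate the behaviour on a results dict keyed like the pipeline.

```python
def mirror_sequence(frames):
    """Extend (x1, ..., xN) to (x1, ..., xN, xN, ..., x1)."""
    return list(frames) + list(frames)[::-1]


def mirror_keys(results, keys):
    # Apply the same mirroring to every frame list named in ``keys``.
    for key in keys:
        results[key] = mirror_sequence(results[key])
    return results


results = mirror_keys({'img': ['f1', 'f2', 'f3']}, keys=['img'])
```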
- class mmagic.datasets.transforms.TemporalReverse(keys, reverse_ratio=0.5)
Bases:
mmcv.transforms.BaseTransform
Reverse frame lists for temporal augmentation.
Required keys are the keys in attribute “keys” (typically “lq” and “gt”); added or modified keys are “reverse” and the keys in attribute “keys”.
- Parameters
keys (list[str]) – The frame lists to be reversed.
reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.
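A minimal sketch of the temporal-reverse decision, assuming a single random draw is shared by all keys so that paired LQ/GT sequences stay aligned. temporal_reverse and its rng argument are hypothetical names used only for illustration.

```python
import random


def temporal_reverse(results, keys, reverse_ratio=0.5, rng=random):
    """Reverse every frame list in ``keys`` with probability ``reverse_ratio``.

    One draw is shared by all keys, and the decision is stored in 'reverse'.
    """
    reverse = rng.random() < reverse_ratio
    if reverse:
        for key in keys:
            results[key] = list(results[key])[::-1]
    results['reverse'] = reverse
    return results


out = temporal_reverse({'lq': [1, 2, 3], 'gt': [4, 5, 6]}, ['lq', 'gt'], reverse_ratio=1.0)
```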
- class mmagic.datasets.transforms.BinarizeImage(keys, binary_thr, a_min=0, a_max=1, dtype=np.uint8)
Bases:
mmcv.transforms.BaseTransform
Binarize image.
- Parameters
keys (Sequence[str]) – The images to be binarized.
binary_thr (float) – Threshold for binarization.
a_min (int) – Lower limit of pixel value. Default: 0.
a_max (int) – Upper limit of pixel value. Default: 1.
dtype (np.dtype) – Set the data type of the output. Default: np.uint8
- _binarize(img)
Binarize image.
- Parameters
img (np.ndarray) – Input image.
- Returns
Output image.
- Return type
np.ndarray
- class mmagic.datasets.transforms.Clip(keys, a_min=0, a_max=255)
Bases:
mmcv.transforms.BaseTransform
Clip the pixels.
Modified keys are the attributes specified in “keys”.
- Parameters
keys (list[str]) – The keys whose values are clipped.
a_min (int) – Lower limit of pixel value. Default: 0.
a_max (int) – Upper limit of pixel value. Default: 255.
- _clip(input_)
Clip the pixels.
- Parameters
input_ (Union[List, np.ndarray]) – Pixels to clip.
- Returns
Clipped pixels.
- Return type
Union[List, np.ndarray]
- class mmagic.datasets.transforms.ColorJitter(keys, channel_order='rgb', **kwargs)
Bases:
mmcv.transforms.BaseTransform
An interface for torch color jitter so that it can be invoked in mmagic pipeline.
Randomly change the brightness, contrast and saturation of an image. Modified keys are the attributes specified in “keys”.
Required Keys:
[KEYS]
Modified Keys:
[KEYS]
- Parameters
keys (list[str]) – The images to be color-jittered.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘rgb’.
Notes
**kwargs follows the argument list of torchvision.transforms.ColorJitter:
- brightness (float or tuple of float (min, max)): How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
- contrast (float or tuple of float (min, max)): How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
- saturation (float or tuple of float (min, max)): How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
- hue (float or tuple of float (min, max)): How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
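The sampling intervals described in these notes can be made concrete with a small helper. jitter_range is a hypothetical function that only mirrors the rules stated above; it is not part of mmagic or torchvision.

```python
def jitter_range(value, name='brightness'):
    """Interval a jitter factor is sampled from, per the rules above.

    A scalar v gives [max(0, 1 - v), 1 + v] for brightness/contrast/
    saturation and [-v, v] for hue; a (min, max) pair is used as given.
    """
    if isinstance(value, (int, float)):
        if value < 0:
            raise ValueError(f'{name} must be non-negative')
        if name == 'hue':
            lo, hi = -value, value
        else:
            lo, hi = max(0.0, 1.0 - value), 1.0 + value
    else:
        lo, hi = value
    # Hue factors must stay inside [-0.5, 0.5].
    if name == 'hue' and not -0.5 <= lo <= hi <= 0.5:
        raise ValueError('hue must lie within [-0.5, 0.5]')
    return float(lo), float(hi)
```

For example, brightness=0.2 samples the factor from [0.8, 1.2], while brightness=1.5 is clamped at the bottom to [0.0, 2.5].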
- _color_jitter(image, this_seed)
Color Jitter Function.
- Parameters
image (np.ndarray) – Image.
this_seed (int) – Seed of torch.
- Returns
The output image.
- Return type
np.ndarray
- class mmagic.datasets.transforms.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)
Bases:
mmcv.transforms.BaseTransform
Apply random affine to input images.
This class is adapted from https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015. Note that, following https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70, a random flip is added; see the explanation of flip_ratio below. Required keys are the keys in attribute “keys”; modified keys are the keys in attribute “keys”.
- Parameters
keys (Sequence[str]) – The images to be affined.
degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.
scale (tuple, optional) – Scaling factor interval, e.g. (a, b), then scale is randomly sampled from the range a <= scale <= b. Default: None.
shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, an x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.
flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.
- static _get_params(degrees, translate, scale_ranges, shears, flip_ratio, img_size)
Get parameters for affine transformation.
- Returns
Params to be passed to the affine transformation.
- Return type
tuple
- static _get_inverse_affine_matrix(center, angle, translate, scale, shear, flip)
Helper method to compute inverse matrix for affine transformation.
As explained in PIL.Image.rotate, we need to compute the inverse of the affine transformation matrix
\[ M = T \cdot C \cdot RSS \cdot C^{-1} \]
where T is the translation matrix and C is the translation matrix that keeps the center:
\[ T = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & c_x \\ 0 & 1 & c_y \\ 0 & 0 & 1 \end{bmatrix} \]
RSS is the rotation with scale and shear matrix. It differs from the original function in torchvision: 1. the order is changed to flip -> scale -> rotation -> shear; 2. x and y have different scale factors:
\[ RSS(shear, a, scale, f) = \begin{bmatrix} \cos(a + shear) \, scale_x \, f & -\sin(a + shear) \, scale_y & 0 \\ \sin(a) \, scale_x \, f & \cos(a) \, scale_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
Thus, the inverse is \( M^{-1} = C \cdot RSS^{-1} \cdot C^{-1} \cdot T^{-1} \).
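The composition M = T · C · RSS · C^-1 and its inverse can be checked numerically with NumPy. affine_matrices is a hypothetical helper, not the library method: it takes angles in degrees and, as an assumption, encodes the flip as a scalar sign f = -1 (flipped) or 1.

```python
import math

import numpy as np


def affine_matrices(center, angle, translate, scale, shear, flip):
    """Build M = T @ C @ RSS @ C^-1 and M^-1 = C @ RSS^-1 @ C^-1 @ T^-1."""
    a, sh = math.radians(angle), math.radians(shear)
    sx, sy = scale
    f = -1.0 if flip else 1.0  # assumption: flip as a sign on the x column
    cx, cy = center
    tx, ty = translate
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
    C = np.array([[1, 0, cx], [0, 1, cy], [0, 0, 1]], dtype=float)
    # RSS exactly as written in the formula above.
    RSS = np.array([
        [math.cos(a + sh) * sx * f, -math.sin(a + sh) * sy, 0.0],
        [math.sin(a) * sx * f, math.cos(a) * sy, 0.0],
        [0.0, 0.0, 1.0],
    ])
    inv = np.linalg.inv
    M = T @ C @ RSS @ inv(C)
    M_inv = C @ inv(RSS) @ inv(C) @ inv(T)
    return M, M_inv


M, M_inv = affine_matrices((5, 5), 30, (2, -1), (1.2, 0.8), 10, True)
```

Multiplying M by M_inv gives the identity, which confirms that the inverse is obtained by reversing the order of the factors and inverting each one.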
- class mmagic.datasets.transforms.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)
Bases:
mmcv.transforms.BaseTransform
Randomly dilate binary masks.
- Parameters
keys (Sequence[str]) – The masks to be dilated.
binary_thr (float) – Threshold for obtaining binary mask. Default: 0.
kernel_min (int) – Min size of dilation kernel. Default: 9.
kernel_max (int) – Max size of dilation kernel. Default: 49.
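A NumPy-only sketch of random mask dilation. It thresholds with binary_thr, draws a kernel size from [kernel_min, kernel_max] (assumed inclusive), and dilates with a square kernel using a sliding-window maximum. The square kernel, the inclusive range, and the helper names dilate and random_mask_dilation are assumptions of this sketch. mmagic may use OpenCV and a different kernel shape. sliding_window_view requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilate(mask, k):
    """Binary dilation of a 2-D mask with a k x k square kernel."""
    pad = ((k // 2, (k - 1) // 2), (k // 2, (k - 1) // 2))
    padded = np.pad(mask, pad, mode='constant', constant_values=0)
    # Each output pixel is the max over its k x k neighbourhood.
    return sliding_window_view(padded, (k, k)).max(axis=(-2, -1))


def random_mask_dilation(mask, binary_thr=0.0, kernel_min=9, kernel_max=49, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    binary = (mask > binary_thr).astype(np.uint8)
    k = int(rng.integers(kernel_min, kernel_max + 1))  # inclusive range (assumption)
    return dilate(binary, k), k
```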
- class mmagic.datasets.transforms.UnsharpMasking(kernel_size, sigma, weight, threshold, keys)
Bases:
mmcv.transforms.BaseTransform
Apply unsharp masking to an image or a sequence of images.
- Parameters
kernel_size (int) – The kernel_size of the Gaussian kernel.
sigma (float) – The standard deviation of the Gaussian.
weight (float) – The weight of the “details” in the final output.
threshold (float) – Pixel differences larger than this value are regarded as “details”.
keys (list[str]) – The keys whose values are processed.
Added keys are “xxx_unsharp”, where “xxx” are the attributes specified in “keys”.
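A sketch of threshold-gated unsharp masking on a single-channel image in [0, 1]. The residual (image minus its Gaussian blur) is the "details". Pixels whose residual, scaled by 255, exceeds threshold form a mask. That mask is blurred into a soft mask that blends the sharpened image with the original. The [0, 1] value range, the 255 scaling, the reflect padding, and all function names are assumptions of this sketch, not a statement of mmagic's implementation.

```python
import numpy as np


def gaussian_kernel_1d(size, sigma):
    x = np.arange(size) - (size - 1) / 2
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()


def blur(img, size, sigma):
    """Separable Gaussian blur of a 2-D array with reflect padding (odd size)."""
    k = gaussian_kernel_1d(size, sigma)
    padded = np.pad(img, size // 2, mode='reflect')
    conv = lambda v: np.convolve(v, k, mode='valid')
    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, padded))


def unsharp_mask(img, kernel_size, sigma, weight, threshold):
    residue = img - blur(img, kernel_size, sigma)  # the "details"
    mask = (np.abs(residue) * 255 > threshold).astype(np.float64)
    soft_mask = blur(mask, kernel_size, sigma)
    sharpened = np.clip(img + weight * residue, 0, 1)
    return soft_mask * sharpened + (1 - soft_mask) * img
```

Flat regions carry no details, so they pass through unchanged. Only pixels near edges are pushed apart.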
- class mmagic.datasets.transforms.Flip(keys, flip_ratio=0.5, direction='horizontal')
Bases:
mmcv.transforms.BaseTransform
Flip the input data with a probability.
Reverse the order of elements in the given data along a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attribute “keys”; added or modified keys are “flip”, “flip_direction” and the keys in attribute “keys”. It also supports flipping a list of images with the same flip.
Required Keys:
[KEYS]
Modified Keys:
[KEYS]
- Parameters
keys (Union[str, List[str]]) – The images to be flipped.
flip_ratio (float) – The probability to flip the images. Default: 0.5.
direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
- _directions = ['horizontal', 'vertical']
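A minimal sketch of the flip logic, assuming images are laid out as (H, W[, C]) so that a horizontal flip reverses axis 1 and a vertical flip reverses axis 0. One random decision is shared by every key and by every image in a list. The function flip is hypothetical.

```python
import random

import numpy as np


def flip(results, keys, flip_ratio=0.5, direction='horizontal', rng=random):
    """Flip every image (or list of images) in ``keys`` with one shared decision."""
    if direction not in ('horizontal', 'vertical'):
        raise ValueError(f'unsupported direction: {direction}')
    axis = 1 if direction == 'horizontal' else 0  # assumes (H, W[, C]) layout
    do_flip = rng.random() < flip_ratio
    if do_flip:
        for key in keys:
            value = results[key]
            if isinstance(value, list):
                results[key] = [np.flip(v, axis=axis) for v in value]
            else:
                results[key] = np.flip(value, axis=axis)
    results['flip'] = do_flip
    results['flip_direction'] = direction
    return results


img = np.arange(6).reshape(2, 3)
out = flip({'img': img, 'gt': [img, img]}, ['img', 'gt'], flip_ratio=1.0)
```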
- class mmagic.datasets.transforms.NumpyPad(keys, padding, **kwargs)
Bases:
mmcv.transforms.BaseTransform
Numpy Padding.
In this augmentation, numpy padding is adopted to customize padding augmentation. Please carefully read the numpy manual in: https://numpy.org/doc/stable/reference/generated/numpy.pad.html
If you only want a single dimension to be padded, set padding like this: padding = ((2, 2), (0, 0), (0, 0))
In this case, for a three-dimensional input, only the first dimension will be padded.
- Parameters
keys (Union[str, List[str]]) – The images to be padded.
padding (int | tuple(int)) – Please refer to the argument pad_width in numpy.pad.
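Because padding is passed through as numpy.pad's pad_width, its effect can be previewed directly with NumPy. The NumpyPad config dict shown here is an illustrative pipeline entry that uses only the documented keys and padding arguments.

```python
import numpy as np

# Illustrative pipeline entry using the documented arguments.
pad_cfg = dict(type='NumpyPad', keys='img', padding=((2, 2), (0, 0), (0, 0)))

img = np.zeros((4, 3, 3), dtype=np.float32)
# Pad only the first dimension by 2 on each side (zeros, numpy's default mode).
padded = np.pad(img, pad_cfg['padding'])
# An int pads every side of every dimension by the same amount.
padded_all = np.pad(img, 1)
```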
- class mmagic.datasets.transforms.RandomRotation(keys, degrees)
Bases:
mmcv.transforms.BaseTransform
Rotate the image by a randomly-chosen angle, measured in degrees.
- Parameters
keys (list[str]) – The images to be rotated.
degrees (tuple[float] | tuple[int] | float | int) – If it is a tuple, it represents a range (min, max). If it is a float or int, the range is constructed as (-degrees, degrees).
- class mmagic.datasets.transforms.RandomTransposeHW(keys, transpose_ratio=0.5)
Bases:
mmcv.transforms.BaseTransform
Randomly transpose images in H and W dimensions with a probability.
TransposeHW is equivalent to a horizontal flip followed by an anti-clockwise rotation by 90 degrees. When used with horizontal/vertical flips, it serves as a form of rotation augmentation. It also supports randomly transposing a list of images.
Required keys are the keys in attribute “keys”; added or modified keys are “transpose” and the keys in attribute “keys”.
- Parameters
keys (list[str]) – The images to be transposed.
transpose_ratio (float) – The probability to transpose the images. Default: 0.5.
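The equivalence between an H/W transpose and "horizontal flip + anti-clockwise 90-degree rotation" can be checked in NumPy (np.rot90 rotates anti-clockwise by default). The (H, W, C) layout of the second example is an assumption about how images are stored.

```python
import numpy as np

img = np.arange(12).reshape(3, 4)
# Horizontal flip, then 90-degree anti-clockwise rotation.
flipped_then_rotated = np.rot90(np.fliplr(img))

# Same identity for an (H, W, C) image: swap only the first two axes.
img_hwc = np.arange(24).reshape(2, 3, 4)
transposed_hwc = img_hwc.transpose(1, 0, 2)
```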
- class mmagic.datasets.transforms.Resize(keys: Union[str, List[str]] = 'img', scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear', backend=None, output_keys=None)
Bases:
mmcv.transforms.BaseTransform
Resize data to a specific size for training or resize the images to fit the network input regulation for testing.
When resizing images to fit the network input regulation, the concern is that a network may perform several downsampling and then upsampling operations, so the input height and width must be divisible by the downsample factor of the network. For example, if the network downsamples the input 5 times with stride 2, the downsample factor is 2^5 = 32, and the height and width should be divisible by 32.
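A sketch of how an output size compatible with size_factor could be computed. It assumes that each side is rounded down to a multiple of size_factor and that max_size caps both sides at the largest multiple of size_factor not exceeding it. Both rules are assumptions, and size_for_factor is a hypothetical name.

```python
def size_for_factor(h, w, size_factor, max_size=None):
    """Round (h, w) down to multiples of size_factor, optionally capped by max_size."""
    new_h = h - h % size_factor
    new_w = w - w % size_factor
    if max_size is not None:
        # Largest multiple of size_factor that does not exceed max_size.
        cap = max_size - max_size % size_factor
        new_h, new_w = min(new_h, cap), min(new_w, cap)
    return new_h, new_w
```

With a downsample factor of 32, a 100 x 70 input would map to 96 x 64. Both sides are then divisible by 32.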
Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.
Required Keys:
Required keys are the keys in attribute “keys”
Modified Keys:
The keys in attribute “keys”, or new keys ([OUT_KEY]) if output_keys is set
Added Keys:
[OUT_KEY]_shape
keep_ratio
scale_factor
interpolation
All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.
- Parameters
keys (str | list[str]) – The image(s) to be resized.
scale (float | tuple[int]) – If scale is tuple[int], it is the target spatial size (h, w). Otherwise, the target spatial size is the input size scaled by scale. Note that when it is used, size_factor and max_size are ignored. Default: None
keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: False. Note that it is used together with scale.
size_factor (int) – Let the output shape be a multiple of size_factor. Default: None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.
max_size (int) – The maximum size of the longest side of the output. Default: None. Note that it is used together with size_factor.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
output_keys (list[str] | None) – The keys under which the resized images are saved. Default: None. Note that if it is not None, its length should be equal to that of keys.
- _resize(img)
Resize function.
- Parameters
img (np.ndarray) – Image.
- Returns
Resized image.
- Return type
np.ndarray
- class mmagic.datasets.transforms.CenterCropLongEdge(keys='img')
Bases:
mmcv.transforms.BaseTransform
Center crop the given image by the long edge.
- Parameters
keys (list[str]) – The images to be cropped.
- class mmagic.datasets.transforms.Crop(keys, crop_size, random_crop=True, is_pad_zeros=False)
Bases:
mmcv.transforms.BaseTransform
Crop data to specific size for training.
- Parameters
keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
random_crop (bool) – If set to True, the image is randomly cropped. Otherwise, it works as a center crop. Default: True.
is_pad_zeros (bool, optional) – Whether to pad the image with 0 if crop_size is greater than image size. Default: False.
- _crop(data)
Crop the data.
- Parameters
data (Union[List, np.ndarray]) – Input data to crop.
- Returns
cropped data and corresponding crop box.
- Return type
tuple
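The difference between random_crop=True and the center-crop fallback comes down to how the offsets are chosen. The hypothetical crop_box below returns (x_offset, y_offset, crop_w, crop_h). It clamps the crop to the image size, which is an assumption standing in for the is_pad_zeros=False case.

```python
import random


def crop_box(img_h, img_w, crop_h, crop_w, random_crop=True, rng=random):
    """Return (x_offset, y_offset, crop_w, crop_h) for a random or center crop.

    Assumption: the crop is clamped to the image size (no zero padding).
    """
    crop_h, crop_w = min(crop_h, img_h), min(crop_w, img_w)
    if random_crop:
        # randint is inclusive on both ends, so the crop always fits.
        x_offset = rng.randint(0, img_w - crop_w)
        y_offset = rng.randint(0, img_h - crop_h)
    else:
        x_offset = (img_w - crop_w) // 2
        y_offset = (img_h - crop_h) // 2
    return x_offset, y_offset, crop_w, crop_h
```

The box would then be applied as img[y_offset:y_offset + crop_h, x_offset:x_offset + crop_w].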
- class mmagic.datasets.transforms.CropAroundCenter(crop_size)
Bases:
mmcv.transforms.BaseTransform
Randomly crop the images around the unknown area in the central 1/4 of the images.
This cropping strategy is adopted in GCA Matting. The unknown area is the same as the semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf
It retains the central 1/4 of the images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”; added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.
- Parameters
crop_size (int | tuple) – Desired output size. If int, square crop is applied.
- class mmagic.datasets.transforms.CropAroundFg(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)
Bases:
mmcv.transforms.BaseTransform
Crop around the whole foreground in the segmentation mask.
Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.
- Parameters
keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.
bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Defaults to (0.1, 0.4).
test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground will be extended to a square. Defaults to False.
- class mmagic.datasets.transforms.CropAroundUnknown(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')
Bases:
mmcv.transforms.BaseTransform
Crop around unknown area with a randomly selected scale.
Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.
- Parameters
keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.
crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.
unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Defaults to ‘alpha’.
interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when the image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. If given as a str, all the keys will be resized with the same method. Defaults to ‘bilinear’.
- class mmagic.datasets.transforms.CropLike(target_key, reference_key=None)
Bases:
mmcv.transforms.BaseTransform
Crop/pad the image in the target_key according to the size of the image in the reference_key.
- Parameters
target_key (str) – The key of the image to be cropped.
reference_key (str | None) – The key of the image whose size is used as the reference. Default: None.
- class mmagic.datasets.transforms.FixedCrop(keys, crop_size, crop_pos=None)
Bases:
mmcv.transforms.BaseTransform
Crop paired data (at a specific position) to specific size for training.
- Parameters
keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
crop_pos (Tuple[int]) – Specific position (x, y). If set to None, the position is randomly initialized and used to crop the whole paired data batch. Default: None.
- _crop(data, x_offset, y_offset, crop_w, crop_h)
Crop the data.
- Parameters
data (Union[List, np.ndarray]) – Input data to crop.
x_offset (int) – The offset of x axis.
y_offset (int) – The offset of y axis.
crop_w (int) – The width of crop bbox.
crop_h (int) – The height of crop bbox.
- Returns
cropped data and corresponding crop box.
- Return type
tuple
- class mmagic.datasets.transforms.InstanceCrop(config_file, from_pretrained=None, key='img', box_num_upbound=-1, finesize=256)
Bases:
mmcv.transforms.BaseTransform
Use maskrcnn to detect instances on image.
Mask R-CNN is used to detect the instances on the image, and pred_bbox is used to segment the instances on the image.
- Parameters
config_file (str) – config file name relative to detectron2’s “configs/”
key (str) – Unused
box_num_upbound (int) – The upper limit on the number of instances in the figure
- class mmagic.datasets.transforms.ModCrop(key='gt')
Bases:
mmcv.transforms.BaseTransform
Mod crop images, used during testing.
Required keys are “scale” and “KEY”, added or modified keys are “KEY”.
- Parameters
key (str) – The key of image. Default: ‘gt’
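Mod cropping trims the height and width down to the nearest multiple of the upsampling scale, so that GT and LQ sizes divide evenly. The sketch keeps the top-left region, which is an assumption about where the excess pixels are removed. mod_crop is a hypothetical helper.

```python
import numpy as np


def mod_crop(img, scale):
    """Crop H and W down to the nearest multiple of ``scale`` (top-left kept)."""
    h, w = img.shape[:2]
    # The Ellipsis keeps any channel axis untouched (works for 2-D and 3-D).
    return img[:h - h % scale, :w - w % scale, ...]
```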
- class mmagic.datasets.transforms.PairedRandomCrop(gt_patch_size, lq_key='img', gt_key='gt')
Bases:
mmcv.transforms.BaseTransform
Paired random crop.
It crops a pair of img and gt images with corresponding locations. It also supports accepting img list and gt list. Required keys are “scale”, “lq_key”, and “gt_key”, added or modified keys are “lq_key” and “gt_key”.
- Parameters
gt_patch_size (int) – cropped gt patch size.
lq_key (str) – Key of LQ img. Default: ‘img’.
gt_key (str) – Key of GT img. Default: ‘gt’.
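A sketch of a paired random crop for one image pair. The LQ patch has size gt_patch_size // scale, and the GT offset is the LQ offset multiplied by scale, so both patches cover the same region. It assumes (H, W[, C]) arrays and a GT that is exactly scale times the LQ size. paired_random_crop is a hypothetical helper that does not handle image lists.

```python
import random

import numpy as np


def paired_random_crop(lq, gt, gt_patch_size, scale, rng=random):
    """Crop aligned patches: an LQ patch and the GT patch covering the same area."""
    lq_patch_size = gt_patch_size // scale
    h_lq, w_lq = lq.shape[:2]
    if gt.shape[0] != h_lq * scale or gt.shape[1] != w_lq * scale:
        raise ValueError('GT size must be LQ size times scale')
    if h_lq < lq_patch_size or w_lq < lq_patch_size:
        raise ValueError('LQ image is smaller than the patch')
    top = rng.randint(0, h_lq - lq_patch_size)
    left = rng.randint(0, w_lq - lq_patch_size)
    lq_patch = lq[top:top + lq_patch_size, left:left + lq_patch_size, ...]
    # Same location in GT coordinates.
    top_gt, left_gt = top * scale, left * scale
    gt_patch = gt[top_gt:top_gt + gt_patch_size, left_gt:left_gt + gt_patch_size, ...]
    return lq_patch, gt_patch


gt = np.arange(64 * 64).reshape(64, 64)
lq = gt[::4, ::4]  # toy x4 "downsampling" so alignment is easy to check
lq_p, gt_p = paired_random_crop(lq, gt, 32, 4, rng=random.Random(0))
```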
- class mmagic.datasets.transforms.RandomCropLongEdge(keys='img')
Bases:
mmcv.transforms.BaseTransform
Randomly crop the given image by the long edge.
- Parameters
keys (list[str]) – The images to be cropped.
- class mmagic.datasets.transforms.RandomResizedCrop(keys, crop_size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation='bilinear')
Bases:
mmcv.transforms.BaseTransform
Crop data to random size and aspect ratio.
A crop of a random proportion of the original image and a random aspect ratio of the original aspect ratio is made. The cropped image is finally resized to a given size specified by ‘crop_size’. Modified keys are the attributes specified in “keys”.
This code is partially adapted from torchvision.transforms.RandomResizedCrop: https://pytorch.org/vision/stable/_modules/torchvision/transforms/transforms.html#RandomResizedCrop.
- Parameters
keys (list[str]) – The images to be resized and random-cropped.
crop_size (int | tuple[int]) – Target spatial size (h, w).
scale (tuple[float], optional) – Range of the proportion of the original image to be cropped. Default: (0.08, 1.0).
ratio (tuple[float], optional) – Range of aspect ratio of the crop. Default: (3. / 4., 4. / 3.).
interpolation (str, optional) – Algorithm used for interpolation. It must be one of the following: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.
- get_params(data)
Get parameters for a random sized crop.
- Parameters
data (np.ndarray) – Image of type numpy array to be cropped.
- Returns
A tuple containing the coordinates of the top left corner and the chosen crop size.
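A sketch of the parameter sampling, modelled on torchvision's RandomResizedCrop.get_params as we understand it. It samples an area fraction from scale and a log-uniform aspect ratio from ratio, retries up to 10 times, and otherwise falls back to a central crop with the aspect ratio clamped into ratio. mmagic's exact procedure may differ. This get_params takes (h, w) rather than an image.

```python
import math
import random


def get_params(h, w, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random, attempts=10):
    """Return (top, left, crop_h, crop_w) for a random sized crop."""
    area = h * w
    log_ratio = (math.log(ratio[0]), math.log(ratio[1]))
    for _ in range(attempts):
        target_area = area * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(*log_ratio))
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= w and 0 < crop_h <= h:
            top = rng.randint(0, h - crop_h)
            left = rng.randint(0, w - crop_w)
            return top, left, crop_h, crop_w
    # Fallback: central crop with the aspect ratio clamped into ``ratio``.
    in_ratio = w / h
    if in_ratio < min(ratio):
        crop_w, crop_h = w, int(round(w / min(ratio)))
    elif in_ratio > max(ratio):
        crop_h, crop_w = h, int(round(h * max(ratio)))
    else:
        crop_w, crop_h = w, h
    return (h - crop_h) // 2, (w - crop_w) // 2, crop_h, crop_w
```

The cropped region would then be resized to crop_size with the chosen interpolation.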
- class mmagic.datasets.transforms.CompositeFg(fg_dirs, alpha_dirs, interpolation='nearest')
Bases:
mmcv.transforms.BaseTransform
Composite foreground with a random foreground.
This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:
\[ fg_{new} = \alpha_1 fg_1 + (1 - \alpha_1) fg_2 \]
\[ \alpha_{new} = 1 - (1 - \alpha_1)(1 - \alpha_2) \]
where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].
Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.
- Parameters
fg_dirs (str | list[str]) – Path of directories to load foreground images from.
alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.
interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images. Default: ‘nearest’.
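The two composition formulas translate directly to NumPy. This sketch assumes alpha mattes already normalized to [0, 1] with shape (H, W) and foregrounds of shape (H, W, 3). composite_fg is a hypothetical helper, and the random loading and resizing are omitted.

```python
import numpy as np


def composite_fg(fg1, alpha1, fg2, alpha2):
    """fg_new = a1*fg1 + (1-a1)*fg2 ; alpha_new = 1 - (1-a1)(1-a2)."""
    # Broadcast the (H, W) alpha over the channel axis of the foreground.
    a1 = alpha1[..., None] if fg1.ndim == alpha1.ndim + 1 else alpha1
    fg_new = a1 * fg1 + (1.0 - a1) * fg2
    alpha_new = 1.0 - (1.0 - alpha1) * (1.0 - alpha2)
    return fg_new, alpha_new


rng = np.random.default_rng(0)
a1, a2 = rng.random((4, 4)), rng.random((4, 4))
f1, f2 = rng.random((4, 4, 3)), rng.random((4, 4, 3))
fg, alpha = composite_fg(f1, a1, f2, a2)
```

Since (1 - α_1)(1 - α_2) lies in [0, 1], α_new stays in [0, 1] and is never smaller than either input alpha.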
- class mmagic.datasets.transforms.MergeFgAndBg
Bases:
mmcv.transforms.BaseTransform
Composite foreground image and background image with alpha.
Required keys are “alpha”, “fg” and “bg”, added key is “merged”.
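Alpha compositing produces the “merged” image as merged = alpha * fg + (1 - alpha) * bg. This sketch assumes alpha is already in [0, 1]. If it is stored as 0 to 255 (as other matting transforms here assume), divide by 255 first. merge_fg_and_bg is a hypothetical helper.

```python
import numpy as np


def merge_fg_and_bg(fg, bg, alpha):
    """merged = alpha * fg + (1 - alpha) * bg, with alpha in [0, 1]."""
    alpha = alpha.astype(np.float32)
    if alpha.ndim == fg.ndim - 1:
        alpha = alpha[..., None]  # broadcast over the channel axis
    return alpha * fg + (1.0 - alpha) * bg


fg = np.ones((2, 2, 3))
bg = np.zeros((2, 2, 3))
alpha = np.array([[0.0, 1.0], [0.5, 0.25]])
merged = merge_fg_and_bg(fg, bg, alpha)
```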
- class mmagic.datasets.transforms.PerturbBg(gamma_ratio=0.6)
Bases:
mmcv.transforms.BaseTransform
Randomly add gaussian noise or gamma change to background image.
Required key is “bg”, added key is “noisy_bg”.
- Parameters
gamma_ratio (float, optional) – The probability to use gamma correction instead of gaussian noise. Defaults to 0.6.
- class mmagic.datasets.transforms.RandomJitter(hue_range=40)
Bases:
mmcv.transforms.BaseTransform
Randomly jitter the foreground in hsv space.
The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.
- Parameters
hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.
- class mmagic.datasets.transforms.RandomLoadResizeBg(bg_dir, flag='color', channel_order='bgr')
Bases:
mmcv.transforms.BaseTransform
Randomly load a background image and resize it.
Required key is “fg”, added key is “bg”.
- Parameters
bg_dir (str) – Path of directory to load background images from.
flag (str) – Loading flag for images. Default: ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
kwargs (dict) – Args for file client.
- class mmagic.datasets.transforms.PackInputs(keys: Tuple[List[str], str] = ['merged', 'img'], meta_keys: Tuple[List[str], str] = [], data_keys: Tuple[List[str], str] = [])
Bases:
mmcv.transforms.base.BaseTransform
Pack data into DataSample for training, evaluation and testing.
- MMagic follows the design of data structure from MMEngine.
Data from the loader will be packed into the data field of DataSample. For more details of DataSample, refer to the MMEngine documentation: https://mmengine.readthedocs.io/en/latest/advanced_tutorials/data_element.html
- Parameters
keys (Tuple[List[str], str, None]) – The keys to be saved in the returned inputs, which are used as the input of models. Default: ['merged', 'img'].
data_keys (Tuple[List[str], str, None]) – The keys to be saved in the data field of the data_samples.
meta_keys (Tuple[List[str], str, None]) – The meta keys to be saved in the metainfo of the data_samples. All the other data will be packed into the data of the data_samples.
- transform(results: dict) → dict
Method to pack the input data.
- Parameters
results (dict) – Result dict from the data pipeline.
- Returns
A dict containing:
- ‘inputs’ (dict): The forward data of models. Depending on the task, the inputs may contain images, videos, labels, text, etc.
- ‘data_samples’ (DataSample): The annotation info of the sample.
- Return type
dict
- class mmagic.datasets.transforms.GenerateCoordinateAndCell(sample_quantity=None, scale=None, target_size=None, reshape_gt=True)
Bases: mmcv.transforms.base.BaseTransform
Generate coordinate and cell. Generate coordinate from the desired size of SR image.
Train or val:
1. Generate coordinate from GT.
2. Reshape GT image to (Hg*Wg, 3) and transpose to (3, Hg*Wg), where Hg and Wg represent the height and width of GT.
Test:
1. Generate coordinate from LQ and scale or target_size.
Then generate cell from coordinate.
- Parameters
sample_quantity (int | None) – The number of coordinates to sample. This ensures that the GT tensors in a batch have the same dimensions. Default: None.
scale (float) – Scale of upsampling. Default: None.
target_size (tuple[int]) – Size of target image. Default: None.
reshape_gt (bool) – Whether to reshape gt to (-1, 3). Default: True. If sample_quantity is not None, reshape_gt is set to True.
The priority for determining the size of the target image is:
1. results[‘gt’].shape[-2:]
2. results[‘lq’].shape[-2:] * scale
3. target_size
- transform(results)
Call function.
- Parameters
results (dict) – A dict containing the necessary information and data for augmentation. It requires one of the following:
1. ‘lq’ in results;
2. ‘gt’ in results;
3. neither, provided that self.target_size is set and len(self.target_size) >= 2.
- Returns
A dict containing the processed data and information. If ‘gt’ is in results, it is reshaped to (-1, 3) and transposed to (3, -1). ‘coord’ and ‘cell’ are added.
- Return type
dict
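The coordinate and cell generation can be illustrated with a short NumPy sketch. It assumes the convention of LIIF-style arbitrary-scale super-resolution:

- Coordinates are pixel centers normalized to [-1, 1].
- Each cell holds the size of one target pixel in those units.

The helpers make_coord and coord_and_cell are hypothetical. Rounding lq size times scale to an integer is also an assumption. This is not the mmagic implementation.

```python
import numpy as np


def make_coord(shape):
    """Pixel-center coordinates in [-1, 1] for a grid of the given (H, W).

    Returns an array of shape (H * W, 2); column 0 is the row coordinate and
    column 1 is the column coordinate.
    """
    axes = []
    for n in shape:
        r = 1.0 / n  # half the width of one pixel on the [-1, 1] interval
        axes.append(-1.0 + r + 2.0 * r * np.arange(n))
    grid = np.stack(np.meshgrid(*axes, indexing='ij'), axis=-1)
    return grid.reshape(-1, 2)


def coord_and_cell(target_h, target_w):
    """Hypothetical helper: coordinates plus the per-pixel cell size."""
    coord = make_coord((target_h, target_w))
    cell = np.ones_like(coord)
    cell[:, 0] *= 2.0 / target_h  # pixel height in normalized units
    cell[:, 1] *= 2.0 / target_w  # pixel width in normalized units
    return coord, cell


# Test-time case: target size derived from the LQ size and the scale.
lq_h, lq_w, scale = 8, 6, 2.5
coord, cell = coord_and_cell(round(lq_h * scale), round(lq_w * scale))

# sample_quantity: keep a fixed number of random coordinates per sample so
# that every item in a batch has the same length.
rng = np.random.default_rng(0)
keep = rng.choice(len(coord), size=64, replace=False)
sampled_coord, sampled_cell = coord[keep], cell[keep]
```

Because the cell is constant for a given target size, it tells the model how large each queried pixel is. This is what lets one model serve many scales.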
- class mmagic.datasets.transforms.GenerateFacialHeatmap(image_key, ori_size, target_size, sigma=1.0, use_cache=True)
Bases: mmcv.transforms.base.BaseTransform
Generate heatmap from keypoints.
- Parameters
image_key (str) – Key of facial image in dict.
ori_size (int | Tuple[int]) – Original image size of keypoint.
target_size (int | Tuple[int]) – Target size of heatmap.
sigma (float) – Sigma parameter of heatmap. Default: 1.0
use_cache (bool) – If True, load all heatmaps at once. Default: True.
- transform(results)
Transform function.
- Parameters
results (dict) – A dict containing the necessary information and data for augmentation. Requires keypoints.
- Returns
A dict containing the processed data and information, with ‘heatmap’ added.
- Return type
dict
- generate_heatmap_from_img(image)
Generate heatmap from image.
- Parameters
image (np.ndarray) – Face image.
- Returns
heatmap (np.ndarray): Heatmap of the face image.
- _face_alignment_detector(image)
Generate face landmarks with face_alignment.
- Parameters
image (np.ndarray) – Face image.
- Returns
Location of landmark.
- Return type
landmark (Tuple[float])
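To illustrate what a keypoint heatmap is, the sketch below renders one Gaussian blob per keypoint with NumPy, after rescaling keypoints from ori_size to target_size. It makes these assumptions:

- Keypoints are (x, y) and sizes are (height, width).
- Keypoints are passed in directly instead of coming from face_alignment.

keypoints_to_heatmap is a hypothetical helper, not the transform's actual code.

```python
import numpy as np


def keypoints_to_heatmap(keypoints, ori_size, target_size, sigma=1.0):
    """Render one Gaussian heatmap channel per keypoint.

    keypoints: (K, 2) array of (x, y) positions in the original image.
    ori_size, target_size: (height, width) tuples.
    Returns an array of shape (target_h, target_w, K) with peak value 1.
    """
    ori_h, ori_w = ori_size
    tgt_h, tgt_w = target_size
    kps = np.asarray(keypoints, dtype=np.float64)
    # Rescale keypoints from the original resolution to the heatmap resolution.
    xs = kps[:, 0] * tgt_w / ori_w
    ys = kps[:, 1] * tgt_h / ori_h
    grid_y, grid_x = np.mgrid[0:tgt_h, 0:tgt_w]
    # Squared distance from every pixel to every keypoint: (H, W, K).
    dist2 = (grid_x[..., None] - xs) ** 2 + (grid_y[..., None] - ys) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


heatmap = keypoints_to_heatmap([[32, 32], [96, 64]], (128, 128), (64, 64), sigma=1.0)
```

A larger sigma spreads each peak over more pixels. This makes the supervision signal smoother but less precise.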
- class mmagic.datasets.transforms.GenerateFrameIndices(interval_list, frames_per_clip=99)
Bases: mmcv.transforms.BaseTransform
Generate frame indices for the REDS dataset. It also performs temporal augmentation with a random interval.
Required Keys:
img_path
gt_path
key
num_input_frames
Modified Keys:
img_path
gt_path
Added Keys:
interval
reverse
- Parameters
interval_list (list[int]) – Interval list for temporal augmentation. An interval is randomly picked from interval_list, and frame indices are sampled with that interval.
frames_per_clip (int) – Number of frames per clip. Default: 99 for the REDS dataset.
- class mmagic.datasets.transforms.GenerateFrameIndiceswithPadding(padding, filename_tmpl='{:08d}')
Bases: mmcv.transforms.BaseTransform
Generate frame indices with padding for the REDS and Vid4 datasets during testing.
Required Keys:
img_path
gt_path
key
num_input_frames
sequence_length
Modified Keys:
img_path
gt_path
- Parameters
padding (str) – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.
Example: with current_idx = 0 and num_input_frames = 5, the generated frame indices under the different padding modes are:
replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]
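The four padding modes can be reproduced with a small pure-Python sketch. frame_indices_with_padding is a hypothetical helper. It matches the example lists for current_idx = 0. Its handling of indices past the last frame is an assumption that mirrors the behavior at the start of the sequence.

```python
def frame_indices_with_padding(current_idx, num_input_frames, sequence_length, padding):
    """Return num_input_frames indices centered on current_idx, padded at both ends."""
    if padding not in ('replicate', 'reflection', 'reflection_circle', 'circle'):
        raise ValueError(f'Unsupported padding mode: {padding}')
    max_idx = sequence_length - 1
    num_pad = num_input_frames // 2
    indices = []
    for i in range(current_idx - num_pad, current_idx + num_pad + 1):
        if i < 0:  # before the first frame
            if padding == 'replicate':
                idx = 0
            elif padding == 'reflection':
                idx = -i
            elif padding == 'reflection_circle':
                idx = current_idx + num_pad - i
            else:  # circle
                idx = num_input_frames + i
        elif i > max_idx:  # past the last frame
            if padding == 'replicate':
                idx = max_idx
            elif padding == 'reflection':
                idx = 2 * max_idx - i
            elif padding == 'reflection_circle':
                idx = (current_idx - num_pad) - (i - max_idx)
            else:  # circle
                idx = i - num_input_frames
        else:
            idx = i
        indices.append(idx)
    return indices


indices = frame_indices_with_padding(0, 5, sequence_length=100, padding='reflection')
```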
- class mmagic.datasets.transforms.GenerateSegmentIndices(interval_list, start_idx=0, filename_tmpl='{:08d}.png')
Bases: mmcv.transforms.BaseTransform
Generate frame indices for a segment. It also performs temporal augmentation with a random interval.
Required Keys:
img_path
gt_path
key
num_input_frames
sequence_length
Modified Keys:
img_path
gt_path
Added Keys:
interval
reverse
- Parameters
interval_list (list[int]) – Interval list for temporal augmentation. An interval is randomly picked from interval_list, and frame indices are sampled with that interval.
start_idx (int) – The index corresponding to the first frame in the sequence. Default: 0.
filename_tmpl (str) – Template for file name. Default: ‘{:08d}.png’.
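The random-interval sampling can be sketched in pure Python. The helper sample_segment_indices and its arguments sequence_length and rng are hypothetical. The sketch assumes the start frame is drawn uniformly so that the whole segment fits in the sequence. It leaves out the reverse flag.

```python
import random


def sample_segment_indices(sequence_length, num_input_frames, interval_list,
                           start_idx=0, filename_tmpl='{:08d}.png', rng=random):
    """Pick a random interval and start frame, then build the frame file names."""
    interval = rng.choice(interval_list)
    span = num_input_frames * interval
    if span > sequence_length:
        raise ValueError('Sequence is too short for the chosen interval.')
    # randint is inclusive on both ends, so the last index stays in range.
    first = rng.randint(0, sequence_length - span)
    indices = [first + k * interval + start_idx for k in range(num_input_frames)]
    return interval, [filename_tmpl.format(i) for i in indices]


interval, names = sample_segment_indices(
    sequence_length=100, num_input_frames=7, interval_list=[1, 2, 3],
    rng=random.Random(0))
```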
- class mmagic.datasets.transforms.GetMaskedImage(img_key='gt', mask_key='mask', out_key='img', zero_value=127.5)
Bases: mmcv.transforms.base.BaseTransform
Get masked image.
- Parameters
img_key (str) – Key for clean image. Default: ‘gt’.
mask_key (str) – Key for mask image. The mask shape should be (h, w, 1), where ‘1’ indicates holes and ‘0’ indicates valid regions. Default: ‘mask’.
out_key (str) – Key for output image. Default: ‘img’.
zero_value (float) – Pixel value of masked area. Default: 127.5.
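A minimal NumPy sketch of the masking step follows. It assumes the composition masked = img * (1 - mask) + mask * zero_value. get_masked_image is a hypothetical helper, and the float32 conversion is an assumption.

```python
import numpy as np


def get_masked_image(img, mask, zero_value=127.5):
    """Blank out hole pixels (mask == 1) and fill them with zero_value."""
    img = img.astype(np.float32)
    mask = mask.astype(np.float32)  # (h, w, 1) broadcasts over the channels
    masked = img * (1.0 - mask)
    if zero_value is not None:
        masked = masked + mask * zero_value
    return masked


gt = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4, 1), dtype=np.uint8)
mask[1:3, 1:3] = 1  # a 2x2 hole in the middle
img = get_masked_image(gt, mask)
```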
- class mmagic.datasets.transforms.GetSpatialDiscountMask(gamma=0.99, beta=1.5)
Bases: mmcv.transforms.BaseTransform
Get spatial discounting mask constant.
The spatial discounting mask was first introduced in Generative Image Inpainting with Contextual Attention.
- Parameters
gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.
beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.
- spatial_discount_mask(mask_width, mask_height)
Generate spatial discounting mask constant.
- Parameters
mask_width (int) – The width of bbox hole.
mask_height (int) – The height of bbox hole.
- Returns
Spatial discounting mask.
- Return type
np.ndarray
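The spatial discounting weights can be sketched with NumPy. Each pixel in the hole gets gamma ** (beta * d), where d is its distance to the nearest hole edge measured along rows or columns. Pixels next to the known region therefore get weights close to 1. This is a sketch under that distance assumption; the exact definition used by mmagic may differ.

```python
import numpy as np


def spatial_discount_mask(mask_width, mask_height, gamma=0.99, beta=1.5):
    """Weight each hole pixel by gamma ** (beta * distance to the nearest edge).

    Returns an array of shape (mask_height, mask_width, 1).
    """
    w, h = np.meshgrid(np.arange(mask_width), np.arange(mask_height))
    grid = np.stack([h, w], axis=2)  # (H, W, 2): row and column indices
    # Distance to the nearest edge along each axis separately.
    dist = np.minimum(grid, np.array([mask_height - 1, mask_width - 1]) - grid)
    # Keep the larger weight of the two axes, i.e. the smaller distance.
    return (gamma ** (dist * beta)).max(axis=2, keepdims=True)


m = spatial_discount_mask(mask_width=8, mask_height=6)
```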
- class mmagic.datasets.transforms.LoadImageFromFile(key: str, color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)
Bases: mmcv.transforms.BaseTransform
Load a single image or image frames from corresponding paths.
Required Keys:
- [KEY]_path
New Keys:
- [KEY]
- ori_[KEY]_shape
- ori_[KEY]
- Parameters
key (str) – Keys in results to find corresponding path.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type, passed as the backend argument to mmcv.imfrombytes (see mmcv.imfrombytes for details). Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to the Y channel. Only ‘rgb2ycbcr’ and ‘bgr2ycbcr’ are supported. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the backend corresponding to the URI prefix. Defaults to None.
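For reference, the Y channel used in super-resolution evaluation is usually computed with the ITU-R BT.601 coefficients that MATLAB's rgb2ycbcr also uses. The NumPy sketch below applies them to a BGR uint8 image. bgr_to_y is a hypothetical helper. The sketch does not cover whether LoadImageFromFile keeps the result in [16, 235] or rescales it.

```python
import numpy as np


def bgr_to_y(img_bgr):
    """Luma (Y) of a BGR uint8 image with ITU-R BT.601 coefficients.

    Returns float values in the studio-swing range [16, 235].
    """
    img = img_bgr.astype(np.float64) / 255.0
    # Y = 16 + 65.481 R + 128.553 G + 24.966 B, with R, G, B in [0, 1];
    # the weight vector is written in B, G, R order to match the input.
    return img @ np.array([24.966, 128.553, 65.481]) + 16.0


pixels = np.array([[[0, 0, 0], [255, 255, 255], [0, 0, 255]]], dtype=np.uint8)
y = bgr_to_y(pixels)  # black, white and pure red
```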
- transform(results: dict) → dict
Function to load images or frames.
- Parameters
results (dict) – Result dict from mmcv.BaseDataset.
- Returns
The dict contains loaded image and meta information.
- Return type
dict
- _load_image(filename)
Load an image from file.
- Parameters
filename (str) – Path of image file.
- Returns
Image.
- Return type
np.ndarray
- class mmagic.datasets.transforms.LoadMask(mask_mode='bbox', mask_config=None)
Bases: mmcv.transforms.BaseTransform
Load masks of multiple types.
For different types of mask, users need to provide the corresponding config dict.
Example config for bbox:
config = dict(img_shape=(256, 256), max_bbox_shape=128)
Example config for irregular:
config = dict(img_shape=(256, 256), num_vertices=(4, 12), max_angle=4., length_range=(10, 100), brush_width=(10, 40), area_ratio_range=(0.15, 0.5))
Example config for ff:
config = dict(img_shape=(256, 256), num_vertices=(4, 12), mean_angle=1.2, angle_range=0.4, brush_width=(12, 40))
Example config for set:
config = dict(mask_list_file='xxx/xxx/ooxx.txt', prefix='/xxx/xxx/ooxx/', io_backend='local', color_type='unchanged', file_client_kwargs=dict())
The mask_list_file contains a list of mask file names, for example:
test1.jpeg
test2.jpeg
...
The prefix gives the data path.
- Parameters
mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’]. Default: ‘bbox’.
* bbox: square bounding box masks.
* irregular: irregular holes.
* ff: free-form holes from DeepFillv2.
* set: randomly get a mask from a mask set.
* file: get mask from ‘mask_path’ in results.
mask_config (dict) – Params for creating masks. Each type of mask needs different configs. Default: None.
- class mmagic.datasets.transforms.LoadPairedImageFromFile(key: str, domain_a: str = 'A', domain_b: str = 'B', color_type: str = 'color', channel_order: str = 'bgr', imdecode_backend: Optional[str] = None, use_cache: bool = False, to_float32: bool = False, to_y_channel: bool = False, save_original_img: bool = False, backend_args: Optional[dict] = None)
Bases: LoadImageFromFile
Load a pair of images from file.
Each sample contains a pair of images concatenated along the width dimension (a|b). This is a special loading class for paired generation datasets. It loads the pair as the common loader does and crops it into two images of the same shape, one per domain.
Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_{domain_a}”, “img_{domain_b}”, “img_{domain_a}_path”, “img_{domain_b}_path”, “img_{domain_a}_ori_shape”, “img_{domain_b}_ori_shape”, “ori_img_{domain_a}” and “ori_img_{domain_b}”.
- Parameters
key (str) – Keys in results to find corresponding path.
domain_a (str, Optional) – One of the paired image domains. Defaults to ‘A’.
domain_b (str, Optional) – The other paired image domain. Defaults to ‘B’.
color_type (str) – The flag argument for mmcv.imfrombytes. Defaults to ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
imdecode_backend (str) – The image decoding backend type, passed as the backend argument to mmcv.imfrombytes (see mmcv.imfrombytes for details). Candidates are ‘cv2’, ‘turbojpeg’, ‘pillow’, and ‘tifffile’. Defaults to None.
use_cache (bool) – If True, load all images at once. Default: False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
to_y_channel (bool) – Whether to convert the loaded image to the Y channel. Only ‘rgb2ycbcr’ and ‘bgr2ycbcr’ are supported. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate the backend corresponding to the URI prefix. Defaults to None.
io_backend (str, optional) – IO backend where images are stored. Defaults to None.
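The split of an (a|b) pair into two domain images amounts to slicing the width in half. A NumPy sketch follows. split_pair is a hypothetical helper. It rejects odd widths, whereas the library's behavior for odd widths is not documented here.

```python
import numpy as np


def split_pair(pair_img):
    """Split an (H, 2W, C) image stored as A|B into two (H, W, C) images."""
    w2 = pair_img.shape[1]
    if w2 % 2 != 0:
        raise ValueError('Paired image width must be even.')
    w = w2 // 2
    return pair_img[:, :w], pair_img[:, w:]


pair = np.zeros((4, 6, 3), dtype=np.uint8)
pair[:, 3:] = 255  # right half plays the role of domain B
img_a, img_b = split_pair(pair)
```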
- class mmagic.datasets.transforms.MATLABLikeResize(keys, scale=None, output_shape=None, kernel='bicubic', kernel_width=4.0)
Bases: mmcv.transforms.BaseTransform
Resize the input image using MATLAB-like downsampling.
Currently only bicubic interpolation is supported. Note that the output of this function is slightly different from that of the official MATLAB function.
Required keys are the keys in attribute “keys”. Added or modified keys are “scale” and “output_shape”, and the keys in attribute “keys”.
- Parameters
keys (list[str]) – A list of keys whose values are modified.
scale (float | None, optional) – The scale factor of the resize operation. If None, it will be determined by output_shape. Default: None.
output_shape (tuple(int) | None, optional) – The size of the output image. If None, it will be determined by scale. Note that if scale is provided, output_shape will not be used. Default: None.
kernel (str, optional) – The kernel for the resize operation. Currently only ‘bicubic’ is supported. Default: ‘bicubic’.
kernel_width (float) – The kernel width. Currently only 4.0 is supported. Default: 4.0.
- _resize(img)
Resize an image to the required size.
- Parameters
img (np.ndarray) – The original image.
- Returns
The resized image.
- Return type
output (np.ndarray)
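MATLAB's imresize builds its bicubic weights from the Keys cubic convolution kernel with a = -0.5. That kernel has support [-2, 2], a width of 4, matching kernel_width = 4.0. A NumPy sketch of the kernel alone follows; cubic is a hypothetical name.

```python
import numpy as np


def cubic(x):
    """Keys cubic convolution kernel with a = -0.5."""
    absx = np.abs(x)
    absx2 = absx ** 2
    absx3 = absx ** 3
    near = (1.5 * absx3 - 2.5 * absx2 + 1) * (absx <= 1)
    far = (-0.5 * absx3 + 2.5 * absx2 - 4 * absx + 2) * ((absx > 1) & (absx <= 2))
    return near + far


# Weights for a sample halfway between pixels: taps at -1.5, -0.5, 0.5, 1.5.
weights = cubic(np.array([-1.5, -0.5, 0.5, 1.5]))
```

When downsampling with antialiasing, MATLAB evaluates the kernel at scale * x and widens the support to kernel_width / scale, so that more input pixels contribute to each output pixel. That step is omitted in this sketch.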
- class mmagic.datasets.transforms.Normalize(keys, mean, std, to_rgb=False, save_original=False)
Bases: mmcv.transforms.BaseTransform
Normalize images with the given mean and std value.
Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.
- Parameters
keys (Sequence[str]) – The images to be normalized.
mean (np.ndarray) – Mean values of different channels.
std (np.ndarray) – Std values of different channels.
to_rgb (bool) – Whether to convert channels from BGR to RGB. Default: False.
save_original (bool) – Whether to save original images. Default: False.
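Per-channel normalization can be sketched with NumPy as below. normalize is a hypothetical helper. The sketch assumes the BGR to RGB flip happens before subtracting the mean, so mean and std are given in the output channel order.

```python
import numpy as np


def normalize(img, mean, std, to_rgb=False):
    """Per-channel (img - mean) / std, optionally flipping BGR to RGB first."""
    img = img.astype(np.float32)
    if to_rgb:
        img = img[..., ::-1]  # reverse the channel axis: BGR -> RGB
    mean = np.asarray(mean, dtype=np.float32).reshape(1, 1, -1)
    std = np.asarray(std, dtype=np.float32).reshape(1, 1, -1)
    return (img - mean) / std


bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255  # blue channel in BGR order
out = normalize(bgr, mean=[0, 0, 0], std=[255, 255, 255], to_rgb=True)
```

With mean = 0 and std = 255, this reduces to the [0, 1] rescaling that RescaleToZeroOne describes.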
- class mmagic.datasets.transforms.RescaleToZeroOne(keys)
Bases: mmcv.transforms.BaseTransform
Transform the images into a range between 0 and 1.
Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.
- Parameters
keys (Sequence[str]) – The images to be transformed.
- class mmagic.datasets.transforms.DegradationsWithShuffle(degradations, keys, shuffle_idx=None)
Apply random degradations to input, with degradations being shuffled.
Degradation groups are supported. The order of degradations within the same group is preserved. For example, if degradations = [a, b, [c, d]] and shuffle_idx = None, the possible orders are:
[a, b, [c, d]]
[a, [c, d], b]
[b, a, [c, d]]
[b, [c, d], a]
[[c, d], a, b]
[[c, d], b, a]
Modified keys are the attributes specified in “keys”.
- Parameters
degradations (list[dict]) – The list of degradations.
keys (list[str]) – A list specifying the keys whose values are modified.
shuffle_idx (list | None, optional) – The degradations corresponding to these indices are shuffled. If None, all degradations are shuffled. Default: None.
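The shuffling rule can be sketched in pure Python: only the entries at shuffle_idx change position, and a nested list moves as one unit. shuffle_degradations is a hypothetical helper that operates on placeholder strings rather than real degradation configs.

```python
import random


def shuffle_degradations(degradations, shuffle_idx=None, rng=random):
    """Shuffle the selected top-level entries; a nested list is a fixed group."""
    degradations = list(degradations)
    if shuffle_idx is None:
        shuffle_idx = list(range(len(degradations)))
    picked = [degradations[i] for i in shuffle_idx]
    rng.shuffle(picked)
    # Write the shuffled entries back into the selected positions only.
    for pos, item in zip(shuffle_idx, picked):
        degradations[pos] = item
    return degradations


rng = random.Random(0)
orders = set()
for _ in range(200):
    result = shuffle_degradations(['a', 'b', ['c', 'd']], rng=rng)
    orders.add(tuple(tuple(x) if isinstance(x, list) else x for x in result))
```

Running the loop enough times collects the six orders listed for degradations = [a, b, [c, d]].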
- class mmagic.datasets.transforms.RandomBlur(params, keys)
Apply random blur to the input.
Modified keys are the attributes specified in “keys”.
- Parameters
params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
- get_kernel(num_kernels: int)
Create the kernel.
- Parameters
num_kernels (int) – The number of kernels.
- class mmagic.datasets.transforms.RandomJPEGCompression(params, keys, color_type='color', bgr2rgb=False)
Apply random JPEG compression to the input.
Modified keys are the attributes specified in “keys”.
- Parameters
params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
bgr2rgb (bool) – Whether to change the channel order from BGR to RGB. Default: False.
- class mmagic.datasets.transforms.RandomNoise(params, keys)
Apply random noise to the input.
Currently Gaussian noise and Poisson noise are supported.
Modified keys are the attributes specified in “keys”.
- Parameters
params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
- _apply_gaussian_noise(imgs)
Apply Gaussian noise to images.
- Parameters
imgs (Tensor) – Images.
- Returns
Images with Gaussian noise applied.
- Return type
Tensor
- class mmagic.datasets.transforms.RandomResize(params, keys)
Randomly resize the input.
Modified keys are the attributes specified in “keys”.
- Parameters
params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
- class mmagic.datasets.transforms.RandomVideoCompression(params, keys)
Apply random video compression to the input.
Modified keys are the attributes specified in “keys”.
- Parameters
params (dict) – A dictionary specifying the degradation settings.
keys (list[str]) – A list specifying the keys whose values are modified.
- class mmagic.datasets.transforms.RandomDownSampling(scale_min=1.0, scale_max=4.0, patch_size=None, interpolation='bicubic', backend='pillow')
Bases: mmcv.transforms.BaseTransform
Generate an LQ image from GT (and optionally crop it), using a randomly picked scale.
- Parameters
scale_min (float) – The minimum of upsampling scale, inclusive. Default: 1.0.
scale_max (float) – The maximum of upsampling scale, exclusive. Default: 4.0.
patch_size (int) – The cropped LR patch size. Default: None, which means no cropping.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear”, “bicubic”, “box”, “lanczos”, “hamming” for ‘pillow’ backend. Default: “bicubic”.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: “pillow”.
Scale will be picked in the range of [scale_min, scale_max).
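The size bookkeeping behind random downsampling can be sketched in pure Python. A scale is drawn in [scale_min, scale_max). Then either the whole GT image is shrunk, or an LR patch of patch_size is paired with a GT crop of patch_size * scale. downsampling_sizes is a hypothetical helper, and its flooring and rounding choices are assumptions. It computes sizes only and does no resizing.

```python
import math
import random


def downsampling_sizes(h, w, scale_min=1.0, scale_max=4.0, patch_size=None, rng=random):
    """Pick a random scale and return (scale, lr_size, hr_size) as (h, w) tuples."""
    scale = rng.uniform(scale_min, scale_max)
    if patch_size is None:
        # No crop: shrink the whole image and trim GT to match LR * scale.
        h_lr = math.floor(h / scale + 1e-9)
        w_lr = math.floor(w / scale + 1e-9)
        return scale, (h_lr, w_lr), (round(h_lr * scale), round(w_lr * scale))
    # Crop: the LR patch is patch_size, so the GT crop is patch_size * scale.
    hr_side = round(patch_size * scale)
    if hr_side > min(h, w):
        raise ValueError('GT image is too small for this patch size and scale.')
    return scale, (patch_size, patch_size), (hr_side, hr_side)


scale, lr_size, hr_size = downsampling_sizes(96, 128, patch_size=24, rng=random.Random(0))
```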
- class mmagic.datasets.transforms.FormatTrimap(to_onehot=False)
Bases: mmcv.transforms.BaseTransform
Convert trimap (tensor) to one-hot representation.
It transforms the trimap labels from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap is converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”; added or modified keys are “trimap” and “format_trimap_to_onehot”.
- Parameters
to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.
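The label mapping and the one-hot layout can be sketched with NumPy arrays in place of tensors. format_trimap is a hypothetical helper. In this sketch, any value other than 128 or 255 maps to 0, which is an assumption.

```python
import numpy as np


def format_trimap(trimap, to_onehot=False):
    """Map trimap labels 0/128/255 to 0/1/2, optionally as a (3, H, W) one-hot."""
    labels = np.zeros(trimap.shape, dtype=np.int64)
    labels[trimap == 128] = 1
    labels[trimap == 255] = 2
    if to_onehot:
        # np.eye(3)[labels] has shape (H, W, 3); move the class axis first.
        return np.eye(3, dtype=np.float32)[labels].transpose(2, 0, 1)
    return labels


trimap = np.array([[0, 128], [255, 128]], dtype=np.uint8)
labels = format_trimap(trimap)
onehot = format_trimap(trimap, to_onehot=True)
```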
- class mmagic.datasets.transforms.GenerateTrimap(kernel_size, iterations=1, random=True)
Bases: mmcv.transforms.BaseTransform
Generate a trimap from the alpha matte using random erosion/dilation.
Required key is “alpha”, added key is “trimap”.
- Parameters
kernel_size (int | tuple[int]) – The range of random kernel_size of erode/dilate; int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, then it will be interpreted as (erode kernel_size, dilate kernel_size). It should be noted that the kernel of the erosion and dilation has the same height and width.
iterations (int | tuple[int], optional) – The range of random iterations of erode/dilate; int indicates a fixed number of iterations. If random is set to False and iterations is a tuple of length 2, then it will be interpreted as (erode iterations, dilate iterations). Defaults to 1.
random (bool, optional) – Whether to use random kernel_size and iterations when generating the trimap. See kernel_size and iterations for more information. Defaults to True.
- class mmagic.datasets.transforms.GenerateTrimapWithDistTransform(dist_thr=20, random=True)
Bases: mmcv.transforms.BaseTransform
Generate trimap with distance transform function.
- Parameters
dist_thr (int, optional) – Distance threshold. The area with alpha values in (0, 255) is taken as the initial unknown area. Any area whose distance to the unknown area is smaller than the distance threshold is also considered unknown. Defaults to 20.
random (bool, optional) – If True, use random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.
- class mmagic.datasets.transforms.TransformTrimap
Bases: mmcv.transforms.BaseTransform
Transform the trimap into two-channel and six-channel representations.
This class will generate a two-channel trimap composed of definite foreground and background masks and encode it into a six-channel trimap using Gaussian blurs of the generated two-channel trimap at three different scales. The transformed trimap has 6 channels.
Required key is “trimap”, added key is “transformed_trimap” and “two_channel_trimap”.
Adopted from the following repository: https://github.com/MarcoForte/FBA_Matting/blob/master/networks/transforms.py.
- class mmagic.datasets.transforms.CopyValues(src_keys, dst_keys)
Bases: mmcv.transforms.BaseTransform
Copy the value of source keys to destination keys.
Copy the value of source keys to destination keys.
It does the following: results[dst_key] = results[src_key] for (src_key, dst_key) in zip(src_keys, dst_keys).
Added keys are the keys in the attribute “dst_keys”.
Required Keys:
[SRC_KEYS]
Added Keys:
[DST_KEYS]
- Parameters
src_keys (list[str]) – The source keys.
dst_keys (list[str]) – The destination keys.
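The copy rule is a loop over zip(src_keys, dst_keys). A pure-Python sketch follows. copy_values is a hypothetical helper, and the key name gt_copy is illustrative. The sketch assumes a deep copy, so that later in-place edits of a destination value do not touch the source.

```python
from copy import deepcopy


def copy_values(results, src_keys, dst_keys):
    """Set results[dst] to a copy of results[src] for each (src, dst) pair."""
    if len(src_keys) != len(dst_keys):
        raise ValueError('src_keys and dst_keys must have the same length.')
    for src, dst in zip(src_keys, dst_keys):
        # A deep copy keeps later in-place edits of dst from touching src.
        results[dst] = deepcopy(results[src])
    return results


results = copy_values({'gt': [1, 2, 3]}, ['gt'], ['gt_copy'])
```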
- class mmagic.datasets.transforms.SetValues(dictionary)
Bases: mmcv.transforms.BaseTransform
Set value to destination keys.
It does the following: results[key] = value
Added keys are the keys in the dictionary.
Required Keys:
None
Added or Modified Keys:
keys in the dictionary
- Parameters
dictionary (dict) – The dictionary to update.