mmagic.datasets.transforms.aug_pixel
Module Contents
Classes
BinarizeImage | Binarize image.
Clip | Clip the pixels.
ColorJitter | An interface for torch color jitter so that it can be invoked in mmagic pipeline.
RandomAffine | Apply random affine to input images.
RandomMaskDilation | Randomly dilate binary masks.
UnsharpMasking | Apply unsharp masking to an image or a sequence of images.
- class mmagic.datasets.transforms.aug_pixel.BinarizeImage(keys, binary_thr, a_min=0, a_max=1, dtype=np.uint8)
Bases:
mmcv.transforms.BaseTransform
Binarize image.
- Parameters
keys (Sequence[str]) – The images to be binarized.
binary_thr (float) – Threshold for binarization.
a_min (int) – Lower limit of pixel value. Default: 0.
a_max (int) – Upper limit of pixel value. Default: 1.
dtype (np.dtype) – Set the data type of the output. Default: np.uint8
- _binarize(img)
Binarize image.
- Parameters
img (np.ndarray) – Input image.
- Returns
Output image.
- Return type
img (np.ndarray)
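The parameters above suggest a threshold followed by a rescale to [a_min, a_max]. The following is a minimal standalone sketch of that idea, not the mmagic implementation: the helper name binarize is hypothetical, and the strict ">" comparison against binary_thr is an assumption.

```python
import numpy as np


def binarize(img, binary_thr, a_min=0, a_max=1, dtype=np.uint8):
    """Map pixels above ``binary_thr`` to ``a_max`` and the rest to ``a_min``.

    Hypothetical helper; the strict ">" comparison is an assumption.
    """
    # 1 where the pixel exceeds the threshold, 0 elsewhere.
    mask = (np.asarray(img) > binary_thr).astype(dtype)
    # Rescale {0, 1} to {a_min, a_max}.
    return mask * (a_max - a_min) + a_min


img = np.array([[0.1, 0.6], [0.5, 0.9]], dtype=np.float32)
out = binarize(img, binary_thr=0.5)
print(out)
```

With the defaults a_min=0 and a_max=1, the result is a 0/1 mask in the requested dtype.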
- class mmagic.datasets.transforms.aug_pixel.Clip(keys, a_min=0, a_max=255)
Bases:
mmcv.transforms.BaseTransform
Clip the pixels.
Modified keys are the attributes specified in “keys”.
- Parameters
keys (list[str]) – The keys whose values are clipped.
a_min (int) – Lower limit of pixel value. Default: 0.
a_max (int) – Upper limit of pixel value. Default: 255.
- _clip(input_)
Clip the pixels.
- Parameters
input_ (Union[List, np.ndarray]) – Pixels to clip.
- Returns
Clipped pixels.
- Return type
Union[List, np.ndarray]
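Because the input may be either a single array or a list of arrays, the clipping can be sketched as below. This is an illustrative sketch under that assumption; clip_pixels is a hypothetical name and not part of mmagic.

```python
import numpy as np


def clip_pixels(input_, a_min=0, a_max=255):
    """Clip a single array, or every array in a list, to [a_min, a_max]."""
    if isinstance(input_, np.ndarray):
        return np.clip(input_, a_min, a_max)
    # A list (for example, video frames) is clipped element by element.
    return [np.clip(v, a_min, a_max) for v in input_]


single = clip_pixels(np.array([-20.0, 100.0, 300.0]))
batch = clip_pixels([np.array([-1.0, 256.0]), np.array([12.0])])
print(single, batch)
```

The return type mirrors the input type, matching the Union[List, np.ndarray] return type documented above.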
- class mmagic.datasets.transforms.aug_pixel.ColorJitter(keys, channel_order='rgb', **kwargs)
Bases:
mmcv.transforms.BaseTransform
An interface for torch color jitter so that it can be invoked in mmagic pipeline.
Randomly change the brightness, contrast and saturation of an image. Modified keys are the attributes specified in “keys”.
Required Keys:
[KEYS]
Modified Keys:
[KEYS]
- Parameters
keys (list[str]) – The images to be color-jittered.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘rgb’.
Notes
**kwargs follows the argument list of torchvision.transforms.ColorJitter.
- brightness (float or tuple of float (min, max)): How much to jitter brightness. brightness_factor is chosen uniformly from [max(0, 1 - brightness), 1 + brightness] or the given [min, max]. Should be non-negative numbers.
- contrast (float or tuple of float (min, max)): How much to jitter contrast. contrast_factor is chosen uniformly from [max(0, 1 - contrast), 1 + contrast] or the given [min, max]. Should be non-negative numbers.
- saturation (float or tuple of float (min, max)): How much to jitter saturation. saturation_factor is chosen uniformly from [max(0, 1 - saturation), 1 + saturation] or the given [min, max]. Should be non-negative numbers.
- hue (float or tuple of float (min, max)): How much to jitter hue. hue_factor is chosen uniformly from [-hue, hue] or the given [min, max]. Should have 0 <= hue <= 0.5 or -0.5 <= min <= max <= 0.5.
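The sampling intervals described in the notes can be worked out with a few lines of plain Python. The sketch below only reproduces the interval arithmetic stated above; factor_range is a hypothetical helper and does not call torchvision.

```python
import random


def factor_range(value, center=1.0, clip_at_zero=True):
    """Return the (low, high) interval a jitter factor is drawn from.

    Hypothetical helper that mirrors the rules in the notes above:
    a scalar v gives [max(0, center - v), center + v]; a (min, max)
    pair is used as given.
    """
    if isinstance(value, (tuple, list)):
        low, high = value
    else:
        low, high = center - value, center + value
        if clip_at_zero:
            low = max(0.0, low)
    return float(low), float(high)


brightness_range = factor_range(0.5)   # (0.5, 1.5)
contrast_range = factor_range(1.5)     # (0.0, 2.5), lower bound clipped at 0
saturation_range = factor_range((0.8, 1.2))
# Hue is centered on 0 and is not clipped at zero.
hue_range = factor_range(0.25, center=0.0, clip_at_zero=False)  # (-0.25, 0.25)

brightness_factor = random.uniform(*brightness_range)
print(brightness_range, contrast_range, saturation_range, hue_range)
```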
- _color_jitter(image, this_seed)
Color Jitter Function.
- Parameters
image (np.ndarray) – Image.
this_seed (int) – Seed of torch.
- Returns
The output image.
- Return type
image (np.ndarray)
- class mmagic.datasets.transforms.aug_pixel.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)
Bases:
mmcv.transforms.BaseTransform
Apply random affine to input images.
This class is adapted from torchvision (https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015). Note that random flip is added in GCA-Matting (https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70); see the explanation of flip_ratio below. Required keys are the keys in attribute “keys”, modified keys are keys in attribute “keys”.
- Parameters
keys (Sequence[str]) – The images to be affine-transformed.
degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – Tuple of maximum absolute fractions for horizontal and vertical translations. For example, with translate=(a, b), the horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and the vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.
scale (tuple, optional) – Scaling factor interval, e.g. (a, b); the scale is randomly sampled from the range a <= scale <= b. Default: None.
shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. If shear is a tuple of 2 values, an x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.
flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.
- static _get_params(degrees, translate, scale_ranges, shears, flip_ratio, img_size)
Get parameters for affine transformation.
- Returns
Params to be passed to the affine transformation.
- Return type
paras (tuple)
- static _get_inverse_affine_matrix(center, angle, translate, scale, shear, flip)
Helper method to compute inverse matrix for affine transformation.
As explained in PIL.Image.rotate, we need to compute the INVERSE of the affine transformation matrix M = T * C * RSS * C^-1, where
T is the translation matrix:
    [1, 0, tx | 0, 1, ty | 0, 0, 1];
C is the translation matrix that keeps the center:
    [1, 0, cx | 0, 1, cy | 0, 0, 1];
RSS is the rotation with scale and shear matrix.
It differs from the original function in torchvision in two ways:
1. The order is changed to flip -> scale -> rotation -> shear.
2. x and y have different scale factors.
RSS(shear, a, scale, f) =
    [ cos(a + shear)*scale_x*f   -sin(a + shear)*scale_y   0 ]
    [ sin(a)*scale_x*f            cos(a)*scale_y           0 ]
    [ 0                           0                        1 ]
Thus, the inverse is M^-1 = C * RSS^-1 * C^-1 * T^-1.
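The identity M^-1 = C * RSS^-1 * C^-1 * T^-1 can be checked numerically. The sketch below builds the matrices in the form written above with NumPy and is an illustration only: the helper names are hypothetical, angles are taken in radians, and flip is treated as a single scalar f (+1 or -1), which is an assumption about the notation rather than the signature of the mmagic method.

```python
import math

import numpy as np


def translation(tx, ty):
    """3x3 homogeneous translation matrix."""
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


def rss(angle, shear, scale_x, scale_y, flip):
    """Rotation-scale-shear matrix in the form written above (radians)."""
    return np.array([
        [math.cos(angle + shear) * scale_x * flip,
         -math.sin(angle + shear) * scale_y, 0.0],
        [math.sin(angle) * scale_x * flip,
         math.cos(angle) * scale_y, 0.0],
        [0.0, 0.0, 1.0],
    ])


def forward_and_inverse(center, angle, translate, scale, shear, flip):
    """Return M = T C RSS C^-1 and M^-1 = C RSS^-1 C^-1 T^-1."""
    T = translation(*translate)
    C = translation(*center)
    C_inv = translation(-center[0], -center[1])
    T_inv = translation(-translate[0], -translate[1])
    R = rss(angle, shear, scale[0], scale[1], flip)
    M = T @ C @ R @ C_inv
    M_inv = C @ np.linalg.inv(R) @ C_inv @ T_inv
    return M, M_inv


M, M_inv = forward_and_inverse(
    center=(50.0, 40.0),
    angle=math.radians(30),
    translate=(5.0, -3.0),
    scale=(1.2, 0.8),
    shear=math.radians(10),
    flip=-1.0,
)
print(np.allclose(M_inv @ M, np.eye(3)))
```

Since C^-1 moves the center to the origin and RSS leaves the origin fixed, M maps the center to the center plus the translation, which gives a quick sanity check on the composition order.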
- class mmagic.datasets.transforms.aug_pixel.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)
Bases:
mmcv.transforms.BaseTransform
Randomly dilate binary masks.
- Parameters
keys (Sequence[str]) – The masks to be dilated.
binary_thr (float) – Threshold for obtaining binary mask. Default: 0.
kernel_min (int) – Min size of dilation kernel. Default: 9.
kernel_max (int) – Max size of dilation kernel. Default: 49.
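To illustrate how the three parameters interact, here is a NumPy-only sketch of random dilation. It is not the mmagic implementation: random_dilate is a hypothetical helper, the kernel is a plain square (a library implementation may well use a different structuring element), the kernel size is assumed to be drawn uniformly from [kernel_min, kernel_max] inclusive, and the mask is binarized before dilation.

```python
import numpy as np


def random_dilate(mask, binary_thr=0.0, kernel_min=9, kernel_max=49, rng=None):
    """Dilate a 2-D mask with a randomly sized square kernel.

    Hypothetical helper. Returns the dilated mask and the kernel size.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: both bounds are inclusive.
    k = int(rng.integers(kernel_min, kernel_max + 1))
    binary = np.asarray(mask) > binary_thr
    # Padding split that works for both odd and even kernel sizes.
    before, after = (k - 1) // 2, k // 2
    padded = np.pad(binary, ((before, after), (before, after)), mode="constant")
    h, w = binary.shape
    out = np.zeros((h, w), dtype=bool)
    # A pixel is set if any pixel under the k x k window is set.
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.float32), k


mask = np.zeros((21, 21), dtype=np.float32)
mask[10, 10] = 1.0
dilated, kernel_size = random_dilate(mask, kernel_min=5, kernel_max=5)
print(kernel_size, int(dilated.sum()))
```

With kernel_min equal to kernel_max the kernel size is fixed, so a single foreground pixel grows into a 5 x 5 block in this example.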
- class mmagic.datasets.transforms.aug_pixel.UnsharpMasking(kernel_size, sigma, weight, threshold, keys)
Bases:
mmcv.transforms.BaseTransform
Apply unsharp masking to an image or a sequence of images.
- Parameters
kernel_size (int) – The kernel_size of the Gaussian kernel.
sigma (float) – The standard deviation of the Gaussian.
weight (float) – The weight of the “details” in the final output.
threshold (float) – Pixel differences larger than this value are regarded as “details”.
keys (list[str]) – The keys whose values are processed.
Added keys are “xxx_unsharp”, where “xxx” are the attributes specified in “keys”.
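The roles of kernel_size, sigma, weight and threshold can be seen in a simplified, NumPy-only sketch of unsharp masking on a 2-D image. It is one common formulation and is not confirmed to match the mmagic implementation: the helper names are hypothetical, the detail mask is hard rather than smoothed, kernel_size is assumed to be odd, and pixel values are assumed to lie in [0, 255].

```python
import numpy as np


def gaussian_kernel_1d(kernel_size, sigma):
    """Normalized 1-D Gaussian kernel."""
    ax = np.arange(kernel_size) - (kernel_size - 1) / 2.0
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img, kernel_size, sigma):
    """Separable Gaussian blur of a 2-D array with reflect padding."""
    assert kernel_size % 2 == 1, "this sketch assumes an odd kernel size"
    k = gaussian_kernel_1d(kernel_size, sigma)
    padded = np.pad(img, kernel_size // 2, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def unsharp_mask(img, kernel_size, sigma, weight, threshold):
    """Sharpen only where |img - blur| exceeds ``threshold``."""
    img = np.asarray(img, dtype=np.float32)
    residue = img - gaussian_blur(img, kernel_size, sigma)
    detail = np.abs(residue) > threshold  # hard mask (simplification)
    sharpened = np.clip(img + weight * residue, 0, 255)
    return np.where(detail, sharpened, img).astype(np.float32)


# A vertical step edge: columns 0-3 are dark, columns 4-8 are bright.
edge = np.full((9, 9), 50.0, dtype=np.float32)
edge[:, 4:] = 200.0

results = {"gt": edge}
for key in ["gt"]:
    # Mirrors the "xxx_unsharp" naming described above.
    results[f"{key}_unsharp"] = unsharp_mask(
        results[key], kernel_size=5, sigma=1.0, weight=0.5, threshold=10)
print(results["gt_unsharp"][4])
```

Pixels far from the edge have a residue below the threshold and are left untouched, while pixels next to the edge are pushed further apart, which increases local contrast.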