视觉变换样例库

下载Notebook查看源文件

此指南展示了mindpore.dataset.vision模块中各种变换的用法。

环境准备

[1]:
from download import download
import matplotlib.pyplot as plt
from PIL import Image

import mindspore.dataset as ds
import mindspore.dataset.vision as vision

# Download opensource datasets
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/flamingos.jpg"
download(url, './flamingos.jpg', replace=True)
orig_img = Image.open('flamingos.jpg')

# Env set for randomness and prepare plot function
ds.config.set_seed(66)

def plot(imgs, first_origin=True, **kwargs):
    num_rows = 1
    num_cols = len(imgs) + first_origin

    _, axs = plt.subplots(nrows=num_rows, ncols=num_cols, squeeze=False)
    if first_origin:
        imgs = [orig_img] + imgs
    for idx, img in enumerate(imgs):
        ax = axs[0, idx]
        ax.imshow(img, **kwargs)
        ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])

    if first_origin:
        axs[0, 0].set(title='Original image')
        axs[0, 0].title.set_size(8)
    plt.tight_layout()
Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/flamingos.jpg (45 kB)

file_sizes: 100%|███████████████████████████| 45.8k/45.8k [00:00<00:00, 640kB/s]
Successfully downloaded file to ./flamingos.jpg

几何变换

几何图像变换是指改变图像的几何属性,如其形状、大小、方向或位置,其涉及对图像像素或坐标进行数学运算。

Pad

mindspore.dataset.vision.Pad 会对图像的边缘填充像素。

[2]:
padded_imgs = [vision.Pad(padding=padding)(orig_img) for padding in (3, 10, 30, 50)]
plot(padded_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_4_0.png

Resize

mindspore.dataset.vision.Resize 会调整图像的尺寸大小。

[3]:
resized_imgs = [vision.Resize(size=size)(orig_img) for size in (30, 50, 100)]
plot(resized_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_6_0.png

CenterCrop

mindspore.dataset.vision.CenterCrop 会在图像中裁剪出中心区域。

[4]:
center_crops = [vision.CenterCrop(size=size)(orig_img) for size in (30, 50, 100)]
plot(center_crops)
../../../_images/api_python_samples_dataset_vision_gallery_8_0.png

FiveCrop

mindspore.dataset.vision.FiveCrop 在图像的中心与四个角处分别裁剪指定尺寸大小的子图。

[5]:
(top_left, top_right, bottom_left, bottom_right, center) = vision.FiveCrop(size=(100, 100))(orig_img)
plot([top_left, top_right, bottom_left, bottom_right, center], True)
../../../_images/api_python_samples_dataset_vision_gallery_10_0.png

RandomPerspective

mindspore.dataset.vision.RandomPerspective 会按照指定的概率对输入图像进行透视变换。

[6]:
perspective_transformer = vision.RandomPerspective(distortion_scale=0.6, prob=1.0)
perspective_imgs = [perspective_transformer(orig_img) for _ in range(4)]
plot(perspective_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_12_0.png

RandomRotation

mindspore.dataset.vision.RandomRotation 会随机旋转输入图像。

[7]:
rotater = vision.RandomRotation(degrees=(0, 180))
rotated_imgs = [rotater(orig_img) for _ in range(4)]
plot(rotated_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_14_0.png

RandomAffine

mindspore.dataset.vision.RandomAffine 会对输入图像应用随机仿射变换。

[8]:
affine_transformer = vision.RandomAffine(degrees=(30, 70), translate=(0.1, 0.3), scale=(0.5, 0.75))
affine_imgs = [affine_transformer(orig_img) for _ in range(4)]
plot(affine_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_16_0.png

RandomCrop

mindspore.dataset.vision.RandomCrop 会对输入图像进行随机区域的裁剪。

[9]:
cropper = vision.RandomCrop(size=(128, 128))
crops = [cropper(orig_img) for _ in range(4)]
plot(crops)
../../../_images/api_python_samples_dataset_vision_gallery_18_0.png

RandomResizedCrop

mindspore.dataset.vision.RandomResizedCrop 会对输入图像进行随机裁剪,并将裁剪区域调整为指定的尺寸大小。

[10]:
resize_cropper = vision.RandomResizedCrop(size=(32, 32))
resized_crops = [resize_cropper(orig_img) for _ in range(4)]
plot(resized_crops)
../../../_images/api_python_samples_dataset_vision_gallery_20_0.png

光学变换

光学变换是指修改图像的测光属性,如其亮度、对比度、颜色或色调。这些变换的应用是为了改变图像的视觉外观,但保留其几何结构。

Grayscale

mindspore.dataset.vision.Grayscale 会将图像转换为灰度图。

[11]:
gray_img = vision.Grayscale()(orig_img)
plot([gray_img], cmap='gray')
../../../_images/api_python_samples_dataset_vision_gallery_23_0.png

RandomColorAdjust

mindspore.dataset.vision.RandomColorAdjust 会随机调整输入图像的亮度、对比度、饱和度和色调。

[12]:
jitter = vision.RandomColorAdjust(brightness=.5, hue=.3)
jitted_imgs = [jitter(orig_img) for _ in range(4)]
plot(jitted_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_25_0.png

GaussianBlur

mindspore.dataset.vision.GaussianBlur 会使用指定的高斯核对输入图像进行模糊处理。

[13]:
blurrer = vision.GaussianBlur(kernel_size=(5, 9), sigma=(0.1, 5))
blurred_imgs = [blurrer(orig_img) for _ in range(4)]
plot(blurred_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_27_0.png

RandomInvert

mindspore.dataset.vision.RandomInvert 会以给定的概率随机反转图像的颜色。

[14]:
inverter = vision.RandomInvert()
invertered_imgs = [inverter(orig_img) for _ in range(4)]
plot(invertered_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_29_0.png

RandomPosterize

mindspore.dataset.vision.RandomPosterize 会随机减少图像的颜色通道的比特位数,使图像变得高对比度和颜色鲜艳。

[15]:
posterizer = vision.RandomPosterize(bits=2)
posterized_imgs = [posterizer(orig_img) for _ in range(4)]
plot(posterized_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_31_0.png

RandomSolarize

mindspore.dataset.vision.RandomSolarize 会随机翻转给定范围内的像素。

[16]:
solarizer = vision.RandomSolarize(threshold=(0, 192))
solarized_imgs = [solarizer(orig_img) for _ in range(4)]
plot(solarized_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_33_0.png

RandomAdjustSharpness

mindspore.dataset.vision.RandomAdjustSharpness 会以给定的概率随机调整输入图像的锐度。

[17]:
sharpness_adjuster = vision.RandomAdjustSharpness(degree=2)
sharpened_imgs = [sharpness_adjuster(orig_img) for _ in range(4)]
plot(sharpened_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_35_0.png

RandomAutoContrast

mindspore.dataset.vision.RandomAutoContrast 会以给定的概率自动调整图像的对比度。

[18]:
autocontraster = vision.RandomAutoContrast()
autocontrasted_imgs = [autocontraster(orig_img) for _ in range(4)]
plot(autocontrasted_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_37_0.png

RandomEqualize

mindspore.dataset.vision.RandomEqualize 会以给定的概率随机对输入图像进行直方图均衡化。

[19]:
equalizer = vision.RandomEqualize()
equalized_imgs = [equalizer(orig_img) for _ in range(4)]
plot(equalized_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_39_0.png

增强变换

以下的变换是多个变换的组合,通常来自论文提出的一些高效数据增强方法。

AutoAugment

mindspore.dataset.vision.AutoAugment 会应用AutoAugment数据增强方法,增强的实现基于基于论文AutoAugment: Learning Augmentation Strategies from Data

[20]:
augmenter = vision.AutoAugment(policy=vision.AutoAugmentPolicy.IMAGENET)
imgs = [augmenter(orig_img) for _ in range(4)]
plot(imgs)
../../../_images/api_python_samples_dataset_vision_gallery_42_0.png

RandAugment

mindspore.dataset.vision.RandAugment 会对输入图像应用RandAugment数据增强方法,增强的实现基于基于论文RandAugment: Learning Augmentation Strategies from Data

[21]:
augmenter = vision.RandAugment()
imgs = [augmenter(orig_img) for _ in range(4)]
plot(imgs)
../../../_images/api_python_samples_dataset_vision_gallery_44_0.png

TrivialAugmentWide

mindspore.dataset.vision.TrivialAugmentWide会对输入图像应用TrivialAugmentWide数据增强方法,增强的实现基于基于论文TrivialAugmentWide: Tuning-free Yet State-of-the-Art Data Augmentation

[22]:
augmenter = vision.TrivialAugmentWide()
imgs = [augmenter(orig_img) for _ in range(4)]
plot(imgs)
../../../_images/api_python_samples_dataset_vision_gallery_46_0.png

随机应用的变换

有些变换是以按照给定概率随机应用的。也就是说,转换后的图像可能与原始图像相同。

RandomHorizontalFlip

mindspore.dataset.vision.RandomHorizontalFlip会对输入图像进行水平随机翻转。

[23]:
hflipper = vision.RandomHorizontalFlip(0.5)
transformed_imgs = [hflipper(orig_img) for _ in range(4)]
plot(transformed_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_48_0.png

RandomVerticalFlip

mindspore.dataset.vision.RandomVerticalFlip 会对输入图像进行垂直随机翻转。

[24]:
vflipper = vision.RandomVerticalFlip(0.5)
transformed_imgs = [vflipper(orig_img) for _ in range(4)]
plot(transformed_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_50_0.png

RandomApply

mindspore.dataset.transforms.RandomApply 可以指定一组数据增强处理及其被应用的概率,在运算时按概率随机应用其中的增强处理。

[25]:
import mindspore.dataset.transforms as T

applier = T.RandomApply(transforms=[vision.RandomCrop(size=(64, 64))], prob=0.5)
transformed_imgs = [applier(orig_img) for _ in range(4)]
plot(transformed_imgs)
../../../_images/api_python_samples_dataset_vision_gallery_52_0.png

在数据Pipeline中加载和处理图像文件

使用 mindspore.dataset.ImageFolderDataset 将磁盘中的图像文件内容加载到数据Pipeline中,并进一步应用其他增强操作。

[26]:
from download import download

import os
import mindspore.dataset as ds

# Download a small imagenet as example
url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/imageset.zip"
download(url, "./", kind="zip", replace=True)

# There are 5 classes in the image folder.
os.listdir("./imageset")

# Load these 5 classes into dataset pipeline
dataset = ds.ImageFolderDataset("./imageset", shuffle=False)

# check the column names inside the dataset. "image" column represents the image content and "label" column represents the corresponding label of image.
print("column names:", dataset.get_col_names())

# since the original image is not decoded, apply decode first on "image" column
dataset = dataset.map(vision.Decode(), input_columns=["image"])

# check results
print(">>>>> after decode")
for data, label in dataset:
    print(data.shape, label)

# let's do some transforms on dataset
# apply resize on images
dataset = dataset.map(vision.Resize(size=(48, 48)), input_columns=["image"])

# check results
print(">>>>> after resize")
images = []
for image, label in dataset:
    images.append(image.asnumpy())
    print(image.shape, label)

plot(images[:5], first_origin=False)
Downloading data from https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/imageset.zip (45 kB)

file_sizes: 100%|██████████████████████████| 45.7k/45.7k [00:00<00:00, 1.01MB/s]
Extracting zip file...
Successfully downloaded / unzipped to ./
column names: ['image', 'label']
>>>>> after decode
(64, 64, 3) 0
(64, 64, 3) 0
(64, 64, 3) 0
(64, 64, 3) 1
(64, 64, 3) 1
(64, 64, 3) 1
(64, 64, 3) 1
(64, 64, 3) 2
(64, 64, 3) 2
(64, 64, 3) 2
(64, 64, 3) 3
(64, 64, 3) 3
(64, 64, 3) 3
(64, 64, 3) 4
(64, 64, 3) 4
(64, 64, 3) 4
>>>>> after resize
(48, 48, 3) 0
(48, 48, 3) 0
(48, 48, 3) 0
(48, 48, 3) 1
(48, 48, 3) 1
(48, 48, 3) 1
(48, 48, 3) 1
(48, 48, 3) 2
(48, 48, 3) 2
(48, 48, 3) 2
(48, 48, 3) 3
(48, 48, 3) 3
(48, 48, 3) 3
(48, 48, 3) 4
(48, 48, 3) 4
(48, 48, 3) 4
../../../_images/api_python_samples_dataset_vision_gallery_54_1.png