{ "cells": [ { "cell_type": "markdown", "id": "89f4092a", "metadata": {}, "source": [ "# Optimizing Data Processing\n", "\n", "[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_notebook_en.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0.0-alpha/tutorials/experts/en/dataset/mindspore_optimize.ipynb) [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r2.0.0-alpha/tutorials/experts/source_en/dataset/optimize.ipynb)" ] }, { "cell_type": "markdown", "id": "16c89fb0", "metadata": {}, "source": [ "Data is the most important part of deep learning: the quality of the data sets the upper limit of the final result, and the model can at best approach that limit. High-quality input data therefore benefits the entire deep neural network. Throughout data processing and data augmentation, data flows continuously to the training system like water through a pipe, as shown in the figure:" ] }, { "cell_type": "markdown", "id": "891fd520", "metadata": {}, "source": [ "![pipeline](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/pipeline.png)" ] }, { "cell_type": "markdown", "id": "99c4ddf0", "metadata": {}, "source": [ "MindSpore provides data processing and data augmentation functions for users. 
Using each step of the pipeline properly can greatly improve data processing performance.\n", "\n", "This section describes how to optimize performance during data loading, data processing, and data augmentation, using the CIFAR-10 dataset as an example.\n", "\n", "In addition, the storage, architecture, and computing resources of the operating system also affect data processing performance to some extent.\n", "\n", "## Downloading the Dataset\n", "\n", "Run the following code to download the CIFAR-10 dataset in binary format and extract it to the `./datasets/` directory, which is used when loading data." ] }, { "cell_type": "code", "execution_count": 1, "id": "69690b3f", "metadata": {}, "outputs": [], "source": [ "from download import download\n", "import os\n", "import shutil\n", "\n", "url = \"https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-binary.tar.gz\"\n", "download(url, \"./datasets\", kind=\"tar.gz\") # Download CIFAR-10 dataset\n", "\n", "test_path = \"./datasets/cifar-10-batches-bin/test\"\n", "train_path = \"./datasets/cifar-10-batches-bin/train\"\n", "os.makedirs(test_path, exist_ok=True)\n", "os.makedirs(train_path, exist_ok=True)\n", "\n", "# move the test batch into test/\n", "if not os.path.exists(os.path.join(test_path, \"test_batch.bin\")):\n", "    shutil.move(\"./datasets/cifar-10-batches-bin/test_batch.bin\", test_path)\n", "\n", "# move the remaining batch files (everything except readme.html) into train/\n", "src_dir = \"./datasets/cifar-10-batches-bin/\"\n", "for name in os.listdir(src_dir):\n", "    src = os.path.join(src_dir, name)\n", "    if os.path.isfile(src) and not name.endswith(\".html\") and not os.path.exists(os.path.join(train_path, name)):\n", "        shutil.move(src, train_path)" ] }, { "cell_type": "markdown", "id": "f02d8f5d", "metadata": {}, "source": [ "The directory structure of the extracted dataset is as follows:\n", "\n", "```text\n", "./datasets/cifar-10-batches-bin\n", "├── readme.html\n", "├── test\n", "│ └── test_batch.bin\n", "└── train\n", " ├── 
batches.meta.txt\n", " ├── data_batch_1.bin\n", " ├── data_batch_2.bin\n", " ├── data_batch_3.bin\n", " ├── data_batch_4.bin\n", " └── data_batch_5.bin\n", "```" ] }, { "cell_type": "markdown", "id": "b39ecc87", "metadata": {}, "source": [ "## Optimizing the Data Loading Performance\n", "\n", "MindSpore supports loading common datasets from fields such as computer vision and natural language processing, datasets in specific formats, and user-defined datasets. Different dataset loading interfaces have different underlying implementations and therefore different performance:\n", "\n", "| | Common Dataset | User-defined Dataset | MindRecord Dataset |\n", "| :----: | :----: | :----: | :----: |\n", "| Underlying implementation | C++ | Python | C++ |\n", "| Performance | High | Medium | High |\n", "\n", "The performance optimization solution is as follows:" ] }, { "cell_type": "markdown", "id": "e6fe8b35", "metadata": {}, "source": [ "![data-loading-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/data_loading_performance_scheme.png)" ] }, { "cell_type": "markdown", "id": "4f510408", "metadata": {}, "source": [ "Suggestions on data loading performance optimization are as follows:\n", "\n", "- For common datasets that already have loading interfaces, use the dataset loading interfaces provided by MindSpore, which give better loading performance. For details, see [Built-in Loading Operations](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/mindspore.dataset.html). If the performance still does not meet the requirements, use the multi-thread concurrency solution. 
For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-thread-optimization-solution).\n", "- For dataset formats without a built-in loading interface, it is recommended to convert the dataset to the MindRecord format and then load it with the `MindDataset` class (see the [API](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/dataset/mindspore.dataset.MindDataset.html) for details). For the conversion, see [Converting Dataset to MindRecord](https://www.mindspore.cn/tutorials/en/r2.0.0-alpha/advanced/dataset/record.html). If the performance still does not meet the requirements, use the multi-thread concurrency solution. For details, see [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-thread-optimization-solution).\n", "- For dataset formats without a built-in loading interface, the user-defined `GeneratorDataset` class is preferred when fast algorithm verification is the goal (see the [API](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/dataset/mindspore.dataset.GeneratorDataset.html) for details). If the performance does not meet the requirements, use the multi-process concurrency solution. For details, see [Multi-process Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-process-optimization-solution).\n", "\n", "Based on the preceding suggestions, this example loads data with the built-in `Cifar10Dataset` class (see the [API](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/dataset/mindspore.dataset.Cifar10Dataset.html) for details), with the `MindDataset` class after data conversion, and with the `GeneratorDataset` class. The sample code is as follows:\n", "\n", "1. 
Use the `Cifar10Dataset` class of built-in operations to load the CIFAR-10 dataset in binary format. The multi-thread optimization solution is used for data loading. Four threads are enabled to concurrently complete the task. Finally, a dictionary iterator is created for the data and a data record is read through the iterator." ] }, { "cell_type": "code", "execution_count": 5, "id": "51ded706", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'image': Tensor(shape=[32, 32, 3], dtype=UInt8, value=\n", "[[[209, 206, 192],\n", " [211, 209, 201],\n", " [221, 217, 213],\n", " ...\n", " [172, 175, 194],\n", " [169, 173, 190],\n", " [115, 121, 145]],\n", " [[226, 230, 211],\n", " [227, 229, 218],\n", " [230, 232, 221],\n", " ...\n", " [153, 153, 171],\n", " [156, 156, 173],\n", " [106, 111, 129]],\n", " [[214, 226, 203],\n", " [214, 222, 204],\n", " [217, 227, 206],\n", " ...\n", " [167, 166, 176],\n", " [147, 147, 156],\n", " [ 78, 84, 96]],\n", " ...\n", " [[ 40, 69, 61],\n", " [ 37, 63, 57],\n", " [ 43, 68, 66],\n", " ...\n", " [ 55, 70, 69],\n", " [ 40, 54, 51],\n", " [ 27, 44, 36]],\n", " [[ 33, 61, 50],\n", " [ 37, 65, 56],\n", " [ 54, 72, 74],\n", " ...\n", " [ 47, 60, 56],\n", " [ 58, 66, 64],\n", " [ 36, 50, 46]],\n", " [[ 29, 41, 37],\n", " [ 38, 60, 59],\n", " [ 51, 76, 81],\n", " ...\n", " [ 32, 51, 43],\n", " [ 47, 61, 54],\n", " [ 56, 67, 66]]]), 'label': Tensor(shape=[], dtype=UInt32, value= 5)}\n" ] } ], "source": [ "import mindspore.dataset as ds\n", "cifar10_path = \"./datasets/cifar-10-batches-bin/train\"\n", "\n", "# create Cifar10Dataset for reading data\n", "cifar10_dataset = ds.Cifar10Dataset(cifar10_path, num_parallel_workers=4)\n", "# create a dictionary iterator and read a data record through the iterator\n", "print(next(cifar10_dataset.create_dict_iterator()))" ] }, { "cell_type": "markdown", "id": "b2b9a9f3", "metadata": {}, "source": [ "2. 
Use the `Cifar10ToMR` class to convert the CIFAR-10 dataset into the MindRecord format. This example uses the Python version of the CIFAR-10 dataset. Then use the `MindDataset` class to load the converted dataset with the multi-thread optimization solution: four threads load data concurrently. Finally, create a dictionary iterator and read one data record through it." ] }, { "cell_type": "code", "execution_count": 6, "id": "443fc50d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'data': Tensor(shape=[1283], dtype=UInt8, value= [255, 216, 255, 224, 0, 16, 74, 70, 73, 70, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 255, 219, 0, 67, \n", " 107, 249, 17, 58, 213, 185, 117, 181, 143, 255, 217]), 'id': Tensor(shape=[], dtype=Int64, value= 32476), 'label': Tensor(shape=[], dtype=Int64, value= 9)}\n" ] } ], "source": [ "import glob\n", "from mindspore.mindrecord import Cifar10ToMR\n", "\n", "trans_path = \"./transform/\"\n", "os.makedirs(trans_path, exist_ok=True)\n", "\n", "# remove output files left over from a previous conversion run\n", "for old_file in glob.glob(os.path.join(trans_path, \"cifar10*\")):\n", "    os.remove(old_file)\n", "\n", "# download CIFAR-10 python\n", "py_url = \"https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/cifar-10-python.tar.gz\"\n", "download(py_url, \"./datasets\", kind=\"tar.gz\", replace=True)\n", "\n", "cifar10_path = './datasets/cifar-10-batches-py'\n", "cifar10_mindrecord_path = './transform/cifar10.record'\n", "\n", "cifar10_transformer = Cifar10ToMR(cifar10_path, cifar10_mindrecord_path)\n", "# execute transformation from CIFAR-10 to MindRecord\n", "cifar10_transformer.transform(['label'])\n", "\n", "# create MindDataset for reading data\n", "cifar10_mind_dataset = ds.MindDataset(dataset_files=cifar10_mindrecord_path, num_parallel_workers=4)\n", "# create a dictionary iterator and read a data record through the iterator\n", "print(next(cifar10_mind_dataset.create_dict_iterator()))" 
] }, { "cell_type": "markdown", "id": "b1c38c06", "metadata": {}, "source": [ "3. Use the `GeneratorDataset` class to load a user-defined dataset with the multi-process optimization solution: four processes load data concurrently. Finally, create a dictionary iterator and read one data record through it." ] }, { "cell_type": "code", "execution_count": 7, "id": "0a5ddaeb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'data': Tensor(shape=[1], dtype=Int64, value= [0])}\n" ] } ], "source": [ "import numpy as np\n", "def generator_func(num):\n", " for i in range(num):\n", " yield (np.array([i]),)\n", "\n", "# create a GeneratorDataset object for reading data\n", "dataset = ds.GeneratorDataset(source=generator_func(5), column_names=[\"data\"], num_parallel_workers=4)\n", "# create a dictionary iterator and read a data record through the iterator\n", "print(next(dataset.create_dict_iterator()))" ] }, { "cell_type": "markdown", "id": "77115132", "metadata": {}, "source": [ "## Optimizing the Shuffle Performance\n", "\n", "The shuffle operation randomizes the order of ordered or repeated datasets. MindSpore provides the `shuffle` function for this purpose. A larger `buffer_size` gives a more thorough shuffle but consumes more time and computing resources. The function can be applied at any point in the pipeline. For details, see [shuffle processing](https://mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/beginner/dataset.html#shuffle). 
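To see why a larger `buffer_size` shuffles more thoroughly but costs more, consider the general buffer-based shuffle technique, sketched in plain Python under stated assumptions. `buffer_shuffle` is a hypothetical helper, not MindSpore's implementation or API. It keeps at most `buffer_size` samples in memory and, once the buffer is full, emits a randomly chosen one. As a result, a sample can appear at most `buffer_size - 1` positions earlier than in the input, and only a buffer as large as the dataset yields a global shuffle.

```python
import random


def buffer_shuffle(samples, buffer_size, seed=None):
    # Hypothetical helper illustrating buffer-based shuffling; it is NOT
    # MindSpore's implementation, only the general technique.
    rng = random.Random(seed)
    buffer = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == buffer_size:
            # pick a random buffered sample, swap it to the end, and pop it
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    # input exhausted: emit the remaining buffered samples in random order
    rng.shuffle(buffer)
    yield from buffer


if __name__ == '__main__':
    data = list(range(10))
    # in this sketch, a buffer of 1 cannot reorder anything
    print(list(buffer_shuffle(data, buffer_size=1)))
    # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    # a buffer as large as the dataset produces a full (global) shuffle
    print(list(buffer_shuffle(data, buffer_size=len(data), seed=0)))
```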
Because the underlying implementations differ, the `shuffle` function is slower than shuffling data directly through the `shuffle` parameter of the [Built-in Loading Operations](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/mindspore.dataset.html).\n", "\n", "The performance optimization solution is as follows:" ] }, { "cell_type": "markdown", "id": "d5aee4ec", "metadata": {}, "source": [ "![shuffle-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/shuffle_performance_scheme.png)" ] }, { "cell_type": "markdown", "id": "ce526b87", "metadata": {}, "source": [ "Suggestions on shuffle performance optimization are as follows:\n", "\n", "- Use the `shuffle` parameter of the built-in loading operations to shuffle data.\n", "- If the `shuffle` function is used and the performance still does not meet the requirements, adjust the `buffer_size` parameter: a smaller buffer is faster but shuffles less thoroughly.\n", "\n", "Based on the preceding suggestions, the following examples shuffle data with the `shuffle` parameter of the built-in `Cifar10Dataset` class and with the `shuffle` function. The sample code is as follows:\n", "\n", "1. Use the built-in `Cifar10Dataset` class to load the CIFAR-10 dataset in binary format, setting the `shuffle` parameter to True to shuffle the data. Finally, create a dictionary iterator and read one data record through it." 
] }, { "cell_type": "code", "execution_count": 8, "id": "43f14228", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'image': Tensor(shape=[32, 32, 3], dtype=UInt8, value=\n", "[[[119, 193, 196],\n", " [121, 192, 204],\n", " [123, 193, 209],\n", " ...\n", " [110, 168, 177],\n", " [109, 167, 176],\n", " [110, 168, 178]],\n", " [[110, 188, 199],\n", " [109, 185, 202],\n", " [111, 186, 204],\n", " ...\n", " [107, 173, 179],\n", " [107, 173, 179],\n", " [109, 175, 182]],\n", " [[110, 186, 200],\n", " [108, 183, 199],\n", " [110, 184, 199],\n", " ...\n", " [115, 183, 189],\n", " [117, 185, 190],\n", " [117, 185, 191]],\n", " ...\n", " [[210, 253, 250],\n", " [212, 251, 250],\n", " [214, 250, 249],\n", " ...\n", " [194, 247, 247],\n", " [190, 246, 245],\n", " [184, 245, 244]],\n", " [[215, 253, 251],\n", " [218, 252, 250],\n", " [220, 251, 249],\n", " ...\n", " [200, 248, 248],\n", " [195, 247, 245],\n", " [189, 245, 244]],\n", " [[216, 253, 253],\n", " [222, 251, 250],\n", " [225, 250, 249],\n", " ...\n", " [204, 249, 248],\n", " [200, 246, 244],\n", " [196, 245, 244]]]), 'label': Tensor(shape=[], dtype=UInt32, value= 0)}\n" ] } ], "source": [ "cifar10_path = \"./datasets/cifar-10-batches-bin/train\"\n", "\n", "# create Cifar10Dataset for reading data\n", "cifar10_dataset = ds.Cifar10Dataset(cifar10_path, shuffle=True)\n", "# create a dictionary iterator and read a data record through the iterator\n", "print(next(cifar10_dataset.create_dict_iterator()))" ] }, { "cell_type": "markdown", "id": "e8628944", "metadata": {}, "source": [ "2. Use the `shuffle` function to shuffle data. Set `buffer_size` to 3 and use the `GeneratorDataset` class to generate data." 
] }, { "cell_type": "code", "execution_count": 9, "id": "1a429588", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "before shuffle:\n", "[0 1 2 3 4]\n", "[1 2 3 4 5]\n", "[2 3 4 5 6]\n", "[3 4 5 6 7]\n", "[4 5 6 7 8]\n", "after shuffle:\n", "[2 3 4 5 6]\n", "[0 1 2 3 4]\n", "[1 2 3 4 5]\n", "[4 5 6 7 8]\n", "[3 4 5 6 7]\n" ] } ], "source": [ "def generator_func():\n", " for i in range(5):\n", " yield (np.array([i, i+1, i+2, i+3, i+4]),)\n", "\n", "ds1 = ds.GeneratorDataset(source=generator_func, column_names=[\"data\"])\n", "print(\"before shuffle:\")\n", "for data in ds1.create_dict_iterator():\n", " print(data[\"data\"])\n", "\n", "ds2 = ds1.shuffle(buffer_size=3)\n", "print(\"after shuffle:\")\n", "for data in ds2.create_dict_iterator():\n", " print(data[\"data\"])" ] }, { "cell_type": "markdown", "id": "529e0536", "metadata": {}, "source": [ "## Optimizing the Data Augmentation Performance\n", "\n", "During image classification training, especially when the dataset is small, data augmentation can be used to preprocess images and enrich the dataset. MindSpore provides several ways to perform data augmentation:\n", "\n", "- Data augmentation operations implemented in C++ (mainly based on OpenCV).\n", "- Data augmentation operations implemented in Python (mainly based on Pillow).\n", "- User-defined Python functions.\n", "\n", "For details, see [Data Augmentation](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/augment.html). Performance depends on the underlying implementation. 
The differences are summarized below:\n", "\n", "| Programming Language | Third-party Library | Description |\n", "| :----: | :----: | :----: |\n", "| C++ | OpenCV | Implemented in C++; higher performance |\n", "| Python | Pillow | Implemented in Python; more flexible |\n", "\n", "The performance optimization solution is as follows:" ] }, { "cell_type": "markdown", "id": "2729405a", "metadata": {}, "source": [ "![data-enhancement-performance-scheme](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/data_enhancement_performance_scheme.png)" ] }, { "cell_type": "markdown", "id": "82f4b809", "metadata": {}, "source": [ "Suggestions on data augmentation performance optimization are as follows:\n", "\n", "- Prefer the C++ implemented operations for data augmentation because of their higher performance. If the performance still does not meet the requirements, refer to [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-thread-optimization-solution), [Compose Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#compose-optimization-solution), or [Operation Fusion Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#operation-fusion-optimization-solution).\n", "- If the Python implemented operations are used for data augmentation and the performance still does not meet the requirements, refer to [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-thread-optimization-solution), [Multi-process Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-process-optimization-solution), [Compose Optimization 
Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#compose-optimization-solution), or [Operation Fusion Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#operation-fusion-optimization-solution).\n", "- If user-defined Python functions are used for data augmentation and the performance still does not meet the requirements, use the [Multi-thread Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-thread-optimization-solution) or [Multi-process Optimization Solution](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/optimize.html#multi-process-optimization-solution). If the performance still does not improve, optimize the user-defined Python code itself.\n", "\n", "MindSpore also allows data augmentation operations implemented in C++ and in Python to be used together. However, because their underlying implementations differ, excessive mixing increases resource overhead and reduces processing performance. It is recommended to use operations implemented in a single language, or to apply all operations of one language first and then those of the other. Avoid switching frequently between data augmentation operations implemented in the two languages.\n", "\n", "Based on the preceding suggestions, the following examples perform data augmentation with C++ implemented operations and with a user-defined Python function. The code is as follows:\n", "\n", "1. Perform data augmentation with C++ implemented operations, using the multi-thread optimization solution: four threads process data concurrently. The operation fusion optimization solution is also applied: the `RandomResizedCrop` fusion class replaces the combination of the `RandomResize` and `RandomCrop` classes." 
] }, { "cell_type": "code", "execution_count": 10, "id": "9b8f6b16", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import mindspore.dataset.vision as vision\n", "import matplotlib.pyplot as plt\n", "\n", "cifar10_path = \"./datasets/cifar-10-batches-bin/train\"\n", "\n", "# create Cifar10Dataset for reading data\n", "cifar10_dataset = ds.Cifar10Dataset(cifar10_path, num_parallel_workers=4)\n", "transforms = vision.RandomResizedCrop((800, 800))\n", "# apply the transform to the dataset through dataset.map()\n", "cifar10_dataset = cifar10_dataset.map(operations=transforms, input_columns=\"image\", num_parallel_workers=4)\n", "\n", "data = next(cifar10_dataset.create_dict_iterator())\n", "plt.imshow(data[\"image\"].asnumpy())\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "5dae7eee", "metadata": {}, "source": [ "2. A user-defined Python function is used to perform data augmentation. During data augmentation, the multi-process optimization solution is used, and four processes are enabled to concurrently complete the task." 
] }, { "cell_type": "code", "execution_count": 11, "id": "b6235a6e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "before map:\n", "[0 1 2 3 4]\n", "[1 2 3 4 5]\n", "[2 3 4 5 6]\n", "[3 4 5 6 7]\n", "[4 5 6 7 8]\n", "after map:\n", "[ 0 1 4 9 16]\n", "[ 1 4 9 16 25]\n", "[ 4 9 16 25 36]\n", "[ 9 16 25 36 49]\n", "[16 25 36 49 64]\n" ] } ], "source": [ "def generator_func():\n", " for i in range(5):\n", " yield (np.array([i, i+1, i+2, i+3, i+4]),)\n", "\n", "ds3 = ds.GeneratorDataset(source=generator_func, column_names=[\"data\"])\n", "print(\"before map:\")\n", "for data in ds3.create_dict_iterator():\n", " print(data[\"data\"])\n", "\n", "def square(x):\n", " # square each element of the input array\n", " return x ** 2\n", "\n", "ds4 = ds3.map(operations=square, input_columns=\"data\", python_multiprocessing=True, num_parallel_workers=4)\n", "print(\"after map:\")\n", "for data in ds4.create_dict_iterator():\n", " print(data[\"data\"])" ] }, { "cell_type": "markdown", "id": "9d4b2d66", "metadata": {}, "source": [ "## Optimizing the Operating System Performance\n", "\n", "Data processing runs on the host, so the configuration of the running environment can affect processing performance. The major factors are storage, the NUMA architecture, and the CPU (computing resources).\n", "\n", "1. Storage\n", "\n", "Data loading involves frequent disk operations, so disk read and write performance directly affects data loading speed. For large datasets, Solid State Drives (SSDs) are recommended. SSDs generally read and write faster than ordinary hard disks, which reduces the impact of I/O operations on data processing performance.\n", "\n", "Loaded data is generally cached in the operating system's page cache, which reduces the overhead of subsequent reads and speeds up data loading in later epochs. 
Users can also manually cache augmented data with the single-node cache provided by MindSpore, avoiding repeated data loading and data augmentation.\n", "\n", "2. NUMA architecture\n", "\n", "NUMA (Non-Uniform Memory Access) is a memory architecture designed to solve the scalability problems of the traditional symmetric multiprocessing (SMP) architecture. In the traditional architecture, multiple processors share one memory bus, which easily leads to problems such as insufficient bandwidth and memory conflicts.\n", "\n", "In the NUMA architecture, processors and memory are divided into groups called nodes. Each node has its own integrated memory controller (IMC) for memory access within the node, and nodes communicate with each other through the QuickPath Interconnect (QPI). For a given node, memory within the same node is called local memory, and memory in other nodes is called remote memory. Accessing local memory has lower latency than accessing remote memory.\n", "\n", "During data processing, binding the process to a NUMA node can therefore reduce memory access latency. For example, the following command binds the process to the CPUs and memory of node 0:\n", "\n", "```bash\n", "numactl --cpubind=0 --membind=0 python train.py\n", "```" ] }, { "cell_type": "markdown", "id": "9ad3fdb3", "metadata": {}, "source": [ "3. CPU (computing resources)\n", "\n", "Although multi-threaded parallelism can accelerate data processing, it does not guarantee that CPU computing resources are fully utilized. Configuring computing resources manually in advance can improve CPU utilization to some extent.\n", "\n", "- Resource allocation\n", "\n", "In distributed training, multiple training processes run on the same machine. 
These training processes allocate and compete for computing resources according to the operating system's scheduling policy. With a large number of processes, data processing performance may deteriorate due to resource contention, so in some cases resources need to be allocated manually. For example, the following command binds the process to the CPUs of NUMA node 0:\n", "\n", "```bash\n", "numactl --cpubind=0 python train.py\n", "```\n", "\n", "- CPU frequency\n", "\n", "To save energy, the operating system adjusts the CPU frequency on demand. However, a lower frequency also means lower computing performance and slower data processing. To make full use of the CPU's computing power, set the CPU frequency manually. If the CPU frequency governor is in a balanced or power-saving mode, switching it to performance mode can improve data processing performance (this usually requires root privileges).\n", "\n", "```bash\n", "cpupower frequency-set -g performance\n", "```" ] }, { "cell_type": "markdown", "id": "9002ddbf", "metadata": {}, "source": [ "## Dataset AutoTune for Dataset Pipeline\n", "\n", "MindSpore provides a tool named Dataset AutoTune that automatically tunes dataset pipelines to improve performance. For detailed usage, see [Dataset AutoTune for Dataset Pipeline](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/dataset_autotune.html)." ] }, { "cell_type": "markdown", "id": "73b6e142", "metadata": {}, "source": [ "## Enabling Heterogeneous Acceleration for Data\n", "\n", "MindSpore provides a computing load balancing technology that distributes MindSpore Tensor computation across different heterogeneous hardware. This both balances the computing overhead among the hardware and exploits the strengths of heterogeneous hardware to accelerate computation. 
For detailed usage, see [Enabling Heterogeneous Acceleration for Data](https://www.mindspore.cn/tutorials/experts/en/r2.0.0-alpha/dataset/dataset_offload.html)." ] }, { "cell_type": "markdown", "id": "21971bec", "metadata": {}, "source": [ "## Performance Optimization Solution Summary\n", "\n", "### Multi-thread Optimization Solution\n", "\n", "In the data pipeline, the number of threads used by each operation can be set to improve concurrency and performance. If the `num_parallel_workers` parameter is not specified, each data processing operation uses 8 sub-threads by default. For example:\n", "\n", "- During data loading, set the number of threads with the `num_parallel_workers` parameter of the built-in data loading class.\n", "- During data augmentation, set the number of threads with the `num_parallel_workers` parameter of the `map` function.\n", "- During batch processing, set the number of threads with the `num_parallel_workers` parameter of the `batch` function.\n", "\n", "For details, see [Built-in Loading Operations](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/mindspore.dataset.html). 
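As a framework-agnostic analogy (not MindSpore code), the effect of `num_parallel_workers` on a single `map` step can be pictured as a fixed-size pool of worker threads. The pool applies the per-sample function and returns the results in input order. In this sketch, `parallel_map` and `normalize` are hypothetical names. Capping the pool at `os.cpu_count()` is an assumption that echoes the first principle listed below. In CPython, threads speed up only work that releases the global interpreter lock (GIL), such as many C/C++ extension routines, so pure-Python functions usually benefit more from the multi-process mode.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, samples, num_parallel_workers=8):
    # Hypothetical helper: apply func to every sample with a pool of threads.
    # Never start more threads than CPU cores for this single step.
    workers = max(1, min(num_parallel_workers, os.cpu_count() or 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, like a pipeline map step
        return list(pool.map(func, samples))


def normalize(sample):
    # hypothetical per-sample transform: scale uint8 pixel values to [0, 1]
    return [pixel / 255 for pixel in sample]


if __name__ == '__main__':
    images = [[0, 51, 255], [102, 204, 0]]
    print(parallel_map(normalize, images, num_parallel_workers=4))
    # -> [[0.0, 0.2, 1.0], [0.4, 0.8, 0.0]]
```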
When using MindSpore for standalone or distributed training, follow these principles when setting the `num_parallel_workers` parameter:\n", "\n", "- The sum of the `num_parallel_workers` values set for all data loading and processing operations should not exceed the number of CPU cores on the machine; otherwise, the operations compete for resources.\n", "- Before setting `num_parallel_workers`, it is recommended to use MindSpore's Profiler (performance analysis) tool to analyze the performance of each operation during training. Allocate more resources, that is, a larger `num_parallel_workers`, to the operations with poor performance. This balances the throughput of the operations and avoids unnecessary waiting.\n", "- In standalone training, increasing `num_parallel_workers` often improves processing performance directly. In distributed training, however, CPU contention increases, and blindly increasing `num_parallel_workers` may degrade performance, so try a moderate, compromise value.\n", "\n", "### Multi-process Optimization Solution\n", "\n", "During data processing, operations implemented in Python support the multi-process mode. For example:\n", "\n", "- The `GeneratorDataset` class runs in multi-process mode by default, and its `num_parallel_workers` parameter specifies the number of processes. The default value is 1. For details, see [GeneratorDataset](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/dataset/mindspore.dataset.GeneratorDataset.html).\n", "- When user-defined Python functions or Python implemented operations are used for data augmentation and the `python_multiprocessing` parameter of the `map` function is set to True, `num_parallel_workers` specifies the number of processes. The default value of `python_multiprocessing` is False. 
In that case, `num_parallel_workers` specifies the number of threads. For details, see [Built-in Loading Operations](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/mindspore.dataset.html).\n", "\n", "### Compose Optimization Solution\n", "\n", "The `map` operation can receive a list of Tensor operations and apply them in the given order. Compared with using a separate `map` operation for each Tensor operation, such a \"fat map\" operation achieves better performance, as shown in the following figure:" ] }, { "cell_type": "markdown", "id": "2788a75e", "metadata": {}, "source": [ "![compose](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/compose.png)" ] }, { "cell_type": "markdown", "id": "1384ff89", "metadata": {}, "source": [ "### Operation Fusion Optimization Solution\n", "\n", "MindSpore provides fusion operations that combine the functions of two or more operations into one. To enable them, set the environment variable with `export OPTIMIZE=true`. For details, see [Augmentation Operations](https://www.mindspore.cn/docs/en/r2.0.0-alpha/api_python/mindspore.dataset.transforms.html#module-mindspore.dataset.vision). Compared with a pipeline of their component operations, such fusion operations provide better performance. 
As shown in the figure:\n", "\n", "![operation-fusion](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/tutorials/experts/source_en/dataset/images/operation_fusion.png)\n", "\n", "### Operating System Optimization Solution\n", "\n", "- Use Solid State Drives to store the data.\n", "- Bind the process to a NUMA node.\n", "\n", " In multi-card training, each training process can be bound to a different NUMA node by setting the environment variable `export DATASET_ENABLE_NUMA=True`, which makes data processing in the different training processes more stable.\n", "\n", "- Manually allocate more computing resources.\n", "- Set a higher CPU frequency." ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" }, "vscode": { "interpreter": { "hash": "e92e3b0e72260407a1e4d16fabe2efc1463db1c235b8d61a4b02ddd7ca8a9a6a" } } }, "nbformat": 4, "nbformat_minor": 5 }