{ "cells": [ { "cell_type": "markdown", "source": [ "# Data Iteration\n", "\n", "`Ascend` `GPU` `CPU` `Data Preparation`\n", "\n", "[![Download Notebook](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_notebook_en.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.6/programming_guide/en/mindspore_dataset_usage.ipynb) [![View Source On Gitee](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r1.6/docs/mindspore/programming_guide/source_en/dataset_usage.ipynb)" ], "metadata": {} }, { "cell_type": "markdown", "source": [ "Translator: [Ming__blue](https://gitee.com/ming__blue)\n", "\n", "## Overview\n", "\n", "Original dataset is read into the memory through dataset loading interface, and then data is transformed through data enhancement operation. The obtained dataset object has two conventional data iteration methods:\n", "\n", "- Create an iterator for data iteration.\n", "\n", "- Pass in the model interface (such as `model.train`, `model.eval`, etc.) for iterative training or inference.\n", "\n", "## Create an iterator for data iteration\n", "\n", "Dataset objects can usually create two different iterators to traverse the data, namely tuple iterator and dictionary iterator.\n", "\n", "The interface for creating tuple iterator is `create_tuple_iterator`, and the interface for creating dictionary iterator is `create_dict_iterator`. The specific usage is as follows.\n", "\n", "First, arbitrarily create a dataset object as a demonstration." ], "metadata": {} }, { "cell_type": "code", "execution_count": 1, "source": [ "import mindspore.dataset as ds\n", "\n", "np_data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]\n", "dataset = ds.NumpySlicesDataset(np_data, column_names=[\"data\"], shuffle=False)" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "Following methods can be used to create a data iterator." ], "metadata": {} }, { "cell_type": "code", "execution_count": 2, "source": [ "# Create tuple iterator\n", "print(\"\\n create tuple iterator\")\n", "for item in dataset.create_tuple_iterator():\n", " print(\"item:\\n\", item[0])\n", "\n", "# Create dictionary iterator\n", "print(\"\\n create dict iterator\")\n", "for item in dataset.create_dict_iterator():\n", " print(\"item:\\n\", item[\"data\"])\n", "\n", "# Traverse the dataset object directly (equivalent to creating tuple iterator)\n", "print(\"\\n iterate dataset object directly\")\n", "for item in dataset:\n", " print(\"item:\\n\", item[0])\n", "\n", "# Traverse the dataset object using enumerate method(equivalent to creating tuple iterator)\n", "print(\"\\n iterate dataset using enumerate\")\n", "for index, item in enumerate(dataset):\n", " print(\"index: {}, item:\\n {}\".format(index, item[0]))" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", " create tuple iterator\n", "item:\n", " [[1 2]\n", " [3 4]]\n", "item:\n", " [[5 6]\n", " [7 8]]\n", "\n", " create dict iterator\n", "item:\n", " [[1 2]\n", " [3 4]]\n", "item:\n", " [[5 6]\n", " [7 8]]\n", "\n", " iterate dataset object directly\n", "item:\n", " [[1 2]\n", " [3 4]]\n", "item:\n", " [[5 6]\n", " [7 8]]\n", "\n", " iterate dataset using enumerate\n", "index: 0, item:\n", " [[1 2]\n", " [3 4]]\n", "index: 1, item:\n", " [[5 6]\n", " [7 8]]\n" ] } ], "metadata": { "ExecuteTime": { "end_time": "2021-09-13T09:01:55.937317Z", "start_time": "2021-09-13T09:01:53.924910Z" } } }, { "cell_type": "markdown", "source": [ "In addition, to generate data in multiple Epochs, adjust the value of the input parameter `num_epochs` accordingly. Compared with calling the iterator interface multiple times, directly setting the Epoch number can improve the performance of data iteration." ], "metadata": {} }, { "cell_type": "code", "execution_count": 3, "source": [ "# Create tuple iterator to generate data in two Epochs\n", "epoch = 2\n", "iterator = dataset.create_tuple_iterator(num_epochs=epoch)\n", "for i in range(epoch):\n", " print(\"epoch: \", i)\n", " for item in iterator:\n", " print(\"item:\\n\", item[0])" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "epoch: 0\n", "item:\n", " [[1 2]\n", " [3 4]]\n", "item:\n", " [[5 6]\n", " [7 8]]\n", "epoch: 1\n", "item:\n", " [[1 2]\n", " [3 4]]\n", "item:\n", " [[5 6]\n", " [7 8]]\n" ] } ], "metadata": { "ExecuteTime": { "end_time": "2021-09-13T09:01:55.951495Z", "start_time": "2021-09-13T09:01:55.938705Z" } } }, { "cell_type": "markdown", "source": [ "The default output data type of the iterator is `mindspore.Tensor`. To get data of the type `numpy.ndarray`, set the parameter `output_numpy=True`." ], "metadata": {} }, { "cell_type": "code", "execution_count": 4, "source": [ "# The default output type is mindspore.Tensor\n", "for item in dataset.create_tuple_iterator():\n", " print(\"dtype: \", type(item[0]), \"\\nitem:\", item[0])\n", "\n", "# Set the output type to numpy.ndarray\n", "for item in dataset.create_tuple_iterator(output_numpy=True):\n", " print(\"dtype: \", type(item[0]), \"\\nitem:\", item[0])" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "dtype: \n", "item: [[1 2]\n", " [3 4]]\n", "dtype: \n", "item: [[5 6]\n", " [7 8]]\n", "dtype: \n", "item: [[1 2]\n", " [3 4]]\n", "dtype: \n", "item: [[5 6]\n", " [7 8]]\n" ] } ], "metadata": {} }, { "cell_type": "markdown", "source": [ "For more detailed instructions, please refer to [create_tuple_iterator](https://www.mindspore.cn/docs/api/en/r1.6/api_python/dataset/mindspore.dataset.NumpySlicesDataset.html#mindspore.dataset.NumpySlicesDataset.create_tuple_iterator ) and [create_dict_iterator](https://www.mindspore.cn/docs/api/en/r1.6/api_python/dataset/mindspore.dataset.NumpySlicesDataset.html#mindspore.dataset.NumpySlicesDataset.create_dict_iterator) API documentation." ], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Pass in the Model interface for iterative training or inference\n", "\n", "After the dataset object is created, it can be passed into the `Model` interface, iterate data inside the interface, and send it to the network for training or inference." ], "metadata": {} }, { "cell_type": "code", "execution_count": 5, "source": [ "import numpy as np\n", "from mindspore import ms_function\n", "from mindspore import context, nn, Model\n", "import mindspore.dataset as ds\n", "import mindspore.ops as ops\n", "\n", "\n", "def create_dataset():\n", " np_data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]\n", " np_data = np.array(np_data, dtype=np.float16)\n", " dataset = ds.NumpySlicesDataset(np_data, column_names=[\"data\"], shuffle=False)\n", " return dataset\n", "\n", "\n", "class Net(nn.Cell):\n", " def __init__(self):\n", " super(Net, self).__init__()\n", " self.relu = ops.ReLU()\n", " self.print = ops.Print()\n", "\n", " @ms_function\n", " def construct(self, x):\n", " self.print(x)\n", " return self.relu(x)\n", "\n", "\n", "if __name__ == \"__main__\":\n", " # it is supported to run in CPU, GPU or Ascend\n", " context.set_context(mode=context.GRAPH_MODE)\n", " dataset = create_dataset()\n", " network = Net()\n", " model = Model(network)\n", "\n", " # do training, sink to device defaultly\n", " model.train(epoch=1, train_dataset=dataset, dataset_sink_mode=True)" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Tensor(shape=[2, 2], dtype=Float16, value=\n", "[[ 1.0000e+00 2.0000e+00]\n", " [ 3.0000e+00 4.0000e+00]])\n", "Tensor(shape=[2, 2], dtype=Float16, value=\n", "[[ 5.0000e+00 6.0000e+00]\n", " [ 7.0000e+00 8.0000e+00]])\n" ] } ], "metadata": { "ExecuteTime": { "end_time": "2021-09-13T09:01:56.002002Z", "start_time": "2021-09-13T09:01:55.953018Z" } } }, { "cell_type": "markdown", "source": [ "The `dataset_sink_mode` parameter in the Model interface is used to set whether to sink data to the Device. If it is set to not sink, the above iterator will be created internally to traverse the data one by one and sent to the network; if set to sink, the data will be sent directly to the Device internally and sent to the network for iterative training or inference.\n", "\n", "For more detailed usage, please refer to [Model Basic Usage](https://www.mindspore.cn/docs/programming_guide/en/r1.6/index.html)." ], "metadata": {} } ], "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3.8.10 64-bit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "interpreter": { "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" } }, "nbformat": 4, "nbformat_minor": 4 }