# Application of Single-Node Tensor Cache

`Linux` `Ascend` `GPU` `CPU` `Data Preparation` `Intermediate` `Expert`

[![View Source On Gitee](../_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.1/tutorials/training/source_en/advanced_use/enable_cache.md)

## Overview

If you need to repeatedly access remote datasets or read datasets from disks, you can use the single-node cache operator to cache datasets in the local memory and accelerate dataset reading.

This tutorial demonstrates how to use the single-node cache service to cache data that has been processed with data augmentation.

## Configuring the Environment

Before using the cache service, you need to install MindSpore and set the related environment variables. The following uses a Conda environment as an example:

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path_to_conda}/envs/{your_env_name}/lib/python3.7/site-packages/mindspore:{path_to_conda}/envs/{your_env_name}/lib/python3.7/site-packages/mindspore/lib
export PATH=$PATH:{path_to_conda}/envs/{your_env_name}/bin
```

## Starting the Cache Server

Before using the single-node cache service, you need to start the cache server.

```shell
$ cache_admin --start
Cache server startup completed successfully!
The cache server daemon has been created as process id 10394 and is listening on port 50052
Recommendation: Since the server is detached into its own daemon process, monitor the server logs (under /tmp/mindspore/cache/log) for any issues that may happen after startup
```

If the system displays a message indicating that the `libpython3.7m.so.1.0` file cannot be found, search for its path in the virtual environment and set the environment variable:

```shell
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path_to_conda}/envs/{your_env_name}/lib
```

## Creating a Cache Session

If no cache session exists on the cache server, create one to obtain a cache session ID:

```shell
$ cache_admin -g
Session created for server on port 50052: 1493732251
```

The cache session ID is randomly allocated by the server.

## Creating a Cache Instance

Create the Python script `my_training_script.py`, use the `DatasetCache` API to define a cache instance named `some_cache` in the script, and set the `session_id` parameter to the cache session ID created in the previous step.

```python
import mindspore.dataset as ds

some_cache = ds.DatasetCache(session_id=1493732251, size=0, spilling=True)
```
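In this example, `size=0` leaves the cache memory usage uncapped and `spilling=True` allows data that no longer fits in memory to be spilled to disk. If you would rather bound the memory footprint, a capped instance looks roughly like the sketch below; the values are illustrative, and the exact unit and semantics of `size` should be checked against the `DatasetCache` documentation of your MindSpore release.

```python
import mindspore.dataset as ds

# Illustrative sketch only: bound the cache memory usage and keep everything
# in memory (no spilling to disk). Check the DatasetCache API documentation
# of your MindSpore release for the exact unit and semantics of `size`.
capped_cache = ds.DatasetCache(session_id=1493732251, size=512, spilling=False)
```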
## Inserting a Cache Instance

The following uses the CIFAR-10 dataset as an example. Before running the sample, download and store the CIFAR-10 dataset by referring to [Loading Dataset](https://www.mindspore.cn/doc/programming_guide/en/r1.1/dataset_loading.html#cifar-10-100-dataset). The directory structure is as follows:

```text
├─my_training_script.py
└─cifar-10-batches-bin
    ├── batches.meta.txt
    ├── data_batch_1.bin
    ├── data_batch_2.bin
    ├── data_batch_3.bin
    ├── data_batch_4.bin
    ├── data_batch_5.bin
    ├── readme.html
    └── test_batch.bin
```

To cache the data enhanced by the data augmentation in the map operator, pass the created `some_cache` instance as the `cache` parameter of the map operator.

```python
import mindspore.dataset.vision.c_transforms as c_vision

dataset_dir = "cifar-10-batches-bin/"
data = ds.Cifar10Dataset(dataset_dir=dataset_dir, num_samples=5, shuffle=False, num_parallel_workers=1)

# apply cache to map
rescale_op = c_vision.Rescale(1.0 / 255.0, -1.0)
data = data.map(input_columns=["image"], operations=rescale_op, cache=some_cache)

num_iter = 0
for item in data.create_dict_iterator(num_epochs=1):  # each data is a dictionary
    # in this example, each dictionary has a key "image"
    print("{} image shape: {}".format(num_iter, item["image"].shape))
    num_iter += 1
```

Run the Python script `my_training_script.py`. The following information is displayed:

```text
0 image shape: (32, 32, 3)
1 image shape: (32, 32, 3)
2 image shape: (32, 32, 3)
3 image shape: (32, 32, 3)
4 image shape: (32, 32, 3)
```

You can run the `cache_admin --list_sessions` command to check whether the current session contains five cached records. If it does, the data has been cached successfully.

```shell
$ cache_admin --list_sessions
Listing sessions for server on port 50052

     Session    Cache Id  Mem cached  Disk cached  Avg cache size  Numa hit
  1493732251  3618046178           5          n/a           12442         5
```

## Destroying a Cache Session

After the training is complete, you can destroy the current cache and release the memory.

```shell
$ cache_admin --destroy_session 1493732251
Drop session successfully for server on port 50052
```

The preceding command destroys the cache whose session ID is 1493732251.

## Stopping the Cache Server

After using the cache server, you can stop it. This operation destroys all cache sessions on the current server and releases the memory.

```shell
$ cache_admin --stop
Cache server on port 50052 has been stopped successfully.
```
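As a closing note, the session bookkeeping above can also be driven from the training script itself rather than typed by hand. The sketch below is not part of the official workflow; it assumes `cache_admin` is on the `PATH`, the cache server is already running, and the CLI output format matches the examples shown earlier. `create_cache_session` is a hypothetical helper name.

```python
import subprocess

import mindspore.dataset as ds


def create_cache_session():
    """Hypothetical helper: create a cache session via `cache_admin -g` and
    parse the session ID from output such as
    "Session created for server on port 50052: 1493732251"."""
    result = subprocess.run(["cache_admin", "-g"], capture_output=True,
                            text=True, check=True)
    return int(result.stdout.strip().split()[-1])


session_id = create_cache_session()
some_cache = ds.DatasetCache(session_id=session_id, size=0, spilling=True)

# ... build and iterate the cached dataset pipeline as shown above ...

# Destroy the session once training is finished, releasing the cached data.
subprocess.run(["cache_admin", "--destroy_session", str(session_id)], check=True)
```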