mindspore.dataset.DatasetCache
- class mindspore.dataset.DatasetCache(session_id, size=0, spilling=False, hostname=None, port=None, num_connections=None, prefetch_size=None)[source]
A client to interface with the tensor caching service.

For details, please check the Tutorial.

Parameters
- session_id (int) – A user-assigned session id for the current pipeline.
- size (int, optional) – Size of the memory set aside for row caching. Default: 0, which means unlimited; note that this may bring the risk of running out of memory on the machine. A bounded configuration with spilling is sketched after this list.
- spilling (bool, optional) – Whether or not to spill to disk if out of memory. Default: False.
- hostname (str, optional) – Host name. Default: None, which uses the default hostname '127.0.0.1'.
- port (int, optional) – Port to connect to the server. Default: None, which uses the default port 50052.
- num_connections (int, optional) – Number of TCP/IP connections. Default: None, which uses the default value 12.
- prefetch_size (int, optional) – The size of the cache queue between operations. Default: None, which uses the default value 20.
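As referenced above, here is a minimal sketch of a bounded cache with spilling enabled. It assumes a cache server and session already exist; the size value 512 is an arbitrary illustrative bound, and the hostname/port arguments simply restate the documented defaults.

```python
import subprocess
import mindspore.dataset as ds

# Reuse an existing session (assumes the cache server is running and
# `dataset-cache --list_sessions` prints the session id last).
command = "dataset-cache --list_sessions | tail -1 | awk -F ' ' '{print $1;}'"
session_id = int(subprocess.getoutput(command).split('\n')[-1])

# Bound the memory set aside for row caching (512 is an arbitrary
# illustrative value) and let overflow rows spill to disk instead of
# failing; hostname and port restate the documented defaults.
bounded_cache = ds.DatasetCache(session_id=session_id, size=512,
                                spilling=True,
                                hostname="127.0.0.1", port=50052)

# Attach the cache to a dataset as usual.
dataset = ds.ImageFolderDataset("/path/to/image_folder_dataset_directory",
                                cache=bounded_cache)
```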
 
Examples

```python
>>> import subprocess
>>> import mindspore.dataset as ds
>>>
>>> # Start the cache server with the command line `dataset-cache --start`
>>> # Create a session with `dataset-cache -g`
>>> # After creating a cache with a valid session, get the session id with `dataset-cache --list_sessions`
>>> command = "dataset-cache --list_sessions | tail -1 | awk -F ' ' '{print $1;}'"
>>> session_id = subprocess.getoutput(command).split('\n')[-1]
>>> some_cache = ds.DatasetCache(session_id=int(session_id), size=0)
>>>
>>> dataset_dir = "/path/to/image_folder_dataset_directory"
>>> dataset = ds.ImageFolderDataset(dataset_dir, cache=some_cache)
```

- get_stat()[source]
Get the statistics from a cache. After the data pipeline runs, three types of statistics can be obtained: the average cache size (avg_cache_sz), the number of rows cached in memory (num_mem_cached) and the number of rows spilled to disk (num_disk_cached).

Examples

```python
>>> import subprocess
>>> import mindspore.dataset as ds
>>>
>>> # As in the example above, create a cache with a valid session id
>>> command = "dataset-cache --list_sessions | tail -1 | awk -F ' ' '{print $1;}'"
>>> id = subprocess.getoutput(command).split('\n')[-1]
>>> some_cache = ds.DatasetCache(session_id=int(id), size=0)
>>>
>>> # Run the dataset pipeline to trigger the cache
>>> dataset = ds.ImageFolderDataset("/path/to/image_folder_dataset_directory", cache=some_cache)
>>> data = list(dataset)
>>>
>>> # Get the statistics of the cache
>>> stat = some_cache.get_stat()
>>> # Average cache size
>>> cache_sz = stat.avg_cache_sz
>>> # Number of rows cached in memory
>>> num_mem_cached = stat.num_mem_cached
>>> # Number of rows spilled to disk
>>> num_disk_cached = stat.num_disk_cached
```
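As a short interpretation sketch (an addition, not part of the original example): num_disk_cached can only be non-zero when the cache was created with spilling=True, and a non-zero value signals that the in-memory bound was exceeded.

```python
>>> # Hypothetical follow-up: detect whether the memory bound was hit.
>>> # Rows spill to disk only when the cache was created with spilling=True.
>>> if stat.num_disk_cached > 0:
...     print("rows spilled to disk; consider a larger `size` bound")
```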