mindspore.dataset.config.set_enable_autotune

mindspore.dataset.config.set_enable_autotune(enable, filepath_prefix=None)

Set whether to enable AutoTune. AutoTune is disabled by default.

AutoTune automatically adjusts the global configuration of the data pipeline during training, based on the workload of the available system resources, to speed up data processing.

The optimized global configuration can be saved as a JSON file by setting filepath_prefix, so that it can be reused later.

Parameters

enable (bool) – Whether to enable AutoTune.
filepath_prefix (str, optional) – The filepath prefix used to save the optimized global configuration. The rank id and the .json extension are appended to the filepath_prefix string. In multi-device training the actual rank id is used; in standalone training the rank id is set to 0. For example, if filepath_prefix="/path/to/some/dir/prefixname" and rank_id is 1, the generated file is "/path/to/some/dir/prefixname_1.json". If the file already exists, it is automatically overwritten. Default: None, which means the configuration file is not saved; the tuned result can still be checked in the INFO log.
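The naming rule described for filepath_prefix can be sketched in plain Python. The helper name autotune_output_path is hypothetical and is not part of MindSpore; it only mirrors the rule stated above (prefix, underscore, rank id, .json extension) so that a script can predict where the file will be written.

```python
def autotune_output_path(filepath_prefix: str, rank_id: int = 0) -> str:
    """Predict the path of the saved AutoTune configuration.

    Hypothetical helper, not a MindSpore API. It follows the documented rule:
    "<filepath_prefix>_<rank_id>.json", with rank id 0 in standalone training.
    """
    return f"{filepath_prefix}_{rank_id}.json"


# Multi-device training, rank id 1 (the example from the parameter description).
print(autotune_output_path("/path/to/some/dir/prefixname", 1))

# Standalone training: the rank id defaults to 0.
print(autotune_output_path("/path/to/some/dir/prefixname"))
```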

Raises

TypeError – If enable is not of type boolean.
TypeError – If filepath_prefix is not of type str.
RuntimeError – If filepath_prefix is an empty string.
RuntimeError – If filepath_prefix is a directory.
RuntimeError – If the parent path of filepath_prefix does not exist.
RuntimeError – If the parent path of filepath_prefix does not have write permission.

Note

When enable is False, filepath_prefix will be ignored.
The JSON file can be loaded by API mindspore.dataset.deserialize to build a tuned pipeline.
In distributed training scenarios, set_enable_autotune() must be called after cluster communication has been initialized (mindspore.communication.management.init()); otherwise the AutoTune file will always be suffixed with rank id 0.
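The call order required in distributed training could look like the following minimal sketch. The function name enable_autotune_distributed is hypothetical; the MindSpore imports are placed inside the function only so that the snippet can be loaded on a machine without MindSpore, and the returned path assumes the naming rule described for filepath_prefix.

```python
def enable_autotune_distributed(filepath_prefix: str) -> str:
    """Initialize communication first, then enable AutoTune.

    Hypothetical wrapper, not a MindSpore API. Returns the path at which the
    tuned configuration is expected to be saved for the current rank.
    """
    # Imported lazily so this module can be imported without MindSpore installed.
    import mindspore.dataset as ds
    from mindspore.communication.management import init, get_rank

    # Cluster communication must be initialized before AutoTune is enabled;
    # otherwise the saved file is always suffixed with rank id 0.
    init()
    ds.config.set_enable_autotune(True, filepath_prefix)

    return f"{filepath_prefix}_{get_rank()}.json"
```

Each process in the job would call this once, before building its dataset pipeline.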

An example of the generated JSON file is as follows. The "remark" field states whether the dataset has been tuned. The "summary" field shows the tuned configuration of the dataset pipeline. Users can modify their scripts based on the tuned result.

{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:4)         (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:3)         (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ],
    "tree": {
        ...
    }
}
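To apply the tuned values in a script, the "summary" entries can be parsed into structured records. The following is a minimal sketch that assumes every summary string follows the layout shown in the example above; parse_summary and SUMMARY_PATTERN are hypothetical names, and the "tree" field is left out of the sample because its contents are not shown here.

```python
import json
import re

# Sample taken from the example above, without the elided "tree" field.
SAMPLE = """
{
    "remark": "The following file has been auto-generated by the Dataset AutoTune.",
    "summary": [
        "CifarOp(ID:5)       (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:4)         (num_parallel_workers: 2, prefetch_size:64)",
        "MapOp(ID:3)         (num_parallel_workers: 2, prefetch_size:64)",
        "BatchOp(ID:2)       (num_parallel_workers: 8, prefetch_size:64)"
    ]
}
"""

# Assumed layout: "<OpName>(ID:<n>)  (num_parallel_workers: <n>, prefetch_size:<n>)"
SUMMARY_PATTERN = re.compile(
    r"(?P<op>\w+)\(ID:(?P<id>\d+)\)\s+"
    r"\(num_parallel_workers:\s*(?P<workers>\d+),\s*prefetch_size:\s*(?P<prefetch>\d+)\)"
)


def parse_summary(config: dict) -> list:
    """Convert the "summary" strings into dictionaries; skip lines that do not match."""
    rows = []
    for line in config["summary"]:
        match = SUMMARY_PATTERN.match(line)
        if match:
            rows.append({
                "op": match["op"],
                "id": int(match["id"]),
                "num_parallel_workers": int(match["workers"]),
                "prefetch_size": int(match["prefetch"]),
            })
    return rows


rows = parse_summary(json.loads(SAMPLE))
for row in rows:
    print(row["op"], row["id"], row["num_parallel_workers"], row["prefetch_size"])
```

For a file saved by AutoTune, json.load on the opened file would replace json.loads(SAMPLE).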

Examples

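>>> import mindspore.dataset as ds
>>>
>>> # enable AutoTune and save the optimized data pipeline configuration;
>>> # the file is written as "/path/to/autotune_out_<rank_id>.json"
>>> ds.config.set_enable_autotune(True, "/path/to/autotune_out")
>>>
>>> # enable AutoTune without saving the configuration
>>> ds.config.set_enable_autotune(True)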