Environment Variable Descriptions

The following environment variables are supported by MindSpore Transformers.

Debugging Variables

Variable Name

Default

Interpretations

Descriptions

Application Scenarios

HCCL_DETERMINISTIC

false

Whether to enable deterministic computation for reduction communication operators, which include AllReduce, ReduceScatter, and Reduce.

true: turns on the HCCL deterministic switch;
false: turns off the HCCL deterministic switch.

Turning on deterministic computation eliminates the randomness introduced by inconsistent ordering of multi-card computations, but it results in a performance degradation compared to the disabled state. It is recommended to turn it on in scenarios where consistency is required.

LCCL_DETERMINISTIC

0

Whether to enable the LCCL deterministic AllReduce operator (order-preserving addition).

1: turns on the LCCL deterministic switch;
0: turns off the LCCL deterministic switch.

Turning on deterministic computation eliminates the randomness introduced by inconsistent ordering of multi-card computations, but it results in a performance degradation compared to the disabled state. It is recommended to turn it on in scenarios where consistency is required.
Takes effect when rankSize<=8.
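
As a minimal sketch of the two determinism switches above, the following Python helper builds the environment for a reproducible run. The helper name `deterministic_env` and the choice to leave LCCL_DETERMINISTIC unset above 8 ranks are illustrative assumptions; only the variable names, their values, and the rankSize<=8 condition come from the table.

```python
import os


def deterministic_env(rank_size: int) -> dict:
    """Return the environment variables that switch on deterministic reduction.

    Hypothetical helper for illustration; it is not part of MindSpore Transformers.
    """
    env = {"HCCL_DETERMINISTIC": "true"}
    # The table states that LCCL_DETERMINISTIC only takes effect when rankSize<=8,
    # so this sketch leaves it unset for larger jobs.
    if rank_size <= 8:
        env["LCCL_DETERMINISTIC"] = "1"
    return env


# Apply before the framework starts (assumed to be when the variables are read).
os.environ.update(deterministic_env(rank_size=8))
```

Because deterministic computation costs performance, a helper like this would typically be called only for runs where consistency matters.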

CUSTOM_MATMUL_SHUFFLE

on

Whether to enable shuffle operations for custom matrix multiplication.

on: turns on matrix shuffle;
off: turns off matrix shuffle.

The shuffle operation is optimized for specific matrix sizes and memory access patterns. If the matrix size does not match the shuffle-optimized size, turning off shuffling may result in better performance. Please set it according to the actual usage.

ASCEND_LAUNCH_BLOCKING

0

In training or online inference scenarios, controls whether synchronous mode is enabled during operator execution.

1: force synchronous mode;
0: do not force synchronous mode.

Operators execute asynchronously by default during NPU model training, so when an operator fails, the printed error stack is not the actual call stack. Setting this variable to 1 forces synchronous mode, which prints the correct call stack and makes problems in the code easier to locate. Setting it to 0 provides more efficient computation.
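
One way to use this switch only while debugging is to set it for a child process rather than globally. The sketch below assumes a hypothetical helper named `debug_env`; it only copies an environment mapping and adds the variable from the table.

```python
import os


def debug_env(base=None) -> dict:
    """Return a copy of the environment with synchronous operator execution forced.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ if base is None else base)
    env["ASCEND_LAUNCH_BLOCKING"] = "1"  # synchronous mode for accurate call stacks
    return env


# The result can be passed as `env=` to subprocess.run when launching a debug run,
# leaving the parent process (and later performance runs) on the default of 0.
child_env = debug_env({"PATH": "/usr/bin"})
```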

TE_PARALLEL_COMPILER

8

The number of threads on which the operator is compiled in parallel. Enables parallel compilation when greater than 1.

A positive integer. The maximum is (number of CPU cores × 80%) / (number of Ascend AI processors); the value range is 1~32 and the default value is 8.

When the network model is large, parallel operator compilation can be enabled by configuring this environment variable;
setting it to 1 compiles in a single thread, which simplifies debugging.
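
The upper bound described above is simple arithmetic. The following sketch computes it in Python; the function name is hypothetical, and rounding down and clamping to the 1~32 range are assumptions, since the table does not say how fractional results are handled.

```python
def max_parallel_compile_threads(cpu_cores: int, ascend_processors: int) -> int:
    """Upper bound for TE_PARALLEL_COMPILER: cpu_cores * 80% / ascend_processors.

    Rounding down and clamping to the documented 1~32 range are assumptions
    of this sketch.
    """
    if cpu_cores < 1 or ascend_processors < 1:
        raise ValueError("cpu_cores and ascend_processors must be positive")
    limit = (cpu_cores * 8) // (ascend_processors * 10)  # integer form of 80%
    return max(1, min(32, limit))


# 192 cores shared by 8 processors: 192 * 0.8 / 8 = 19.2, rounded down to 19.
threads = max_parallel_compile_threads(192, 8)
```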

CPU_AFFINITY

0

Whether to enable CPU affinity, which binds each process or thread to a single CPU core to improve performance.

1: turn on the CPU affinity switch;
0: turn off the CPU affinity switch.

CPU affinity is turned off by default for optimized resource utilization and energy saving.

MS_MEMORY_STATISTIC

0

Whether to enable memory statistics.

1: turn on memory statistics;
0: turn off memory statistics.

Collects basic memory usage statistics for memory analysis. See the Optimization Guide for details.

MINDSPORE_DUMP_CONFIG

NA

Specifies the path to the configuration file required by the cloud-side or device-side Dump function.

A file path; both relative and absolute paths are supported.

GLOG_v

3

Controls the level of MindSpore logs.

0: DEBUG
1: INFO
2: WARNING
3: ERROR, indicates that an error occurred during program execution; an error log is output, and the program may not be terminated;
4: CRITICAL, indicates that an exception occurred during program execution, and the program will be terminated.

ASCEND_GLOBAL_LOG_LEVEL

3

Controls the logging level of CANN.

0: DEBUG
1: INFO
2: WARNING
3: ERROR
4: NULL, no log is output.

ASCEND_SLOG_PRINT_TO_STDOUT

0

Whether to print logs to the screen. When enabled, logs are not saved to log files; they are printed directly to the screen instead.

1: Display on the screen
0: Do not display on the screen

ASCEND_GLOBAL_EVENT_ENABLE

0

Whether to enable event logging.

1: turn on Event logging;
0: turn off Event logging.

HCCL_EXEC_TIMEOUT

1836

Controls how long devices wait for one another to synchronize during execution: each device process waits up to the configured time for the other devices to reach communication synchronization.

Range: (0, 17340], in seconds. The default value is 1836.

HCCL_CONNECT_TIMEOUT

120

Used in distributed training or inference scenarios to limit how long devices wait while establishing socket connections with one another.

An integer in the range [120, 7200], in seconds. The default value is 120.
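
The two HCCL timeouts have different valid ranges, so a small range check before launch can catch mistakes early. This is a sketch with a hypothetical helper name, `check_hccl_timeouts`; the ranges and defaults are taken from the entries above.

```python
def check_hccl_timeouts(exec_timeout: int = 1836, connect_timeout: int = 120) -> dict:
    """Validate both timeouts (in seconds) and return them as environment strings."""
    # HCCL_EXEC_TIMEOUT: documented range (0, 17340].
    if not 0 < exec_timeout <= 17340:
        raise ValueError("HCCL_EXEC_TIMEOUT must be in (0, 17340]")
    # HCCL_CONNECT_TIMEOUT: documented range [120, 7200].
    if not 120 <= connect_timeout <= 7200:
        raise ValueError("HCCL_CONNECT_TIMEOUT must be in [120, 7200]")
    return {
        "HCCL_EXEC_TIMEOUT": str(exec_timeout),
        "HCCL_CONNECT_TIMEOUT": str(connect_timeout),
    }


timeouts = check_hccl_timeouts(exec_timeout=3600, connect_timeout=600)
```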

MS_NODE_ID

NA

Specifies process rank id in dynamic cluster scenarios.

The rank_id of the process, unique within the cluster.

MS_ALLOC_CONF

NA

Sets memory allocation policies.

Configuration items, formatted as key:value, with multiple items separated by commas. For example: export MS_ALLOC_CONF=enable_vmm:true,memory_tracker:true.
enable_vmm: Whether to enable virtual memory; default value is true.
vmm_align_size: Sets virtual memory alignment size in MB; default value is 2.
memory_tracker: Whether to enable memory tracker; default value is false.
memory_tracker_path: Enables memory tracker and saves to specified path. Default is disabled with empty save path.
simple_tracker: Whether to enable simplified tracker mode, omitting tracker_graph.ir and retaining only the last user task. Takes effect when memory_tracker is enabled. Default is false.
acl_allocator: Whether to use the ACL memory allocator. Default value is true.
somas_whole_block: Whether to use SOMAS whole-block memory allocation. Default value is false.
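
The key:value format above can be produced and read back with a few lines of Python. This is a minimal sketch; `format_alloc_conf` and `parse_alloc_conf` are hypothetical helper names, and writing booleans in lowercase follows the example in the table.

```python
def format_alloc_conf(options: dict) -> str:
    """Serialise options as key:value pairs joined by commas."""
    parts = []
    for key, value in options.items():
        # Booleans are written in lowercase, matching the example in the table.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{key}:{text}")
    return ",".join(parts)


def parse_alloc_conf(value: str) -> dict:
    """Split an MS_ALLOC_CONF string back into a dict of strings."""
    # partition(":") splits on the first colon only, so values may contain colons.
    pairs = (item.partition(":") for item in value.split(",") if item)
    return {key: text for key, _, text in pairs}


conf = format_alloc_conf({"enable_vmm": True, "memory_tracker": True})
```

The resulting string is what would be assigned to MS_ALLOC_CONF, for example through `os.environ` or an `export` line.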

MS_INTERNAL_DISABLE_CUSTOM_KERNEL_LIST

PagedAttention

Enables a list of custom operators. This is an experimental configuration item that generally does not need to be set. It will be removed in the future.

Configured as a string, with operator names separated by commas.

TRANSFORMERS_OFFLINE

0

Forces the Auto interface to read only offline local files.

1, ON, TRUE, YES: Forces reading only offline local files;
Other values: Allows downloading files from the network.
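
The accepted values above amount to a small truth test. The sketch below shows one way to express it; `env_flag` is a hypothetical helper, and upper-casing the value before comparison is an assumption, since the table lists only the upper-case spellings.

```python
import os

TRUE_VALUES = {"1", "ON", "TRUE", "YES"}


def env_flag(name: str, default: str = "0") -> bool:
    """Return True when the variable holds one of the documented true values.

    Upper-casing the value first (so "true" also counts) is an assumption
    of this sketch.
    """
    return os.environ.get(name, default).strip().upper() in TRUE_VALUES


os.environ["TRANSFORMERS_OFFLINE"] = "1"
offline = env_flag("TRANSFORMERS_OFFLINE")
```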

MDS_ENDPOINT

https://modelers.cn

Sets the endpoint for openMind Hub.

Configured as a URL address in string format.

OM_MODULES_CACHE

~/.cache/openmind/modules

Cache path for openMind modules.

Configured as a directory path in string format.

OPENMIND_CACHE

~/.cache/openmind/hub

Cache path for openMind Hub.

Configured as a directory path in string format.

openmind_IS_CI

Indicates whether openMind is operating within a CI access control environment.

1, ON, TRUE, YES: Within CI environment;
All other values: Not within CI environment.

Other Variables

Variable Name

Default

Interpretations

Descriptions

Application Scenarios

RUN_MODE

predict

Set the running mode.

predict: Inference
finetune: Fine-tuning
train: Training
eval: Evaluation

USE_ROPE_SELF_DEFINE

true

Whether to enable ROPE fusion operator.

true: enable ROPE fusion operator;
false: disable ROPE fusion operator.

The ROPE fusion operator is enabled by default to improve computational efficiency. Turn it off only when needed for debugging; otherwise no special setting is generally required.

MS_ENABLE_INTERNAL_BOOST

on

Whether to turn on the internal acceleration of the MindSpore framework.

on: turn on MindSpore internal acceleration;
off: turn off MindSpore internal acceleration.

To achieve high-performance inference, this parameter is turned on by default. When debugging or comparing different acceleration strategies, turn it off to observe the impact on performance.

MF_LOG_SUFFIX

NA

Sets a custom suffix for all log folders.

Suffix for the log folder. Default: no suffix.

Adding a consistent suffix keeps logs from different tasks separate so that they do not overwrite each other.

PLOG_REDIRECT_TO_OUTPUT

False

Controls whether plog logs change storage paths.

True: Store the logs in the ./output directory;
False: Store the logs in the default storage location.

This setting makes it easier to query the plog log.

MS_ENABLE_FA_FLATTEN

on

Controls whether to enable the FlashAttention flatten optimization.

on: Enable FlashAttention flatten optimization;
off: Disable FlashAttention flatten optimization.

Provide a fallback mechanism for models that have not yet been adapted to FlashAttention flatten optimization.

EXPERIMENTAL_KERNEL_LAUNCH_GROUP

NA

Controls whether operators are submitted in parallel batches. When enabled, also configures the number of parallel submissions.

thread_num: Number of concurrent threads; increasing it is not recommended. The default value is 2;
kernel_group_num: Total number of operator groups, with kernel_group_num/thread_num groups per thread. The default value is 8.

This feature is still evolving, and its behavior may change. Currently only the DeepSeek inference scenario is supported, where it brings some performance improvement; other models may perform worse with it, so use it with caution. Example: export EXPERIMENTAL_KERNEL_LAUNCH_GROUP="thread_num:2,kernel_group_num:8".
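
To make the relationship between the two fields concrete, the sketch below builds the value string and works out how many operator groups each thread receives. The helper name `kernel_launch_group` is hypothetical, and requiring kernel_group_num to divide evenly by thread_num is an assumption of this sketch.

```python
def kernel_launch_group(thread_num: int = 2, kernel_group_num: int = 8):
    """Return the variable value and the number of operator groups per thread."""
    if thread_num < 1 or kernel_group_num < 1:
        raise ValueError("both values must be positive integers")
    # Requiring an even split is an assumption of this sketch.
    if kernel_group_num % thread_num != 0:
        raise ValueError("kernel_group_num should be a multiple of thread_num")
    value = f"thread_num:{thread_num},kernel_group_num:{kernel_group_num}"
    return value, kernel_group_num // thread_num


# With the defaults, each of the 2 threads handles 8 / 2 = 4 groups.
value, groups_per_thread = kernel_launch_group()
```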

ENFORCE_EAGER

False

Controls whether JIT mode is disabled.

False: Enable JIT mode;
True: Disable JIT mode.

JIT compiles functions into a callable MindSpore graph. Setting ENFORCE_EAGER to False enables JIT mode, which can improve performance. Currently, only inference mode is supported.

MS_ENABLE_TFT

NA

Enables the Training Fault Tolerance (TFT) feature, most of whose functionality relies on MindIO TFT.

The value can be "{TTP:1,UCE:1,HCCE:1,ARF:1,TRE:1,TSP:1}". To use a given feature, set the corresponding field to "1".

For usage, refer to High Availability.
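
As an illustration of the value format, the following sketch assembles the string from a list of feature names. The helper `tft_value` is hypothetical, and omitting the features that are not requested (rather than writing them with another value) is an assumption; consult the High Availability documentation for the authoritative format.

```python
TFT_FEATURES = ("TTP", "UCE", "HCCE", "ARF", "TRE", "TSP")


def tft_value(enabled) -> str:
    """Build a value such as "{TTP:1,UCE:1}" from the enabled feature names."""
    unknown = set(enabled) - set(TFT_FEATURES)
    if unknown:
        raise ValueError(f"unknown TFT features: {sorted(unknown)}")
    # Keep the field order used in the table; features not requested are
    # omitted (an assumption of this sketch).
    fields = [f"{name}:1" for name in TFT_FEATURES if name in enabled]
    return "{" + ",".join(fields) + "}"


value = tft_value(["UCE", "TTP"])
```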

MS_WORKER_NUM

NA

Number of processes assigned the role MS_WORKER.

Integer greater than 0.

Distributed scenarios.

RANK_ID

NA

Specifies the logical ID for invoking the NPU.

0–7. When running in parallel across multiple machines, DEVICE_ID may be duplicated across servers; using RANK_ID avoids this issue. In multi-machine parallelism, RANK_ID = SERVER_ID * DEVICE_NUM + DEVICE_ID, where DEVICE_ID is the Ascend AI processor number on the current machine.
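
The RANK_ID formula above can be checked with a short calculation. In this sketch the function name is hypothetical, DEVICE_NUM is read as the number of devices per server, and the bounds check on device_id is an added assumption.

```python
def global_rank_id(server_id: int, device_num: int, device_id: int) -> int:
    """RANK_ID = SERVER_ID * DEVICE_NUM + DEVICE_ID."""
    # The bounds check is an assumption of this sketch, not a documented rule.
    if not 0 <= device_id < device_num:
        raise ValueError("device_id must be in [0, device_num)")
    return server_id * device_num + device_id


# Device 3 on the second server (SERVER_ID 1) with 8 devices per server.
rank = global_rank_id(server_id=1, device_num=8, device_id=3)
```

Here 1 * 8 + 3 gives 11, while the same DEVICE_ID on the first server (SERVER_ID 0) gives 3, so the two processes no longer collide.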

RANK_SIZE

NA

Specifies the number of NPU units to invoke.

An integer greater than 1.

LD_PRELOAD

NA

Specifies the shared library to preload.

Specifies the path to the shared library.

DEVICE_ID

0

Specifies the device ID for invoking the NPU.

0 to the number of NPUs on the server.

MS_SCHED_PORT

NA

Specifies the port number for Scheduler binding.

Port number within the range 1024–65535.

NPU_ASD_ENABLE

0

Whether to enable feature value detection.

0: Disable feature value detection
1: Logs detected anomalies without throwing exceptions
2: Logs anomalies and throws exceptions
3: Logs in both normal and anomalous scenarios (Note: Logging occurs only when CANN is set to INFO or DEBUG levels in normal scenarios). When anomalies are detected, the detection operator throws an exception.

MS_SDC_DETECT_ENABLE

0

Enable/disable CheckSum detection for silent failures.

0: Disable CheckSum detection for silent failures.
1: Enable CheckSum detection for silent failures.

ASCEND_HOME_PATH

NA

Installation path for the Ascend software package.

Set to the specified path.

ENABLE_LAZY_INLINE

1

Whether to enable Lazy Inline mode. This environment variable will be deprecated and removed in the next version.

0: Disable Lazy Inline.
1: Enable Lazy Inline.

LOCAL_DEFAULT_PATH

./output

Sets the default path for logs.

Set to the specified path.

STDOUT_DEVICES

NA

Sets the list of device IDs for standard output.

Set as a numeric list, with multiple IDs separated by commas.
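
A comma-separated ID list like this one can be read into integers as follows. This is only a sketch of the value format; `parse_device_list` is a hypothetical helper, and tolerating spaces and empty items is an assumption.

```python
def parse_device_list(value: str) -> list:
    """Turn a value such as "0,1,7" into [0, 1, 7]."""
    # Tolerating spaces and empty items is an assumption of this sketch.
    return [int(item) for item in value.split(",") if item.strip()]


devices = parse_device_list("0,1,7")
```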

REGISTER_PATH

Directory path containing the plug-in code to be registered.

Set to the specified path.

LOG_MF_PATH

./output/log

Log path for MindSpore Transformers.

Set to the specified path.

DEVICE_NUM_PER_NODE

8

Number of NPUs on the server.

An integer greater than 0.

SHARED_PATHS

Paths for shared storage.

Set to the specified path.

ASCEND_PROCESS_LOG_PATH

NA

Log path for the Ascend process.

Set to the specified path.

ENABLE_LAZY_INLINE_NO_PIPELINE

0

Whether to enable Lazy Inline mode during non-pipelined parallelism. This environment variable will be deprecated and removed in the next version.

0: Lazy Inline disabled.
1: Lazy Inline enabled.

REMOTE_SAVE_URL

None

URL used when saving training results on ModelArts. Deprecated; it will be removed in the future.

Enter the URL for saving results.