Model-Related

Q: How do I deal with the network runtime error “Out of Memory” (OOM)?

A: This error means that device memory is insufficient. It can have a variety of causes, and the following checks are recommended.

  1. Run the command npu-smi info to verify that the card is used exclusively, i.e. that no other process is occupying its memory.

  2. When running a network, start from the default YAML configuration provided for it.

  3. Increase the value of max_device_memory in the network's YAML configuration file (see the first sketch after this list). Note that some device memory must be reserved for inter-card communication, so raise the value in small increments.

  4. Adjust the hybrid parallelism strategy: increase pipeline parallelism (pp) and model parallelism (mp) as appropriate, and reduce data parallelism (dp) accordingly, keeping dp * mp * pp = device_num. Increase the number of NPUs if necessary.

  5. Reduce the batch size or sequence length.

  6. Enable selective or full recomputation, and enable optimizer parallelism (see the second sketch after this list).

  7. If further troubleshooting is still needed, feel free to raise an issue for feedback.

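The memory ceiling (step 3) and the parallelism strategy (step 4) are both set in the network's YAML file. Below is a minimal sketch assuming a typical MindSpore Transformers configuration layout; the key names (context.max_device_memory, parallel_config) follow common mindformers conventions and the values are illustrative only, so compare against the model's shipped default YAML before copying anything.

    context:
      # Upper bound on the memory the framework may claim per device.
      # Leave headroom for inter-card communication and raise it in
      # small steps, e.g. "54GB" -> "56GB" -> "58GB" on a 64 GB device.
      max_device_memory: "58GB"

    parallel_config:
      # Keep data_parallel * model_parallel * pipeline_stage == device_num
      # (here 2 * 2 * 2 = 8 NPUs). Raising mp/pp while lowering dp shards
      # more of the model per card and lowers per-card memory pressure.
      data_parallel: 2
      model_parallel: 2
      pipeline_stage: 2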

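Steps 5 and 6 are likewise YAML-level changes. The keys below (runner_config.batch_size, model.model_config.seq_length, recompute_config, parallel.enable_parallel_optimizer) are assumptions based on common mindformers configurations; verify them against your model's file before use.

    runner_config:
      batch_size: 1            # a smaller batch shrinks activation memory
    model:
      model_config:
        seq_length: 2048       # shorter sequences also reduce activations

    recompute_config:
      recompute: True          # full recomputation: re-derive activations
                               # in the backward pass instead of storing them
      select_recompute: False  # alternatively, recompute only selected layers

    parallel:
      enable_parallel_optimizer: True  # shard optimizer states across cards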