# Logs [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_en/feature/logging.md)

## Logs Saving

### Overview

MindSpore Transformers writes the model's training configuration, training steps, loss, throughput, and other information into the logs. Developers can specify the path where logs are stored.

### Training Log Directory Structure

During training, MindSpore Transformers generates a training log directory `./log` in the output directory (default: `./output`). When the training task is started with `msrun`, an additional log directory `./msrun_log` is also generated in the output directory by default.

| Folder    | Description |
|-----------|-------------|
| log       | The log information of each card is divided into `rank_{i}` folders, where `i` corresponds to the NPU card number used by the training task.<br>Each `rank_{i}` folder contains `info.log` and `error.log`, which record the INFO-level and ERROR-level information output during training, respectively. By default, a single log file is at most 50 MB, and at most 5 backup logs are kept. |
| msrun_log | `worker_{i}.log` records the training log of each card (including error information), and `scheduler.log` records the startup information of msrun.<br>Training log information is usually viewed through this folder. |
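For instance, the per-card logs described above can be inspected directly from the command line; the paths in this sketch assume the default `./output` directory:

```shell
# Follow the INFO-level log of NPU rank 0 in real time
# (assumes the default ./output output directory).
tail -f output/log/rank_0/info.log

# Search the ERROR-level logs of all ranks at once.
grep -n "ERROR" output/log/rank_*/error.log
```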
Take an 8-card task started by `msrun` as an example. The log structure is as follows:

```text
output
    ├── log
    │   ├── rank_0
    │   │   ├── info.log    # Records the training information of NPU rank 0
    │   │   └── error.log   # Records the error information of NPU rank 0
    │   ├── ...
    │   └── rank_7
    │       ├── info.log    # Records the training information of NPU rank 7
    │       └── error.log   # Records the error information of NPU rank 7
    └── msrun_log
        ├── scheduler.log   # Records the communication information between NPU ranks
        ├── worker_0.log    # Records the training and error information of NPU rank 0
        ├── ...
        └── worker_7.log    # Records the training and error information of NPU rank 7
```

### Configuration and Usage

By default, MindSpore Transformers sets the file output path to `./output` in the training yaml file. If you start the training task under the `mindformers` path, the logs generated by training are saved under `mindformers/output` by default.

#### YAML Parameter Configuration

To write logs to a different folder, modify the configuration in the yaml file. Taking the [`DeepSeek-V3` pre-training yaml](https://gitee.com/mindspore/docs/blob/master/docs/mindformers/docs/source_zh_cn/example/deepseek3/pretrain_deepseek3_671b.yaml) as an example, make the following configuration:

```yaml
output_dir: './output' # path to save logs/checkpoint/strategy
```

#### Specifying Output Directory for Single-Card Tasks

In addition to the yaml configuration, MindSpore Transformers also supports specifying the log output path with the `--output_dir` startup argument of the [run_mindformer one-click start script](https://www.mindspore.cn/mindformers/docs/en/master/feature/start_tasks.html#run-mindformer-one-click-start-script), as sketched below.

> If the output path is configured here, it overrides the configuration in the yaml file!

#### Specifying Output Directory for Distributed Tasks

If model training requires multiple servers, use the [distributed task launch script](https://www.mindspore.cn/mindformers/docs/en/master/feature/start_tasks.html#distributed-task-pull-up-script) to start the distributed training task. If shared storage is configured, you can also pass the `LOG_DIR` parameter to the startup script to specify the log output path of the Worker and Scheduler processes, so that the logs of all machine nodes are written to a single path for unified observation. A sketch follows.
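As a minimal sketch of the single-card case, `--output_dir` is passed directly when launching `run_mindformer.py`; the config path, run mode, and output path below are illustrative, not fixed values:

```shell
# Single-card launch; --output_dir overrides output_dir in the yaml file.
# The config and output paths here are illustrative.
python run_mindformer.py \
  --config configs/deepseek3/pretrain_deepseek3_671b.yaml \
  --run_mode train \
  --output_dir /data/mindformers/output
```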
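And for the multi-node case, a sketch of passing `LOG_DIR` through the distributed launch script. The positional parameter order assumed here (`WORKER_NUM LOCAL_WORKER MASTER_ADDR MASTER_PORT NODE_RANK LOG_DIR JOIN CLUSTER_TIME_OUT`) and all addresses and paths are assumptions to verify against the linked script's documentation:

```shell
# Two-node (16-card) sketch; LOG_DIR points at shared storage so both
# nodes write their Worker/Scheduler logs to the same place.
# Run on node 0 (master); addresses and paths are illustrative:
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/deepseek3/pretrain_deepseek3_671b.yaml --run_mode train" \
  16 8 192.168.1.1 8118 0 /shared/msrun_log False 300
# Run on node 1: identical command with NODE_RANK set to 1.
```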