mindformers.core.MFLossMonitor
- class mindformers.core.MFLossMonitor(learning_rate=None, per_print_times=1, micro_batch_num=1, micro_batch_interleave_num=1, origin_epochs=None, dataset_size=None, initial_epoch=0, initial_step=0, global_batch_size=0, gradient_accumulation_steps=1, check_for_nan_in_loss_and_grad=False, calculate_per_token_loss=False, print_separate_loss=False, **kwargs)
Monitors the loss and other parameters during training.
- Parameters
learning_rate (Union[float, LearningRateSchedule], optional) – The learning rate or learning rate schedule. Default: None.
per_print_times (int, optional) – Interval, in steps, at which log information is printed. Default: 1.
micro_batch_num (int, optional) – Number of micro-batches used for pipeline parallelism. Default: 1.
micro_batch_interleave_num (int, optional) – Number of parts into which the batch is split. Default: 1.
origin_epochs (int, optional) – Number of training epochs. Default: None.
dataset_size (int, optional) – Size of the training dataset. Default: None.
initial_epoch (int, optional) – The epoch at which training begins. Default: 0.
initial_step (int, optional) – The step at which training begins. Default: 0.
global_batch_size (int, optional) – The total batch size. Default: 0.
gradient_accumulation_steps (int, optional) – Number of gradient accumulation steps. Default: 1.
check_for_nan_in_loss_and_grad (bool, optional) – Whether to check if the loss or the gradient norm is NaN. Default: False.
calculate_per_token_loss (bool, optional) – Whether to calculate the loss of each token. Default: False.
print_separate_loss (bool, optional) – Whether to print loss separately. Default: False.
Examples
>>> from mindformers.core import MFLossMonitor
>>> lr = [0.01, 0.008, 0.006, 0.005, 0.002]
>>> monitor = MFLossMonitor(learning_rate=lr, per_print_times=10)
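To illustrate what per_print_times and a NaN check mean in practice, the following is a minimal pure-Python sketch. ToyLossMonitor, its on_step_end method, and the check_for_nan flag are hypothetical names invented for this illustration; they are not part of MindFormers, and the real MFLossMonitor may count steps, report values, or react to NaN differently (raising an error here is an assumption).

```python
import math


class ToyLossMonitor:
    """Hypothetical stand-in for illustration; NOT the MindFormers implementation."""

    def __init__(self, per_print_times=1, check_for_nan=False):
        if per_print_times < 1:
            raise ValueError("per_print_times must be >= 1")
        self.per_print_times = per_print_times
        self.check_for_nan = check_for_nan
        self.logged = []  # (step, loss) pairs that were actually reported

    def on_step_end(self, step, loss):
        # Assumption: a NaN loss aborts training by raising an error.
        if self.check_for_nan and math.isnan(loss):
            raise ValueError(f"loss is NaN at step {step}")
        # Report only every `per_print_times` steps (steps counted from 1).
        if step % self.per_print_times == 0:
            self.logged.append((step, loss))
            print(f"step: {step}, loss: {loss:.4f}")


# Simulate 30 training steps with a made-up, steadily decreasing loss.
monitor = ToyLossMonitor(per_print_times=10, check_for_nan=True)
for step in range(1, 31):
    monitor.on_step_end(step, loss=1.0 / step)
```

With per_print_times=10, the sketch reports only steps 10, 20, and 30, which is the kind of log thinning the parameter is meant to provide on long runs.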