mindformers.core.CosineWithWarmUpLR
- class mindformers.core.CosineWithWarmUpLR(learning_rate, warmup_steps=None, total_steps=None, num_cycles=0.5, lr_end=0., warmup_lr_init=0., warmup_ratio=None, decay_steps=None, decay_ratio=None, **kwargs)[source]
Cosine with Warm Up Learning Rate.
The CosineWithWarmUpLR learning rate scheduler applies a cosine annealing schedule with warm-up steps to set the learning rate for each parameter group. The learning rate increases linearly during the warm-up phase, after which it decays following a cosine function.
During the warm-up phase, the learning rate increases from a small initial value to the base learning rate as follows:
\[\eta_t = \eta_{\text{warmup}} + t \times \frac{\eta_{\text{base}} - \eta_{\text{warmup}}}{\text{warmup\_steps}}\]
where \(t\) is the current step within the warm-up phase, \(\eta_{\text{warmup}}\) is the initial warm-up learning rate, and \(\eta_{\text{base}}\) is the learning rate after the warm-up phase.
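As a quick sanity check of this formula, the pure-Python helper below (warmup_lr is a hypothetical name, not part of the mindformers API) reproduces the first printed value of the example at the end of this page: step 1 of a 10-step warm-up toward a base rate of 0.005 gives 0.0005.

>>> def warmup_lr(t, base_lr, warmup_lr_init=0.0, warmup_steps=10):
...     # Linear interpolation from warmup_lr_init up to base_lr over warmup_steps.
...     return warmup_lr_init + t * (base_lr - warmup_lr_init) / warmup_steps
...
>>> round(warmup_lr(1, base_lr=0.005), 6)
0.0005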
Once the warm-up phase is complete, the learning rate follows a cosine decay schedule:
\[\eta_t = \eta_{\text{end}} + \frac{1}{2}\left(\eta_{\text{base}} - \eta_{\text{end}}\right)\left(1 + \cos\left(\frac{t_{cur}}{t_{max}}\pi\right)\right)\]
where \(t_{cur}\) is the number of steps since the end of the warm-up phase, \(t_{max}\) is the total number of steps in the decay phase, and \(\eta_{\text{end}}\) is the final learning rate (lr_end).
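The same kind of sketch works for the decay phase; cosine_decay_lr below is again a hypothetical pure-Python helper covering only the default half-cosine case (num_cycles=0.5), not the library implementation. Global step 15 from the example at the end of this page lands 5 steps into a 10-step decay phase, i.e. exactly halfway through the half-cosine, giving 0.0025 (printed as 0.0024999997 in float32).

>>> import math
>>> def cosine_decay_lr(t_cur, t_max, base_lr, lr_end=0.0):
...     # Half-cosine anneal from base_lr down to lr_end (the num_cycles=0.5 default).
...     return lr_end + 0.5 * (base_lr - lr_end) * (1 + math.cos(math.pi * t_cur / t_max))
...
>>> round(cosine_decay_lr(5, 10, base_lr=0.005), 6)
0.0025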
- Parameters
learning_rate (float) – Learning rate after the warm-up phase.
warmup_steps (int, optional) – The number of warm-up steps. Default: None.
total_steps (int, optional) – The total number of training steps. Default: None.
num_cycles (float, optional) – The number of waves in the cosine schedule (the default is to just decrease from the max value to 0 following a half-cosine). Default: 0.5.
lr_end (float, optional) – Final value of the learning rate. Default: 0..
warmup_lr_init (float, optional) – Initial learning rate of the warm-up phase. Default: 0..
warmup_ratio (float, optional) – Ratio of total training steps used for warm-up. Default: None.
decay_steps (int, optional) – The number of decay steps. Default: None.
decay_ratio (float, optional) – Ratio of total training steps used for decay. Default: None.
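Note that the step counts and the ratios are two ways of sizing the same phases. Assuming warmup_ratio is resolved against total_steps (an assumption based on the parameter descriptions above; the exact resolution logic is not spelled out here), the following two constructions should size the warm-up phase identically:

>>> from mindformers.core import CosineWithWarmUpLR
>>>
>>> # Size the warm-up phase with an absolute step count ...
>>> by_steps = CosineWithWarmUpLR(learning_rate=0.005,
...                               warmup_steps=100,
...                               total_steps=1000)
>>> # ... or as a fraction of total_steps (assumed equivalent to the above).
>>> by_ratio = CosineWithWarmUpLR(learning_rate=0.005,
...                               warmup_ratio=0.1,
...                               total_steps=1000)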
- Inputs:
global_step (Tensor) - The global step.
- Outputs:
Learning rate.
Examples
>>> import mindspore as ms
>>> from mindformers.core import CosineWithWarmUpLR
>>>
>>> ms.set_context(mode=ms.GRAPH_MODE)
>>> total_steps = 20
>>> warmup_steps = 10
>>> learning_rate = 0.005
>>>
>>> cosine_warmup = CosineWithWarmUpLR(learning_rate=learning_rate,
...                                    warmup_steps=warmup_steps,
...                                    total_steps=total_steps)
>>> print(cosine_warmup(ms.Tensor(1)))
0.0005
>>> print(cosine_warmup(ms.Tensor(15)))
0.0024999997
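Since CosineWithWarmUpLR is a learning-rate schedule cell, it can also be passed to a MindSpore optimizer as a dynamic learning rate, which the optimizer evaluates at the current global step. A minimal sketch, with a single Dense layer standing in as a placeholder network:

>>> from mindspore import nn
>>> from mindformers.core import CosineWithWarmUpLR
>>>
>>> net = nn.Dense(16, 8)  # placeholder network
>>> lr_schedule = CosineWithWarmUpLR(learning_rate=0.005,
...                                  warmup_steps=10,
...                                  total_steps=20)
>>> # MindSpore optimizers accept a learning-rate schedule as a dynamic learning rate.
>>> optimizer = nn.AdamWeightDecay(net.trainable_params(), learning_rate=lr_schedule)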