mindformers.core.WarmUpStableDecayLR

class mindformers.core.WarmUpStableDecayLR(learning_rate: float, lr_end: float = 1e-7, warmup_steps: int = None, warmup_lr_init: float = 0., warmup_ratio: float = None, total_steps: int = None, decay_start_steps: int = None, decay_start_ratio: float = None, **kwargs)

Warm Up Stable Decay Learning Rate.

This learning rate scheduler consists of three phases:

  1. Warm-up Phase: The learning rate increases linearly from the initial value warmup_lr_init to the base learning rate learning_rate.

  2. Stable Phase: The learning rate remains constant at the base learning rate learning_rate.

  3. Decay Phase: The learning rate decreases linearly from learning_rate to the final value lr_end.

Warm-up Phase Formula:

\[\eta_t = \eta_{\text{warmup}} + t \times \frac{\eta_{\text{base}} - \eta_{\text{warmup}}}{\text{warmup_steps}}\]

Where:

  • \(\eta_{\text{warmup}}\) is the initial warm-up learning rate (warmup_lr_init)

  • \(\eta_{\text{base}}\) is the base learning rate (learning_rate)

  • \(t\) is the current step (not exceeding warmup_steps)
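As a worked example with illustrative values (chosen here for the arithmetic, not taken from any recommended configuration), take warmup_lr_init = 0, learning_rate = 1e-3, and warmup_steps = 100. At step \(t = 25\) the warm-up formula gives:

\[\eta_{25} = 0 + 25 \times \frac{10^{-3} - 0}{100} = 2.5 \times 10^{-4}\]

At \(t = 100\) the ramp reaches \(\eta_{100} = 10^{-3}\), which is the base learning rate.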

Decay Phase Formula:

\[\eta_t = \eta_{\text{base}} - (\eta_{\text{base}} - \eta_{\text{end}}) \times \frac{t - T_{\text{decay_start}}}{T_{\text{decay_steps}}}\]

Where:

  • \(\eta_{\text{end}}\) is the final learning rate (lr_end)

  • \(T_{\text{decay_start}}\) is the step at which decay begins (decay_start_steps)

  • \(T_{\text{decay_steps}}\) is the total number of decay steps (total_steps - decay_start_steps)
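The three phases and the two formulas above can be sketched in plain Python. This is a minimal illustration, not the MindFormers implementation: the function name wsd_lr is hypothetical, the default step counts are arbitrary, and holding the rate at lr_end after total_steps is an assumption that the formulas do not state.

```python
def wsd_lr(step, learning_rate, lr_end=1e-7, warmup_steps=0,
           warmup_lr_init=0.0, total_steps=1000, decay_start_steps=800):
    """Hypothetical pure-Python sketch of a warm-up / stable / decay schedule.

    Assumes total_steps > decay_start_steps >= warmup_steps.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Warm-up: linear ramp from warmup_lr_init to learning_rate.
        slope = (learning_rate - warmup_lr_init) / warmup_steps
        return warmup_lr_init + step * slope
    if step < decay_start_steps:
        # Stable: hold the base learning rate.
        return learning_rate
    # Decay: linear ramp from learning_rate down to lr_end.
    decay_steps = total_steps - decay_start_steps
    # Assumption: clamp progress so the rate stays at lr_end past total_steps.
    progress = min(step - decay_start_steps, decay_steps) / decay_steps
    return learning_rate - (learning_rate - lr_end) * progress


if __name__ == "__main__":
    for s in (0, 50, 100, 500, 800, 900, 1000):
        print(s, wsd_lr(s, learning_rate=1e-3, warmup_steps=100))
```

With these illustrative settings the rate rises during steps 0 to 99, stays at 1e-3 from step 100 to 799, and then falls linearly towards lr_end between steps 800 and 1000.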

Parameters
  • learning_rate (float) – Base learning rate, reached at the end of warm-up and held during the stable phase.

  • lr_end (float, optional) – Final value of learning rate. Default: 1e-7.

  • warmup_steps (int, optional) – The number of warm up steps. Default: None.

  • warmup_lr_init (float, optional) – Learning rate at the start of the warm-up phase. Default: 0.0.

  • warmup_ratio (float, optional) – Ratio of total training steps used for warmup. Default: None.

  • total_steps (int, optional) – The number of total steps. Default: None.

  • decay_start_steps (int, optional) – The start step of decay. Default: None.

  • decay_start_ratio (float, optional) – Point at which decay starts, expressed as a ratio of total training steps. Default: None.
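The step-based and ratio-based parameters describe the same quantities in two ways. The sketch below shows one plausible way to turn them into step counts. The helper resolve_steps is hypothetical, and both its precedence rule (an explicit step count wins over a ratio) and its rounding (floor) are assumptions, not documented behaviour of this class.

```python
import math


def resolve_steps(steps, ratio, total_steps):
    """Hypothetical helper: derive a step count from steps or a ratio."""
    if steps is not None:
        # Assumption: an explicit step count takes precedence over a ratio.
        return steps
    if ratio is not None:
        if total_steps is None:
            raise ValueError("a ratio can only be resolved when total_steps is set")
        # Assumption: round down to a whole number of steps.
        return math.floor(ratio * total_steps)
    # Neither form was given; leave the value unspecified.
    return None


if __name__ == "__main__":
    total = 1000
    print(resolve_steps(None, 0.25, total))  # warm-up given as a ratio
    print(resolve_steps(600, None, total))   # decay start given in steps
```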

Inputs:
  • global_step (int) - The global step.

Outputs:

Learning rate for the given global_step.

Raises

ValueError – If lr_end is greater than or equal to learning_rate.
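The check behind this error can be illustrated with a small stand-alone function. The name check_lr_bounds and the message text are hypothetical; only the condition itself comes from the description above.

```python
def check_lr_bounds(learning_rate, lr_end):
    """Hypothetical sketch of the documented lr_end < learning_rate check."""
    if lr_end >= learning_rate:
        raise ValueError(
            f"lr_end ({lr_end}) must be smaller than learning_rate ({learning_rate})"
        )


if __name__ == "__main__":
    check_lr_bounds(1e-3, 1e-7)  # valid: the schedule can decay downwards
    try:
        check_lr_bounds(1e-3, 1e-3)  # invalid: nothing left to decay
    except ValueError as err:
        print("rejected:", err)
```

A schedule whose final value is not below its base value would have no downward decay phase, which is presumably why the condition is rejected up front.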