mindscience.diffuser.DDIMScheduler

class mindscience.diffuser.DDIMScheduler(num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02, beta_schedule='squaredcos_cap_v2', prediction_type='epsilon', clip_sample=True, clip_sample_range=1.0, thresholding=False, sample_max_value=1.0, dynamic_thresholding_ratio=0.995, rescale_betas_zero_snr=False, timestep_spacing='leading', compute_dtype=mstype.float32)

DDIMScheduler extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPM). See the paper Denoising Diffusion Implicit Models for more information.

Parameters

num_train_timesteps (int, optional) – The number of diffusion steps to train the model. Default: 1000.
beta_start (float, optional) – The starting beta value of inference. Default: 0.0001.
beta_end (float, optional) – The final beta value. Default: 0.02.
beta_schedule (str, optional) – The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from "linear", "scaled_linear" or "squaredcos_cap_v2". Default: "squaredcos_cap_v2".
prediction_type (str, optional) – Prediction type of the scheduler function; can be "epsilon" (predicts the noise of the diffusion process), "sample" (directly predicts the denoised sample) or "v_prediction" (see section 2.4 of the Imagen Video paper). Default: "epsilon".
clip_sample (bool, optional) – Clip the predicted sample for numerical stability. Default: True.
clip_sample_range (float, optional) – The maximum magnitude for sample clipping. Valid only when clip_sample=True. Default: 1.0.
thresholding (bool, optional) – Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such as Stable Diffusion. Default: False.
sample_max_value (float, optional) – The threshold value for dynamic thresholding. Valid only when thresholding=True. Default: 1.0.
dynamic_thresholding_ratio (float, optional) – The ratio for the dynamic thresholding method. Valid only when thresholding=True. Default: 0.995.
rescale_betas_zero_snr (bool, optional) – Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and dark samples instead of limiting it to samples with medium brightness. Loosely related to offset_noise. Default: False.
timestep_spacing (str, optional) – The way the timesteps should be scaled. Refer to Table 2 of Common Diffusion Noise Schedules and Sample Steps are Flawed for more information. Choose from "linspace", "leading" or "trailing". Default: "leading".
compute_dtype (mindspore.dtype, optional) – The compute dtype. It can be mstype.float32 or mstype.float16. Default: mstype.float32, which is mindspore.float32.
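The beta_schedule and timestep_spacing options are easier to compare numerically. The following is a minimal NumPy sketch of how such schedules are commonly built in DDIM implementations. It is not taken from the MindScience source: the helper names make_betas and make_timesteps, the num_inference_steps argument, and the cosine offset 0.008 and cap 0.999 are all assumptions for illustration, and the library's internals may differ.

```python
import math

import numpy as np


def make_betas(schedule, num_train_timesteps=1000, beta_start=0.0001, beta_end=0.02):
    """Hypothetical helper: build a beta sequence for the named schedule."""
    if schedule == "linear":
        # Betas grow linearly from beta_start to beta_end.
        return np.linspace(beta_start, beta_end, num_train_timesteps)
    if schedule == "scaled_linear":
        # Linear in sqrt(beta), then squared.
        return np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_train_timesteps) ** 2
    if schedule == "squaredcos_cap_v2":
        # Cosine schedule: betas are derived from a target cumulative alpha curve
        # and capped (assumed cap: 0.999) to avoid a singular final step.
        def alpha_bar(t):
            return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

        n = num_train_timesteps
        return np.array(
            [min(1.0 - alpha_bar((i + 1) / n) / alpha_bar(i / n), 0.999) for i in range(n)]
        )
    raise ValueError(f"unknown beta schedule: {schedule}")


def make_timesteps(spacing, num_inference_steps, num_train_timesteps=1000):
    """Hypothetical helper: pick inference timesteps, in descending order."""
    if spacing == "linspace":
        steps = np.linspace(0, num_train_timesteps - 1, num_inference_steps).round()[::-1]
    elif spacing == "leading":
        step_ratio = num_train_timesteps // num_inference_steps
        steps = (np.arange(num_inference_steps) * step_ratio)[::-1]
    elif spacing == "trailing":
        step_ratio = num_train_timesteps / num_inference_steps
        steps = np.round(np.arange(num_train_timesteps, 0, -step_ratio)) - 1
    else:
        raise ValueError(f"unknown timestep spacing: {spacing}")
    return steps.astype(np.int64)


betas = make_betas("squaredcos_cap_v2")
alphas_cumprod = np.cumprod(1.0 - betas)  # cumulative signal level per timestep

print(make_timesteps("leading", 10)[:3])   # [900 800 700]
print(make_timesteps("trailing", 10)[:3])  # [999 899 799]
print(make_timesteps("linspace", 10)[:3])  # [999 888 777]
```

In this sketch, "leading" never visits the final training timestep (it starts at 900 rather than 999), which is the behavior that the "trailing" and "linspace" options are meant to address.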

Examples

>>> import numpy as np
>>> from mindspore import Tensor, ops, dtype as mstype
>>> from mindscience.diffuser import DDIMScheduler
>>> scheduler = DDIMScheduler(num_train_timesteps=1000,
...                           beta_start=0.0001,
...                           beta_end=0.02,
...                           beta_schedule="squaredcos_cap_v2",
...                           prediction_type='epsilon',
...                           clip_sample=True,
...                           clip_sample_range=1.0,
...                           thresholding=False,
...                           sample_max_value=1.,
...                           dynamic_thresholding_ratio=0.995,
...                           rescale_betas_zero_snr=False,
...                           timestep_spacing="leading",
...                           compute_dtype=mstype.float32)
>>> batch_size, seq_len, in_dim = 4, 256, 16
>>> original_samples = ops.randn([batch_size, seq_len, in_dim])
>>> noise = ops.randn([batch_size, seq_len, in_dim])
>>> timesteps = ops.randint(0, 100, [batch_size, 1])
>>> noised_sample = scheduler.add_noise(original_samples, noise, timesteps)
>>> print(noised_sample.shape)
(4, 256, 16)
>>> sample_timesteps = Tensor(np.array([60]*batch_size), dtype=mstype.int32)
>>> x_prev = scheduler.step(noise, noised_sample, sample_timesteps)
>>> print(x_prev.shape)
(4, 256, 16)
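For readers who want to see what add_noise computes, here is a minimal NumPy sketch of the standard forward diffusion formula, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise. It assumes the scheduler follows this formula. The helper add_noise_sketch and the linear beta schedule used here are illustrative choices, not the library's code.

```python
import numpy as np

# Assumed linear beta schedule for illustration only
# (the class default is "squaredcos_cap_v2").
betas = np.linspace(0.0001, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise_sketch(original_samples, noise, timesteps):
    """Hypothetical helper: mix clean samples with noise at per-sample timesteps."""
    t = np.asarray(timesteps).reshape(-1)
    # Reshape coefficients to (batch, 1, 1, ...) so they broadcast over sample dims.
    shape = (-1,) + (1,) * (original_samples.ndim - 1)
    sqrt_ab = np.sqrt(alphas_cumprod[t]).reshape(shape)
    sqrt_one_minus_ab = np.sqrt(1.0 - alphas_cumprod[t]).reshape(shape)
    return sqrt_ab * original_samples + sqrt_one_minus_ab * noise


rng = np.random.default_rng(0)
batch_size, seq_len, in_dim = 4, 256, 16
x0 = rng.standard_normal((batch_size, seq_len, in_dim))
noise = rng.standard_normal(x0.shape)
timesteps = rng.integers(0, 100, size=(batch_size, 1))

noised = add_noise_sketch(x0, noise, timesteps)
print(noised.shape)  # (4, 256, 16)
```

Each sample in the batch gets its own timestep, so the signal and noise coefficients differ per sample while the output keeps the input shape.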

step(model_output, sample, timestep, eta=0.0, use_clipped_model_output=False)

DDIM denoising step.

Parameters

model_output (Tensor) – The direct output from the learned diffusion model.
sample (Tensor) – A current instance of a sample created by the diffusion process.
timestep (Tensor) – The current discrete timestep in the diffusion chain.
eta (float, optional) – The weight of the noise added at each diffusion step. eta=0 corresponds to DDIM and eta=1 to DDPM. Default: 0.0.
use_clipped_model_output (bool, optional) – Controls whether to recompute the noise epsilon from the clipped predicted original sample (x_0) to compensate for bias introduced by clip_sample. This correction is applied only during sampling. If True, derive epsilon from the clipped x_0 and use the corrected noise for the denoising step, improving stability when x_0 clipping would otherwise skew the update. If False, use the raw model_output directly without this correction, preserving the model's unadjusted prediction. Default: False.

Returns

Tensor, the sample for the previous diffusion step.

Raises

ValueError – If eta is not in \([0, 1]\).
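The update performed by step can be illustrated with a minimal NumPy sketch of the DDIM update rule for prediction_type="epsilon". This is a sketch under assumptions, not the library's implementation: the helper ddim_step_sketch is hypothetical, it takes the cumulative alpha values alpha_bar_t and alpha_bar_prev directly instead of a timestep, and clip_range stands in for clip_sample and clip_sample_range.

```python
import numpy as np


def ddim_step_sketch(eps, x_t, alpha_bar_t, alpha_bar_prev, eta=0.0,
                     clip_range=None, use_clipped_model_output=False, rng=None):
    """Hypothetical helper: one DDIM step from x_t to x_{t-1} (epsilon prediction)."""
    if not 0.0 <= eta <= 1.0:
        raise ValueError(f"eta must be in [0, 1], got {eta}")

    # 1. Predict the clean sample x_0 from the noisy sample and predicted noise.
    pred_x0 = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)
    if clip_range is not None:
        pred_x0 = np.clip(pred_x0, -clip_range, clip_range)

    # 2. Optionally re-derive the noise so it is consistent with the clipped x_0.
    if use_clipped_model_output:
        eps = (x_t - np.sqrt(alpha_bar_t) * pred_x0) / np.sqrt(1.0 - alpha_bar_t)

    # 3. Noise scale: eta=0 gives the deterministic DDIM update.
    variance = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * (1.0 - alpha_bar_t / alpha_bar_prev)
    sigma = eta * np.sqrt(variance)

    # 4. Combine the x_0 estimate with the direction pointing to x_t.
    direction = np.sqrt(1.0 - alpha_bar_prev - sigma ** 2) * eps
    x_prev = np.sqrt(alpha_bar_prev) * pred_x0 + direction

    # 5. Add fresh noise only for stochastic sampling (eta > 0).
    if eta > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        x_prev = x_prev + sigma * rng.standard_normal(x_t.shape)
    return x_prev


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 256, 16))
eps = rng.standard_normal(x0.shape)
a_t, a_prev = 0.5, 0.6  # illustrative cumulative alphas, a_prev > a_t
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

x_prev = ddim_step_sketch(eps, x_t, a_t, a_prev)
print(x_prev.shape)  # (4, 256, 16)
```

With eta=0 and a perfect noise prediction, the result equals the sample that the forward process would have produced at the previous noise level, which is why DDIM sampling is deterministic in that setting.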