mindspore.dataset.audio

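This module is used for audio data augmentation and includes two submodules, transforms and utils. transforms is a high-performance audio data augmentation module that supports common audio data augmentation operations. utils provides utility methods for audio processing.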

The imports commonly used in the API examples are as follows:

import mindspore.dataset as ds
import mindspore.dataset.audio as audio

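Common data processing terms are described below:

  • TensorOperation: the base class of all data processing operations implemented in C++.

  • AudioTensorOperation: the base class of all audio data processing operations, derived from TensorOperation.

Data augmentation operators can be run inside a data processing pipeline or in eager mode:

  • Pipeline mode is generally used to process datasets; for an example, see the Introduction to the Data Processing Pipeline.

  • Eager mode is generally used for individual samples. An audio preprocessing example follows: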

    import numpy as np
    import mindspore.dataset.audio as audio
    from mindspore.dataset.audio import ResampleMethod
    
    # Audio input: one channel with 30 random samples
    waveform = np.random.random([1, 30])
    
    # Augmentation operation: resample from 48 kHz to 16 kHz
    resample_op = audio.Resample(orig_freq=48000, new_freq=16000,
                                 resample_method=ResampleMethod.SINC_INTERPOLATION,
                                 lowpass_filter_width=6, rolloff=0.99, beta=None)
    waveform_resampled = resample_op(waveform)
    print("waveform resampled: {}".format(waveform_resampled), flush=True)
    

Transforms

mindspore.dataset.audio.AllpassBiquad

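Apply a two-pole all-pass filter to the audio waveform, with the central frequency and bandwidth specified by the input parameters.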

mindspore.dataset.audio.AmplitudeToDB

Convert the input audio from the amplitude/power scale to the decibel scale.
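
The conversion to decibels can be illustrated with a small NumPy sketch. This is not MindSpore's implementation; the function name `amplitude_to_db` and the parameters `scale`, `ref`, `floor` and `top_db` are illustrative assumptions. It uses the usual convention of a factor of 10 for power values and 20 for magnitude values.

```python
import numpy as np

def amplitude_to_db(x, scale="power", ref=1.0, floor=1e-10, top_db=80.0):
    """Convert power or magnitude values to decibels (illustrative sketch)."""
    x = np.asarray(x, dtype=np.float64)
    # 10 * log10 for power, 20 * log10 for magnitude
    multiplier = 10.0 if scale == "power" else 20.0
    # Clamp tiny values so the logarithm stays finite
    db = multiplier * np.log10(np.maximum(x, floor))
    # Express the result relative to the reference value
    db -= multiplier * np.log10(max(ref, floor))
    if top_db is not None:
        # Limit the dynamic range to top_db below the peak
        db = np.maximum(db, db.max() - top_db)
    return db

power = np.array([1.0, 10.0, 100.0])
print(amplitude_to_db(power))
```

Clamping at `floor` and limiting the range with `top_db` keeps silent frames from producing negative infinity or dominating the value range.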

mindspore.dataset.audio.Angle

Calculate the angle of a complex number sequence.

mindspore.dataset.audio.BandBiquad

Apply a two-pole band-pass filter to the audio waveform.

mindspore.dataset.audio.BandpassBiquad

Apply a two-pole Butterworth band-pass filter to the audio waveform.

mindspore.dataset.audio.BandrejectBiquad

Apply a two-pole Butterworth band-reject filter to the audio waveform.

mindspore.dataset.audio.BassBiquad

Apply a bass tone-control effect, that is, a two-pole low-shelf filter, to the audio waveform.

mindspore.dataset.audio.Biquad

Apply a biquad filter to the input audio.

mindspore.dataset.audio.ComplexNorm

Calculate the norm of a complex number sequence.

mindspore.dataset.audio.ComputeDeltas

Compute delta coefficients of a spectrogram.

mindspore.dataset.audio.Contrast

Apply a contrast enhancement effect to the audio waveform.

mindspore.dataset.audio.DBToAmplitude

Turn a waveform from the decibel scale to the power/amplitude scale.

mindspore.dataset.audio.DCShift

Apply a DC shift to the audio.

mindspore.dataset.audio.DeemphBiquad

Design a two-pole de-emphasis filter for an audio waveform of shape (..., time).

mindspore.dataset.audio.DetectPitchFrequency

Detect pitch frequency.

mindspore.dataset.audio.Dither

Dither increases the perceived dynamic range of audio stored at a particular bit-depth by eliminating nonlinear truncation distortion.

mindspore.dataset.audio.EqualizerBiquad

Design biquad equalizer filter and perform filtering.

mindspore.dataset.audio.Fade

Add a fade in and/or fade out to a waveform.
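
As a rough illustration of what a fade does, the NumPy sketch below multiplies the waveform by a linear ramp at each end. It is a simplified stand-in, not MindSpore's implementation: the function `linear_fade` and its parameter names are hypothetical, and only a linear shape is shown.

```python
import numpy as np

def linear_fade(waveform, fade_in_len=0, fade_out_len=0):
    """Apply linear fade-in and fade-out ramps along the last axis (sketch)."""
    waveform = np.asarray(waveform, dtype=np.float64)
    n = waveform.shape[-1]
    envelope = np.ones(n)
    if fade_in_len > 0:
        # Ramp from silence up to full level over the first samples
        envelope[:fade_in_len] *= np.linspace(0.0, 1.0, fade_in_len)
    if fade_out_len > 0:
        # Ramp from full level down to silence over the last samples
        envelope[n - fade_out_len:] *= np.linspace(1.0, 0.0, fade_out_len)
    # The envelope broadcasts over any leading (channel) dimensions
    return waveform * envelope

faded = linear_fade(np.ones((1, 10)), fade_in_len=3, fade_out_len=2)
print(faded)
```

The sketch assumes both fade lengths are no larger than the number of samples.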

mindspore.dataset.audio.Flanger

Apply a flanger effect to the audio.

mindspore.dataset.audio.FrequencyMasking

Apply frequency-domain masking to the audio waveform.
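
The idea behind frequency masking can be sketched in NumPy as overwriting a contiguous band of frequency bins. The helper `mask_frequency_band`, its parameters, and the assumed (..., freq, time) layout are illustrative assumptions and not the MindSpore API; the sketch also uses a fixed band rather than a randomly chosen one.

```python
import numpy as np

def mask_frequency_band(spec, start, width, mask_value=0.0):
    """Return a copy of spec with bins [start, start + width) masked (sketch)."""
    # Copy so that the caller's spectrogram is left untouched
    masked = np.array(spec, dtype=np.float64, copy=True)
    # Assumed layout: (..., freq, time); mask every time step of the band
    masked[..., start:start + width, :] = mask_value
    return masked

spec = np.ones((1, 6, 4))  # (channel, freq, time)
masked = mask_frequency_band(spec, start=2, width=3)
print(masked[0])
```

Masking along the time axis works the same way, with the slice applied to the last dimension instead.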

mindspore.dataset.audio.Gain

Apply amplification or attenuation to the whole waveform.

mindspore.dataset.audio.GriffinLim

Approximate magnitude spectrogram inversion using the GriffinLim algorithm.

mindspore.dataset.audio.HighpassBiquad

Design biquad highpass filter and perform filtering.

mindspore.dataset.audio.InverseMelScale

Solve for a normal STFT from a mel-frequency STFT, using a conversion matrix.

mindspore.dataset.audio.LFilter

Design a two-pole filter for an audio waveform of shape (..., time).

mindspore.dataset.audio.LowpassBiquad

Apply a two-pole low-pass filter to the audio waveform.

mindspore.dataset.audio.Magphase

Separate a complex-valued spectrogram with shape (..., 2) into its magnitude and phase.
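
A minimal NumPy sketch of this decomposition follows, assuming the last dimension holds the real part at index 0 and the imaginary part at index 1. The function `magphase` and its `power` parameter are written here for illustration and are not MindSpore's implementation.

```python
import numpy as np

def magphase(spec, power=1.0):
    """Split a (..., 2) real/imaginary array into magnitude and phase (sketch)."""
    spec = np.asarray(spec, dtype=np.float64)
    real, imag = spec[..., 0], spec[..., 1]
    # Magnitude is the Euclidean norm of (real, imag), raised to `power`
    magnitude = np.hypot(real, imag) ** power
    # Phase is the angle of the complex number, in radians
    phase = np.arctan2(imag, real)
    return magnitude, phase

mag, phase = magphase(np.array([[3.0, 4.0], [0.0, 1.0]]))
print(mag, phase)
```

The magnitude half corresponds to a complex norm and the phase half to a complex angle, so the same arithmetic underlies both of those operations.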

mindspore.dataset.audio.MaskAlongAxis

Apply a mask along axis.

mindspore.dataset.audio.MaskAlongAxisIID

Apply a mask along axis.

mindspore.dataset.audio.MelScale

Convert normal STFT to STFT at the Mel scale.

mindspore.dataset.audio.MuLawDecoding

Decode mu-law encoded signal.

mindspore.dataset.audio.MuLawEncoding

Encode signal based on mu-law companding.
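
Mu-law companding and its inverse can be sketched in NumPy as below. This assumes the input is scaled to [-1, 1] and uses the standard companding formula; the function names and the `quantization_channels` parameter are illustrative, and this is not MindSpore's implementation.

```python
import numpy as np

def mu_law_encode(x, quantization_channels=256):
    """Compress a signal in [-1, 1] to integer codes (sketch)."""
    mu = quantization_channels - 1
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    # Logarithmic compression: small amplitudes get finer resolution
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    # Map [-1, 1] to [0, mu] and round to the nearest integer code
    return ((compressed + 1.0) / 2.0 * mu + 0.5).astype(np.int64)

def mu_law_decode(codes, quantization_channels=256):
    """Expand integer codes back to a signal in [-1, 1] (sketch)."""
    mu = quantization_channels - 1
    # Map [0, mu] back to [-1, 1]
    y = np.asarray(codes, dtype=np.float64) / mu * 2.0 - 1.0
    # Invert the logarithmic compression
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu

signal = np.linspace(-1.0, 1.0, 101)
codes = mu_law_encode(signal)
restored = mu_law_decode(codes)
print(np.max(np.abs(restored - signal)))
```

The round trip is lossy because of quantization, and the error is largest near full scale, where the logarithmic curve is coarsest.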

mindspore.dataset.audio.Overdrive

Apply overdrive on input audio.

mindspore.dataset.audio.Phaser

Apply a phasing effect to the audio.

mindspore.dataset.audio.PhaseVocoder

Given an STFT tensor, speed it up in time by a factor of rate without modifying the pitch.

mindspore.dataset.audio.Resample

Resample a signal from one frequency to another.

mindspore.dataset.audio.RiaaBiquad

Apply RIAA vinyl playback equalization.

mindspore.dataset.audio.SlidingWindowCmn

Apply sliding-window cepstral mean (and optionally variance) normalization per utterance.

mindspore.dataset.audio.SpectralCentroid

Create a spectral centroid from an audio signal.

mindspore.dataset.audio.Spectrogram

Create a spectrogram from an audio signal.
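
To show what a spectrogram contains, the NumPy sketch below frames a 1-D signal, applies a Hann window, and takes the squared magnitude of each frame's FFT. It is a simplified illustration, not MindSpore's implementation: the function `power_spectrogram` and its parameters are hypothetical, and padding and centering are omitted.

```python
import numpy as np

def power_spectrogram(waveform, n_fft=64, hop_length=32):
    """Compute a power spectrogram of a 1-D signal, shape (freq, time) (sketch)."""
    waveform = np.asarray(waveform, dtype=np.float64)
    window = np.hanning(n_fft)
    # Number of full frames that fit without padding
    n_frames = 1 + (len(waveform) - n_fft) // hop_length
    frames = np.stack([
        waveform[i * hop_length:i * hop_length + n_fft] * window
        for i in range(n_frames)
    ])
    # One-sided FFT of each windowed frame: n_fft // 2 + 1 bins
    spectrum = np.fft.rfft(frames, axis=-1)
    return (np.abs(spectrum) ** 2).T

sample_rate = 8000
t = np.arange(1024) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz sine wave
spec = power_spectrogram(tone)
print(spec.shape, spec.argmax(axis=0)[:5])
```

With these settings each bin spans 125 Hz (8000 / 64), so the 1 kHz tone peaks in bin 8 of every frame.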

mindspore.dataset.audio.TimeMasking

Apply time-domain masking to the audio waveform.

mindspore.dataset.audio.TimeStretch

Stretch the Short Time Fourier Transform (STFT) spectrogram of the audio in time by a given rate, without changing the pitch.

mindspore.dataset.audio.TrebleBiquad

Design a treble tone-control effect.

mindspore.dataset.audio.Vad

Attempt to trim silent background sounds from the end of the voice recording.

mindspore.dataset.audio.Vol

Apply amplification or attenuation to the whole waveform.

Utilities

mindspore.dataset.audio.BorderType

Padding mode types.

mindspore.dataset.audio.DensityFunction

Density Functions.

mindspore.dataset.audio.FadeShape

Fade Shapes.

mindspore.dataset.audio.GainType

Gain Types.

mindspore.dataset.audio.Interpolation

Interpolation Type.

mindspore.dataset.audio.MelType

Mel Types.

mindspore.dataset.audio.Modulation

Modulation Type.

mindspore.dataset.audio.NormMode

Norm Types.

mindspore.dataset.audio.NormType

Norm Types.

mindspore.dataset.audio.ResampleMethod

Resample methods.

mindspore.dataset.audio.ScaleType

Audio scale enumeration class.

mindspore.dataset.audio.WindowType

Window function types.

mindspore.dataset.audio.create_dct

Create a DCT transformation matrix with shape (n_mels, n_mfcc), normalized depending on norm.
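
A DCT-II matrix of this shape can be built in NumPy as sketched below. This is an illustration under assumptions, not MindSpore's implementation: the function `dct_matrix` is hypothetical, and the two normalization branches ("ortho" versus a plain factor of 2) follow the common DCT-II convention.

```python
import numpy as np

def dct_matrix(n_mfcc, n_mels, norm="ortho"):
    """Build a DCT-II matrix with shape (n_mels, n_mfcc) (sketch)."""
    n = np.arange(n_mels, dtype=np.float64)
    k = np.arange(n_mfcc, dtype=np.float64)[:, None]
    # Row k holds the k-th cosine basis function sampled at n + 0.5
    dct = np.cos(np.pi / n_mels * (n + 0.5) * k)  # (n_mfcc, n_mels)
    if norm == "ortho":
        # Scale so that the basis functions are orthonormal
        dct[0] *= 1.0 / np.sqrt(2.0)
        dct *= np.sqrt(2.0 / n_mels)
    else:
        dct *= 2.0
    return dct.T

dct = dct_matrix(n_mfcc=13, n_mels=40)
print(dct.shape)
```

Multiplying a log-mel spectrogram of shape (time, n_mels) by this matrix yields (time, n_mfcc) cepstral coefficients.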

mindspore.dataset.audio.melscale_fbanks

Create a frequency transformation matrix with shape (n_freqs, n_mels).