mindspore.ops.conv2d

mindspore.ops.conv2d(input, weight, bias=None, stride=1, pad_mode='valid', padding=0, dilation=1, groups=1)

Applies a 2D convolution over an input tensor. The input tensor is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is the batch size, \(C\) is the number of channels, \(H\) is the feature height, and \(W\) is the feature width.

The output is calculated using the following formula:

\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{ccor}({\text{weight}(C_{\text{out}_j}, k), \text{X}(N_i, k)})\]

where \(bias\) is the output channel bias, \(ccor\) is the cross-correlation, \(weight\) is the convolution kernel value and \(X\) represents the input feature map.

Here are the indices' meanings:

\(i\) corresponds to the batch number, the range is \([0, N-1]\), where \(N\) is the batch size of the input.
\(j\) corresponds to the output channel, the range is \([0, C_{out}-1]\), where \(C_{out}\) is the number of output channels, which is also equal to the number of kernels.
\(k\) corresponds to the input channel, the range is \([0, C_{in}-1]\), where \(C_{in}\) is the number of input channels, which is also equal to the number of channels in the convolutional kernels.

Therefore, in the above formula, \({bias}(C_{out_j})\) represents the bias of the \(j\)-th output channel, \({weight}(C_{out_j}, k)\) represents the slice of the \(j\)-th convolutional kernel in the \(k\)-th channel, and \({X}(N_i, k)\) represents the slice of the \(k\)-th input channel in the \(i\)-th batch of the input feature map.
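The formula above can be illustrated with a naive NumPy sketch. This is not MindSpore's implementation; it is a reference loop that assumes stride 1, no padding, no dilation and groups=1, and the name conv2d_reference is hypothetical. The loop variables i, j and k match the indices described above.

```python
import numpy as np


def conv2d_reference(x, weight, bias=None):
    """Naive cross-correlation sketch (hypothetical helper, not a MindSpore API).

    Assumes stride 1, no padding, no dilation and groups == 1.
    """
    n, c_in, h_in, w_in = x.shape
    c_out, c_in_w, k_h, k_w = weight.shape
    assert c_in == c_in_w, "weight must have C_in channels when groups == 1"
    h_out, w_out = h_in - k_h + 1, w_in - k_w + 1
    out = np.zeros((n, c_out, h_out, w_out), dtype=x.dtype)
    for i in range(n):              # batch index i
        for j in range(c_out):      # output channel index j
            for k in range(c_in):   # input channel index k
                # ccor(weight(C_out_j, k), X(N_i, k)): slide the kernel slice
                # over the input slice without flipping it.
                for r in range(h_out):
                    for c in range(w_out):
                        window = x[i, k, r:r + k_h, c:c + k_w]
                        out[i, j, r, c] += np.sum(window * weight[j, k])
            if bias is not None:
                out[i, j] += bias[j]  # bias(C_out_j)
    return out


x = np.ones((2, 3, 5, 5), dtype=np.float32)
weight = np.ones((4, 3, 3, 3), dtype=np.float32)
out = conv2d_reference(x, weight)
print(out.shape)        # (2, 4, 3, 3)
print(out[0, 0, 0, 0])  # 27.0, i.e. 3 input channels * 3 * 3 ones
```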

The shape of the convolutional kernel is given by \((\text{kernel_size[0]}, \text{kernel_size[1]})\), where \(\text{kernel_size[0]}\) and \(\text{kernel_size[1]}\) are the height and width of the kernel, respectively. If we consider the input and output channels as well as the groups parameter, the complete kernel shape will be \((C_{out}, C_{in} / \text{groups}, \text{kernel_size[0]}, \text{kernel_size[1]})\), where groups is the number of groups into which the input channels are divided when applying group convolution.
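As a quick check of the kernel shape rule, the following sketch computes the expected weight shape from the channel counts, the kernel size and groups. The helper conv2d_weight_shape is hypothetical, and the divisibility check on both channel counts is an assumption based on the usual definition of group convolution rather than something stated on this page.

```python
def conv2d_weight_shape(c_in, c_out, kernel_size, groups=1):
    """Return (C_out, C_in / groups, kernel_size[0], kernel_size[1]).

    Hypothetical helper for illustration; not part of MindSpore.
    """
    # Assumption: both channel counts must split evenly across the groups.
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("c_in and c_out must both be divisible by groups")
    return (c_out, c_in // groups, kernel_size[0], kernel_size[1])


print(conv2d_weight_shape(32, 32, (3, 3)))             # (32, 32, 3, 3)
print(conv2d_weight_shape(32, 32, (3, 3), groups=32))  # (32, 1, 3, 3)
```

The second call is the depthwise case, where the number of input channels, the number of output channels and groups are all equal.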

For more details about convolution layers, refer to Gradient Based Learning Applied to Document Recognition and ConvNets.

Warning

After version 2.9.0, pad_mode will be removed. The signature will change to conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1).

Note

On Ascend platform, only group convolution in depthwise convolution scenarios is supported. That is, when groups>1, condition \(C_{in}\) = \(C_{out}\) = groups must be satisfied.

Parameters:

input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
weight (Tensor) – Tensor of shape \((C_{out}, C_{in} / \text{groups}, \text{kernel_size[0]}, \text{kernel_size[1]})\), so the kernel size is \((\text{kernel_size[0]}, \text{kernel_size[1]})\).
bias (Tensor, optional) – Bias Tensor with shape \((C_{out})\). When bias is None , zeros will be used. Default: None .
stride (Union(int, tuple[int]), optional) – The step by which the kernel moves. Either an int, in which case the same stride is used for both height and width, or a tuple of two ints giving the stride for height and width respectively. Default: 1 .
pad_mode (str, optional) –
Specifies padding mode. The optional values are "same" , "valid" and "pad" . Default: "valid" .
- same: Pads the input so that the output height and width equal those of the input divided by stride, rounded up. The padding is split as evenly as possible between top and bottom and between left and right. If the total cannot be split evenly, the extra padding is added to the bottom and the right side. If this mode is set, padding must be 0.
- valid: Applies no padding. The largest possible output height and width are returned, and any extra pixels that do not fill a complete window are discarded. If this mode is set, padding must be 0.
- pad: Pads the borders of the input tensor by the amount given in padding. padding must be greater than or equal to 0.
padding (Union(int, tuple[int], list[int]), optional) – Implicit padding on both sides of input. If padding is a single integer, the top, bottom, left and right paddings are all equal to padding. If padding is a tuple/list of 2 integers, the top and bottom padding is padding[0], and the left and right padding is padding[1]. Default: 0 .
dilation (Union(int, tuple[int]), optional) – Gaps between kernel elements. The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to \(k > 1\), \(k - 1\) pixels are skipped at each sampling location. Its value must be greater than or equal to 1 and bounded by the height and width of input. Default: 1 .
groups (int, optional) – Splits input into groups. Default: 1 .
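The "same" rule described under pad_mode can be sketched for a single spatial dimension as follows. This is a minimal illustration that assumes the conventional arithmetic for "same" padding; the helper same_padding_1d is hypothetical, and the exact computation inside MindSpore may differ.

```python
import math


def same_padding_1d(size, kernel, stride=1, dilation=1):
    """Return (before, after) padding for one dimension in "same" mode.

    Hypothetical helper; assumes the conventional "same" padding arithmetic.
    """
    out = math.ceil(size / stride)                # target output size
    effective_kernel = dilation * (kernel - 1) + 1
    total = max((out - 1) * stride + effective_kernel - size, 0)
    before = total // 2                           # top or left
    after = total - before                        # bottom or right gets the extra
    return before, after


print(same_padding_1d(32, 3))            # (1, 1)
print(same_padding_1d(32, 3, stride=2))  # (0, 1)
```

In the second call the total padding is odd, so the extra pixel lands on the bottom (or right) side, as described for the same mode.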

Returns:

Tensor, the result of applying the 2D convolution. The shape is \((N, C_{out}, H_{out}, W_{out})\). To see how the different pad modes affect the output shape, refer to mindspore.nn.Conv2d.
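For a rough idea of how pad_mode affects \(H_{out}\) and \(W_{out}\), the sketch below applies standard convolution arithmetic to one spatial dimension. The helper conv2d_output_size is hypothetical, it takes scalar stride, padding and dilation for that dimension only, and mindspore.nn.Conv2d remains the authoritative reference for the exact formulas.

```python
import math


def conv2d_output_size(size, kernel, stride=1, pad_mode="valid", padding=0, dilation=1):
    """Estimate the output size along one spatial dimension.

    Hypothetical helper based on standard convolution arithmetic.
    """
    if pad_mode not in ("same", "valid", "pad"):
        raise ValueError("pad_mode must be one of 'same', 'valid' or 'pad'")
    if pad_mode != "pad" and padding != 0:
        raise ValueError("padding must be 0 unless pad_mode is 'pad'")
    effective_kernel = dilation * (kernel - 1) + 1
    if pad_mode == "same":
        return math.ceil(size / stride)
    if pad_mode == "valid":
        return (size - effective_kernel) // stride + 1
    # pad_mode == "pad": the same padding is applied on both sides.
    return (size + 2 * padding - effective_kernel) // stride + 1


print(conv2d_output_size(32, 3))                               # 30
print(conv2d_output_size(32, 3, stride=2, pad_mode="same"))    # 16
print(conv2d_output_size(32, 3, pad_mode="pad", padding=1))    # 32
```

The first result agrees with a 32x32 input and a 3x3 kernel under the default settings, which gives a 30x30 output.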

Raises:

TypeError – If stride, padding or dilation is neither an int nor a tuple.
TypeError – If groups is not an int.
TypeError – If bias is not a Tensor.
ValueError – If the shape of bias is not \((C_{out})\) .
ValueError – If stride or dilation is less than 1.
ValueError – If pad_mode is not one of 'same', 'valid' or 'pad'.
ValueError – If padding is a tuple/list whose length is not equal to 2.
ValueError – If pad_mode is not equal to 'pad' and padding is greater than 0.

Supported Platforms: Ascend GPU

Examples

>>> import mindspore
>>> import numpy as np
>>> from mindspore import Tensor, ops
>>> x = Tensor(np.ones([10, 32, 32, 32]), mindspore.float32)
>>> weight = Tensor(np.ones([32, 32, 3, 3]), mindspore.float32)
>>> output = ops.conv2d(x, weight)
>>> print(output.shape)
(10, 32, 30, 30)