{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "60f63ba4",
   "metadata": {},
   "source": [
    "# MindSpore Hybrid 语法规范\n",
    "\n",
    "[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/experts/zh_cn/operation/mindspore_ms_kernel.ipynb)&emsp;[![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/experts/zh_cn/operation/mindspore_ms_kernel.py)&emsp;[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/experts/source_zh_cn/operation/ms_kernel.ipynb)\n",
    "\n",
    "## 概述\n",
    "\n",
    "MindSpore Hybrid DSL的语法与Python语法类似，例如函数定义、缩进和注释。把MindSpore Hybrid DSL书写的函数加上`kernel`装饰器后可以当做普通的`numpy`函数使用，也可以用于Custom的进行自定义算子。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "50190822",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-0.7582229   1.9742808  -1.5035899   1.6295254 ]\n",
      " [ 0.18717238 -1.1390371  -0.92540735  0.25755903]\n",
      " [-0.75234073  0.2182185   0.9805498   0.27473617]\n",
      " [ 0.7546873  -0.8488003   0.58964515 -0.23971215]]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-0.758223    1.9742805  -1.5035899   1.6295254 ]\n",
      " [ 0.18717244 -1.1390371  -0.9254071   0.2575591 ]\n",
      " [-0.7523403   0.21821874  0.9805499   0.27473587]\n",
      " [ 0.75468683 -0.84879947  0.5896454  -0.23971221]]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import mindspore as ms\n",
    "import mindspore.ops as ops\n",
    "from mindspore.ops import kernel\n",
    "\n",
    "@kernel\n",
    "def outer_product(a, b):\n",
    "    d = allocate(a.shape, a.dtype)\n",
    "    c = output_tensor(a.shape, a.dtype)\n",
    "\n",
    "    for i0 in range(a.shape[0]):\n",
    "        for i1 in range(b.shape[1]):\n",
    "            c[i0, i1] = 0.0\n",
    "            for i2 in range(a.shape[1]):\n",
    "                d[i0, i2] = 2 * a[i0, i2]\n",
    "                c[i0, i1] = c[i0, i1] + sin(d[i0, i2] * b[i2, i1])\n",
    "    return c\n",
    "\n",
    "np_x = np.random.normal(0, 1, [4, 4]).astype(np.float32)\n",
    "np_y = np.random.normal(0, 1, [4, 4]).astype(np.float32)\n",
    "\n",
    "print(outer_product(np_x, np_y))\n",
    "\n",
    "input_x = ms.Tensor(np_x)\n",
    "input_y = ms.Tensor(np_y)\n",
    "\n",
    "test_op_akg = ops.Custom(outer_product)\n",
    "out = test_op_akg(input_x, input_y)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a80bae5",
   "metadata": {},
   "source": [
    "## 语法规则\n",
    "\n",
    "### 变量\n",
    "\n",
    "MindSpore Hybrid DSL中的变量包括Tensor和Scalar两种形式。\n",
    "\n",
    "对于Tensor类型的变量，除了在输入中提供的变量，其他变量都需要在使用前申明 `shape`和 `dtype`。\n",
    "\n",
    "- 对于输出Tensor使用 `output_tensor`，用法为：`output_tensor(shape, dtype)`。\n",
    "- 对于中间结果使用 `allocate`，用法为：`allocate(shape, dtype)`。\n",
    "\n",
    "Tensor分配的示例代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c8227ab5",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a, b):\n",
    "    # a和b作为输入tensor，可以直接使用\n",
    "\n",
    "    # d为一个数据类型为fp16,形状为(2,)的Tensor，在下面的code中作为中间变量使用\n",
    "    d = allocate((2,), \"float16\")\n",
    "    # c为一个数据类型与b相同,形状与a相同的Tensor，在下面的code中作为函数输出使用\n",
    "    c = output_tensor(a.shape, b.dtype)\n",
    "\n",
    "    # d作为中间变量，给c赋值\n",
    "    d[0] = b[0, 0]\n",
    "    for i in range(4):\n",
    "        for j in range(4):\n",
    "            c[i, j] = d[0]\n",
    "\n",
    "    # c作为输出\n",
    "    return c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "afb59a87",
   "metadata": {},
   "source": [
    "对于Scalar类变量，会将它第一次的赋值运算作为声明。赋值操作可以是一个立即数，也可以是一个计算表达式。Scalar类变量第一次赋值的地方决定了它的定义域（例如，某一个for loop之内），在定义域之外使用Scalar变量会报错。\n",
    "\n",
    "Scalar变量使用的示例代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "905699c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a):\n",
    "    c = output_tensor(a.shape, a.dtype)\n",
    "\n",
    "    for i in range(10): # i loop\n",
    "        for j in range(5): # j loop\n",
    "            # 用一个立即数给Scalar赋值\n",
    "            d = 2.0\n",
    "            # 用表达式给Scalar赋值\n",
    "            e = a[i, j]\n",
    "            # 正常使用scalar\n",
    "            c[i, j] = d + e\n",
    "\n",
    "    # Wrong: c[0, 0] = d\n",
    "    # 不能在超出Scalar d的定义域（j loop）之外的范围使用\n",
    "\n",
    "    return c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09bef090",
   "metadata": {},
   "source": [
    "与原生Python语言不同的是，变量一旦创建，`shape`和 `dtype`就不能改变。\n",
    "\n",
    "### 计算表达\n",
    "\n",
    "MindSpore Hybrid DSL支持基本的四则运算表达，包括 `+, -, *, /`，及赋值运算符，包括 `=, +=, -=, *=, /=`。\n",
    "用户可以像写Python表达一样书写计算表达式利用变量计算和为变量赋值。\n",
    "\n",
    "所有的计算需要基于标量计算，如果是Tensor对象那么写清楚所有index，即 `C[i, j] = A[i, j] + B[i, j]`。当前不支持 `C = A + B`这种向量化的写法。\n",
    "\n",
    "在书写计算表达式时，用户需要自行负责类型的合法性。表达式左右两边的类型需要保持一致，否则在**算子编译环节**会报错。计算式中的整数立即数会被认定为int32，而浮点立即数会被认定为float32。MindSpore Hybrid DSL不提供任何隐式的类型转化，所有类型转化都需要显式的书写出来。类型名即对应类型转换函数的名字，包括：\n",
    "\n",
    "- int32\n",
    "- float16\n",
    "- float32\n",
    "- (仅GPU后端)int8, int16, int64, float64\n",
    "\n",
    "类型转换代码示例如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4ce9278f",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a):\n",
    "    c = output_tensor((2,), \"float16\")\n",
    "\n",
    "    # Wrong: c[0] = 0.1 此处c的类型为fp16, 而0.1的类型为fp32\n",
    "    c[0] = float16(0.1) # float16(0.1)把表达式的类型转化为fp16\n",
    "    c[1] = float16(a[0, 0]) # float16(a[0, 0])把表达式的类型转化为fp16\n",
    "\n",
    "    return c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5df9dbd",
   "metadata": {},
   "source": [
    "### 循环\n",
    "\n",
    "当前只支持 `for` loop，不支持 `while`、 `break`、 `continue`关键词。\n",
    "\n",
    "基本循环的写法和Python一样，循环维度的表达可以使用 `range`和 `grid`关键词。`range`表示一维的循环维度，接受一个参数表示循环的上限，例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c14cc217",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a, b):\n",
    "    c = output_tensor((3, 4, 5), \"float16\")\n",
    "\n",
    "    for i in range(3):\n",
    "        for j in range(4):\n",
    "            for k in range(5):\n",
    "                out[i, j, k] = a[i, j, k] + b[i, j, k]\n",
    "    return  c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7b78295",
   "metadata": {},
   "source": [
    "则循环表达的计算空间为 `0 <= i < 3, 0 <= j < 4, 0 <= k < 5`。\n",
    "\n",
    "`grid`表示多维网格，接受的输入为 `tuple` ，例如上面的代码用 `grid`表达后如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bdcb8ae0",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a, b):\n",
    "    c = output_tensor((3, 4, 5), \"float16\")\n",
    "\n",
    "    for arg in grid((4, 5, 6)):\n",
    "        out[arg] = a[arg] + b[arg]\n",
    "    return  c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0fb4930",
   "metadata": {},
   "source": [
    "此时，参数 `arg`等价于一个三维index `(i,j,k)`，其上限分别为4，5，6。对参数 `arg`我们可以取其中的某个分量，例如"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "1fc0010f",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a, b):\n",
    "    c = output_tensor((3, 4, 5), \"float16\")\n",
    "\n",
    "    for arg in grid((4, 5, 6)):\n",
    "        out[arg] = a[arg] + b[arg[0]]\n",
    "    return  c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00679875",
   "metadata": {},
   "source": [
    "那么循环内的表达式等价于 `out[i, j, k] = a[i, j, k] + b[i]`。\n",
    "\n",
    "### 调度原语\n",
    "\n",
    "从1.8版本开始，MindSpore Hybrid DSL 提供调度原语以描述不同类型的循环。在 Ascend 后端，调度原语将协助新 DSA 多面体调度器生成代码。此类调度原语包括：`serial`， `vectorize`， `parallel`，和 `reduce`。\n",
    "\n",
    "`serial` 会提示调度器该循环在调度生成时应保持前后顺序，不要做会改变顺序的调度变换，例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "12f280b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def serial_test(a, b):\n",
    "    row = a.shape[0]\n",
    "    for i in serial(row):\n",
    "        for j in serial(i):\n",
    "            b[i] = b[i] - a[i, j] * b[j]\n",
    "    return b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2066b403",
   "metadata": {},
   "source": [
    "这里 `serial` 提示 `i` 和 `j` 的计算有依赖关系，调度时应保持 `i` 和 `j` 从小的大的顺序。\n",
    "\n",
    "`vectorize` 一般用于最内层循环，会提示调度器该循环有生成向量化指令的机会，例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "723fcc54",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def vector_test(a, b):\n",
    "    out = output_tensor(a.shape, a.dtype)\n",
    "    row = a.shape[0]\n",
    "    col = a.shape[1]\n",
    "    for i in range(row):\n",
    "        for j in vectorize(col):\n",
    "            out[i, j] = a[i, j] + b[0, i]\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeac324a",
   "metadata": {},
   "source": [
    "这里 `vectorize` 提示最内层 `j` 轴循环包含同质化计算，调度时可以生成向量化指令加速内层循环。\n",
    "\n",
    "`parallel` 一般用于最外层循环，会提示调度器该循环有并行执行机会，例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "0a8ffcf8",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def parallel_test(a, b):\n",
    "    out = output_tensor(a.shape, a.dtype)\n",
    "    row = a.shape[0]\n",
    "    col = a.shape[1]\n",
    "    for i in parallel(row):\n",
    "        for j in range(col):\n",
    "            out[i, j] = a[i, j] + b[0, j]\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcad8e24",
   "metadata": {},
   "source": [
    "这里 `parallel` 提示最外层 `i` 轴循环无依赖关系，调度时可以并行加速。\n",
    "\n",
    "`reduce` 会提示调度器该循环为运算中的一个 Reduction 轴，例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b0229b5b",
   "metadata": {},
   "outputs": [],
   "source": [
    "def reduce_test(a):\n",
    "    out = output_tensor((a.shape[0], ), a.dtype)\n",
    "    row = a.shape[0]\n",
    "    col = a.shape[1]\n",
    "    for i in range(row):\n",
    "        out[i] = 0.0\n",
    "        for k in reduce(col):\n",
    "            out[i] = out[i] + a[i, k]\n",
    "    return out"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "729471ae",
   "metadata": {},
   "source": [
    "这里 `reduce` 对应的 `k` 轴为累加轴。\n",
    "\n",
    "用户在使用调度原语的时候需要注意：\n",
    "\n",
    "- 上述调度原语只会在 Ascend 后端影响调度。在CPU和GPU后端，上述调度原语将被处理成普通的 `for` 循环关键词。\n",
    "- 调度原语对于调度器只是提示作用，当调度原语的提示和调度器自身的分析验证相矛盾时，调度器将把上述调度原语将被处理成普通的 `for` 循环关键词。\n",
    "\n",
    "### 属性\n",
    "\n",
    "当前只支持对Tensor对象属性shape和dtype，例如 `a.shape`，`c.dtype`。\n",
    "\n",
    "一个Tensor的shape属性会表达为一个 `tuple`，我们可以对它进行**固定**下标的取分量操作，例如 `a.shape[0]`。\n",
    "\n",
    "同时，在 `grid`关键词中我们接受某个Tensor对象的 `shape`属性，那么循环的维度由Tensor的维度决定。例如："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "048a0931",
   "metadata": {},
   "outputs": [],
   "source": [
    "@kernel\n",
    "def kernel_func(a, b):\n",
    "    c = output_tensor(a.shape, \"float16\")\n",
    "\n",
    "    for arg in grid(a.shape):\n",
    "        out[arg] = a[arg] + b[arg[0]]\n",
    "    return  c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5775c852",
   "metadata": {},
   "source": [
    "如果a是一个二维Tensor，那么循环内的表达式等价于 `out[i, j] = a[i, j] + b[i]`。而如果a是一个三维Tensor，那么循环内的表达式等价于 `out[i, j, k] = a[i, j, k] + b[i]`。\n",
    "\n",
    "### 关键词\n",
    "\n",
    "当前支持的关键词包括\n",
    "\n",
    "- 全平台支持数学函数：`log`、`exp`、`sqrt`、`tanh`、`power`、`floor`\n",
    "- 内存分配：`allocate`、 `output_tensor`\n",
    "- 数据类型转化：`int32`、 `float16`、 `float32`、 `float64`\n",
    "- 循环表达：`for`、 `range`、 `grid`\n",
    "- 调度源语：`serial`、 `vec`、 `parallel`、 `reduce`\n",
    "- 在当前版本中，我们对CPU/GPU后端提供部分进阶关键词：\n",
    "    - 数学函数：`rsqrt`、 `erf`、 `isnan`、 `sin`、 `cos`、 `isinf`、 `isfinite`、 `atan`、 `atan2`(仅GPU)、 `expm1`(仅GPU)、 `floor`、 `ceil`、 `trunc`、 `round`、 `ceil_div`\n",
    "    - 数据类型转换：`int8`，`int16`，`int64`\n",
    "\n",
    "## 常见报错信息及错误归因\n",
    "\n",
    "为了帮助用户高效地开发和定位bug，MindSpore Hybrid DSL 提供如下报错信息，包括\n",
    "\n",
    "- TypeError: 当使用了`while`, `break` 和 `continue` 等 MindSpore Hybrid DSL 不支持的 Python 关键词。\n",
    "- ValueError:\n",
    "    - 使用了不属于上面的内置函数名；\n",
    "    - 对张量取非 `shape` 或者 `dtype` 的属性。\n",
    "- 其他常见报错：\n",
    "    - “SyntaxError”: 写的 DSL 不符合基本 Python 语法（非上面的进阶用法中定义的MindSpore Hybrid DSL语法），由 Python 解释器本身报错；\n",
    "    - “ValueError: Compile error”及“The pointer\\[kernel_mod\\] is null”: Python DSL符合语法但是编译失败，由 AKG 报错，具体错误原因检查 AKG 相关报错信息；\n",
    "    - “Launch graph failed”: Python DSL符合语法，编译成功但是运行失败。具体原因参考硬件的报错信息。例如在昇腾芯片上遇到运行失败时，MindSpore 端会显示 “Ascend error occurred” 及对应硬件报错信息。\n",
    "\n",
    "## 开发用例：利用hybrid类型的自定义算子实现三维张量的加法函数\n",
    "\n",
    "首先，我们写一个基于MindSpore Hybrid DSL书写一个计算三维张量相加的函数。\n",
    "\n",
    "注意：\n",
    "\n",
    "- 对于输出张量使用 `output_tensor`，用法为：`output_tensor(shape, dtype)`；\n",
    "- 所有的计算需要基于标量计算，如果是Tensor对象，那么需要写清楚所有index；\n",
    "- 基本循环的写法和Python一样，循环维度的表达可以使用 `range`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "a3f2e1dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from mindspore import ops\n",
    "import mindspore as ms\n",
    "from mindspore.ops import kernel\n",
    "\n",
    "ms.set_context(device_target=\"GPU\")\n",
    "@kernel\n",
    "def tensor_add_3d(x, y):\n",
    "    result = output_tensor(x.shape, x.dtype)\n",
    "    #    1. 你需要一个三层循环\n",
    "    #    2. 第i层循环的上界可以用x.shape[i]获得\n",
    "    #    3. 你需要基于每个元素表达计算，例如加法为 x[i, j, k] + y[i, j, k]\n",
    "    for i in range(x.shape[0]):\n",
    "        for j in range(x.shape[1]):\n",
    "            for k in range(x.shape[2]):\n",
    "                result[i, j, k] = x[i, j, k] + y[i, j, k]\n",
    "\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b0f21fbb",
   "metadata": {},
   "source": [
    "下面我们用上面的函数自定义一个算子。\n",
    "\n",
    "注意到基于`kernel`的`hybrid`函数时，我们可以使用自动的形状和数据类型推导。\n",
    "\n",
    "因此我们只用给一个`func`输入（`func_type`的默认值为`\"hybrid\"`）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "29c2be46",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]]\n",
      "\n",
      " [[3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]]]\n"
     ]
    }
   ],
   "source": [
    "tensor_add_3d_op = ops.Custom(func=tensor_add_3d)\n",
    "input_tensor_x = ms.Tensor(np.ones([2, 3, 4]).astype(np.float32))\n",
    "input_tensor_y = ms.Tensor(np.ones([2, 3, 4]).astype(np.float32) * 2)\n",
    "result_cus = tensor_add_3d_op(input_tensor_x, input_tensor_y)\n",
    "print(result_cus)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e08512c7",
   "metadata": {},
   "source": [
    "同时我们可以使用`pyfunc`模式验证上面定义的正确性。\n",
    "\n",
    "这里我们不需要重新定义算子计算函数`tensor_add_3d`，直接将`func_type`改为`\"pyfunc\"`即可。\n",
    "\n",
    "注意`pyfunc`模式时我们需要手写类型推导函数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "37d9313f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]]\n",
      "\n",
      " [[3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]\n",
      "  [3. 3. 3. 3.]]]\n"
     ]
    }
   ],
   "source": [
    "def infer_shape_py(x, y):\n",
    "    return x\n",
    "\n",
    "def infer_dtype_py(x, y):\n",
    "    return x\n",
    "\n",
    "tensor_add_3d_py_func = ops.Custom(func=tensor_add_3d,\n",
    "                                   out_shape=infer_shape_py,\n",
    "                                   out_dtype=infer_dtype_py,\n",
    "                                   func_type=\"pyfunc\")\n",
    "\n",
    "result_pyfunc = tensor_add_3d_py_func(input_tensor_x, input_tensor_y)\n",
    "print(result_pyfunc)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1168faad",
   "metadata": {},
   "source": [
    "我们可以得到如下结果，即两个Tensor的和。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore",
   "language": "python",
   "name": "mindspore"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}