[{"data":1,"prerenderedAt":846},["ShallowReactive",2],{"content-query-JQN05smIei":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":840,"_id":841,"_source":842,"_file":843,"_stem":844,"_extension":845},"/technology-blogs/en/2598","en",false,"","MindSpore Case Study | Revolutionizing Image Inpainting for Programmers","This blog introduces a novel CRA mechanism, which involves adding contextual aggregated residuals to the upsampled inpainting result, ultimately producing a refined final result.","2023-02-16","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/6f785d06c7584ce68f7ecd8df09786e2.png","technology-blogs","Practices",{"type":15,"children":16,"toc":837},"root",[17,25,39,44,53,58,68,76,81,86,91,103,115,123,131,143,151,156,164,190,198,205,213,218,226,231,236,243,273,278,285,290,298,303,311,319,337,344,373,384,391,414,422,430,448,455,480,491,499,507,512,517,524,532,537,542,550,558,570,578,586,591,596,603,608,616,621,628,684,692,697,705,713,721,752,760,765,770,778,786,791,799,804,812,817,824,832],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"mindspore-case-study-revolutionizing-image-inpainting-for-programmers",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29,31,37],{"type":24,"value":30},"Conventional image inpainting techniques are limited to processing low-resolution input images, and merely upsampling the low-resolution image inpainting results can only yield large and blurry outcomes. As is commonly understood, incorporating high-frequency residuals into a large and blurry image can enhance its details and textures. 
Building upon this concept, the paper ",{"type":18,"tag":32,"props":33,"children":34},"em",{},[35],{"type":24,"value":36},"Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting",{"type":24,"value":38}," introduces a novel mechanism called Contextual Residual Aggregation (CRA), which involves adding contextual aggregated residuals to the upsampled inpainting result generated by a neural network, ultimately producing a refined final result.",{"type":18,"tag":26,"props":40,"children":41},{},[42],{"type":24,"value":43},"This mechanism employs an Attention Transfer Module (ATM) to compute the aggregated residuals within a mask region using the contextual residuals and attention scores. Additionally, a Generative Adversarial Network (GAN) is established to perform low-resolution image prediction, thereby significantly reducing the memory usage and computing time. This paper presents additional techniques to enhance the quality and speed of inpainting, including attention score sharing, multi-scale attention transfer mechanism, and Lightweight Gated Convolution (LWGC). As a result, the model is capable of accurately inpainting a large image (up to 8K) with an irregular hole size of up to 25% at a high level of precision.",{"type":18,"tag":26,"props":45,"children":46},{},[47],{"type":18,"tag":48,"props":49,"children":50},"strong",{},[51],{"type":24,"value":52},"Environment Configuration",{"type":18,"tag":26,"props":54,"children":55},{},[56],{"type":24,"value":57},"In this tutorial, we run the experiment in graph mode in a GPU environment.",{"type":18,"tag":59,"props":60,"children":62},"pre",{"code":61},"from mindspore import context\n\n# Select the graph execution mode and specify the training platform to GPU. 
If the Ascend platform is required, replace GPU with Ascend.\ncontext.set_context(mode=context.GRAPH_MODE, device_target='GPU')\n",[63],{"type":18,"tag":64,"props":65,"children":66},"code",{"__ignoreMap":7},[67],{"type":24,"value":61},{"type":18,"tag":26,"props":69,"children":70},{},[71],{"type":18,"tag":48,"props":72,"children":73},{},[74],{"type":24,"value":75},"Data Preparation",{"type":18,"tag":26,"props":77,"children":78},{},[79],{"type":24,"value":80},"We use the Places2 dataset with high-resolution images as the training dataset, which can be downloaded from the official website. The dataset contains more than 1.8 million 1024 x 1024 images, covering 443 classes of scenes.",{"type":18,"tag":26,"props":82,"children":83},{},[84],{"type":24,"value":85},"The mask dataset consists of 100 images of masks. To dynamically generate irregular masks, you can simulate tears, scratches, and spots, or randomly manipulate the shape templates of real objects.",{"type":18,"tag":26,"props":87,"children":88},{},[89],{"type":24,"value":90},"The inference data includes two groups of matched images and masks.",{"type":18,"tag":26,"props":92,"children":93},{},[94,96,101],{"type":24,"value":95},"The training data contains 16 images and is stored in the ",{"type":18,"tag":48,"props":97,"children":98},{},[99],{"type":24,"value":100},"/examples",{"type":24,"value":102}," directory for the CRA.ipynb test.",{"type":18,"tag":26,"props":104,"children":105},{},[106,108,113],{"type":24,"value":107},"Save the decompressed datasets to the ",{"type":18,"tag":48,"props":109,"children":110},{},[111],{"type":24,"value":112},"CRA",{"type":24,"value":114}," directory, whose structure is as 
follows.",{"type":18,"tag":26,"props":116,"children":117},{},[118],{"type":18,"tag":119,"props":120,"children":122},"img",{"alt":7,"src":121},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/b546fe401d544c6093f23928fc30fa4c.png",[],{"type":18,"tag":26,"props":124,"children":125},{},[126],{"type":18,"tag":48,"props":127,"children":128},{},[129],{"type":24,"value":130},"Data Processing",{"type":18,"tag":26,"props":132,"children":133},{},[134,136,141],{"type":24,"value":135},"Places2 dataset: Define the ",{"type":18,"tag":48,"props":137,"children":138},{},[139],{"type":24,"value":140},"InpaintDataset()",{"type":24,"value":142}," class to read data, resize each image to 512 x 512, and normalize the pixel values to [-1, 1].",{"type":18,"tag":59,"props":144,"children":146},{"code":145},"import os\nimport cv2\n\n\nclass InpaintDataset():\n    \"\"\"Process image dataset\"\"\"\n\n    def __init__(self, args):\n        self.args = args\n        self.imglist = self.get_files('./examples')\n\n    def get_files(self, path):\n        ret = []\n        for tuple_path in os.walk(path):\n            for filespath in tuple_path[2]:\n                ret.append(os.path.join(tuple_path[0], filespath))\n        return ret\n\n    def __len__(self):\n        return len(self.imglist)\n\n    def __getitem__(self, index):\n        img = cv2.imread(self.imglist[index])\n        h, w = self.args.IMG_SHAPE[0], self.args.IMG_SHAPE[1]\n        # cv2.resize expects the target size as (width, height)\n        img = cv2.resize(img, (w, h))\n        # Scale pixel values from [0, 255] to [-1, 1]\n        img = img / 127.5 - 1\n        # HWC -> CHW\n        img = img.transpose((2, 0, 1))\n        return img\n",[147],{"type":18,"tag":64,"props":148,"children":149},{"__ignoreMap":7},[150],{"type":24,"value":145},{"type":18,"tag":26,"props":152,"children":153},{},[154],{"type":24,"value":155},"Mask dataset: Randomly select masks from the dataset, perform a series of data augmentation operations such as random horizontal flipping, rotation by a random angle, and random resizing by 0.8 to 1.0 times, and output a mask tensor with 
the size of [1, 1, 512, 512].",{"type":18,"tag":59,"props":157,"children":159},{"code":158},"import random\n\nimport mindspore\nimport mindspore.ops as ops\nimport mindspore.dataset as ds\nfrom mindspore import Tensor\n\nfrom src.process_dataset.mask import get_files, read_masks, random_rotate_image, random_resize_image\n\n\ndef random_mask(args):\n    \"\"\"Process mask dataset\"\"\"\n\n    img_shape = args.IMG_SHAPE\n    height = img_shape[0]\n    width = img_shape[1]\n    path_list, n_masks = get_files('./mask_templates')\n    nd = random.randint(0, n_masks - 1)\n    path_mask = path_list[nd]\n    mask = read_masks(path_mask)\n    mask = ds.vision.c_transforms.RandomHorizontalFlip(prob=0.5)(mask)\n    scale = random.uniform(0.8, 1.0)\n    mask = random_rotate_image(mask)\n    mask = random_resize_image(mask, scale, height, width)\n    crop = ds.vision.c_transforms.CenterCrop((height, width))\n    mask1 = crop(mask)\n    mask_show = mask1\n    mask2 = Tensor.from_numpy(mask1)\n    mask3 = mask2.astype(mindspore.float32)\n    mask4 = mask3[:, :, 0:1]\n    mask5 = ops.ExpandDims()(mask4, 0)\n    mask6 = ops.Mul()(1 / 255, mask5)\n    mask = ops.Reshape()(mask6, (1, height, width, 1))\n    mask = ops.Transpose()(mask, (0, 3, 1, 2))\n    return mask, mask_show\n",[160],{"type":18,"tag":64,"props":161,"children":162},{"__ignoreMap":7},[163],{"type":24,"value":158},{"type":18,"tag":26,"props":165,"children":166},{},[167,169,174,176,181,183,188],{"type":24,"value":168},"Call ",{"type":18,"tag":48,"props":170,"children":171},{},[172],{"type":24,"value":173},"InpaintDataset",{"type":24,"value":175}," and ",{"type":18,"tag":48,"props":177,"children":178},{},[179],{"type":24,"value":180},"GeneratorDataset",{"type":24,"value":182}," to read the datasets, use ",{"type":18,"tag":48,"props":184,"children":185},{},[186],{"type":24,"value":187},"create_dict_iterator",{"type":24,"value":189}," to create a dataset iterator, and visualize the input images, masks, and images to be 
inpainted. Some training data is displayed as follows:",{"type":18,"tag":59,"props":191,"children":193},{"code":192},"import numpy as np\nimport matplotlib.pyplot as plt\n\nfrom src.config.config import cra_config as config\n\n\ndataset_generator = InpaintDataset(config)\ndataset = ds.GeneratorDataset(dataset_generator, ['image'])\ndataset_size = len(dataset_generator)\ntotal_batch = dataset_size // config.train_batchsize\ndataset = dataset.batch(config.train_batchsize, drop_remainder=True)\ndataset = dataset.create_dict_iterator(output_numpy=True)\ndataset = next(dataset)\nfor i, image in enumerate(dataset['image']):\n    image = image[(2, 1, 0), :, :]\n    image = image.transpose(1, 2, 0)\n    mask, mask_show = random_mask(config)\n    mask = ops.Squeeze(0)(mask).asnumpy()\n    mask = mask.transpose(1, 2, 0)\n    real = image * (1-mask)\n    result = np.concatenate([image, mask_show, real], 1)\n    plt.subplot(8, 1, i+1)\n    plt.axis('off')\n    plt.imshow(result)\nplt.show()\n",[194],{"type":18,"tag":64,"props":195,"children":196},{"__ignoreMap":7},[197],{"type":24,"value":192},{"type":18,"tag":26,"props":199,"children":200},{},[201],{"type":18,"tag":119,"props":202,"children":204},{"alt":7,"src":203},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/c2f7b1f060344064bde46df6fb031838.png",[],{"type":18,"tag":26,"props":206,"children":207},{},[208],{"type":18,"tag":48,"props":209,"children":210},{},[211],{"type":24,"value":212},"Model Architecture",{"type":18,"tag":26,"props":214,"children":215},{},[216],{"type":24,"value":217},"After the data is loaded, we start to build the overall network model. 
Specifically, we utilize a GAN to predict the inpainting result of a low-resolution image; upsample the result to generate a blurry image of the same size as the original image; generate the high-frequency information of missing content by aggregating weighted high-frequency residuals of contextual patches; and finally add the aggregated residuals to the large, blurry image to obtain a clear and complete inpainted image. Next, we present the network architecture, starting from its individual components and gradually building up to the complete system.",{"type":18,"tag":26,"props":219,"children":220},{},[221],{"type":18,"tag":48,"props":222,"children":223},{},[224],{"type":24,"value":225},"LWGC",{"type":18,"tag":26,"props":227,"children":228},{},[229],{"type":24,"value":230},"After analyzing the limitations of common and partial convolutions in handling irregular hole regions, the paper employs Gated Convolution (GC) to construct the convolutional layers of the model. However, GC almost doubles the number of parameters and the processing time compared with common convolutions. To reduce this cost, the paper proposes three lightweight variants of GC (LWGC): depth-separable LWGC (LWGCds), pixelwise LWGC (LWGCpw), and single-channel LWGC (LWGCsc).",{"type":18,"tag":26,"props":232,"children":233},{},[234],{"type":24,"value":235},"The output of the original GC can be expressed as follows:",{"type":18,"tag":26,"props":237,"children":238},{},[239],{"type":18,"tag":119,"props":240,"children":242},{"alt":7,"src":241},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/8714591084f04617acba1ac48b73a5c2.png",[],{"type":18,"tag":26,"props":244,"children":245},{},[246,251,253,258,260,265,266,271],{"type":18,"tag":32,"props":247,"children":248},{},[249],{"type":24,"value":250},"σ",{"type":24,"value":252}," is the Sigmoid function. 
",{"type":18,"tag":32,"props":254,"children":255},{},[256],{"type":24,"value":257},"ψ",{"type":24,"value":259}," is an activation function that is often set to ELU. ",{"type":18,"tag":32,"props":261,"children":262},{},[263],{"type":24,"value":264},"Wg",{"type":24,"value":175},{"type":18,"tag":32,"props":267,"children":268},{},[269],{"type":24,"value":270},"Wf",{"type":24,"value":272}," are two different sets of convolutional filters.",{"type":18,"tag":26,"props":274,"children":275},{},[276],{"type":24,"value":277},"The three variants of LWGC differ in the computation of the gate branch G.",{"type":18,"tag":26,"props":279,"children":280},{},[281],{"type":18,"tag":119,"props":282,"children":284},{"alt":7,"src":283},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/4b8c9c0a9bd449ee9122aee8db7412cf.png",[],{"type":18,"tag":26,"props":286,"children":287},{},[288],{"type":24,"value":289},"Here, we use LWGCsc on the generator's coarse network.",{"type":18,"tag":59,"props":291,"children":293},{"code":292},"import mindspore.nn as nn\nfrom mindspore.common.initializer import TruncatedNormal\n\n\nclass ScConv(nn.Cell):\n    \"\"\"Build LWGCsc Gate branch\"\"\"\n\n    def __init__(self, in_channel, kernel_size, stride, padding, dilation):\n        super(ScConv, self).__init__()\n        self.single_channel_conv = nn.Conv2d(in_channels=in_channel, out_channels=1, kernel_size=kernel_size,\n                                             stride=stride, pad_mode='same', padding=padding, dilation=dilation,\n                                             group=1, has_bias=True, weight_init=TruncatedNormal(0.05))\n\n    def construct(self, x):\n        x = self.single_channel_conv(x)\n        return x\n",[294],{"type":18,"tag":64,"props":295,"children":296},{"__ignoreMap":7},[297],{"type":24,"value":292},{"type":18,"tag":26,"props":299,"children":300},{},[301],{"type":24,"value":302},"Build gated convolutional layers with the nn.Conv2d common 
convolution.",{"type":18,"tag":59,"props":304,"children":306},{"code":305},"class GatedConv2d(nn.Cell):\n    \"\"\"Build LWGCsc and LWGCds network layer\"\"\"\n\n    def __init__(self, in_channel, out_channel, kernel_size, stride, dilation, sc=False):\n        super(GatedConv2d, self).__init__()\n        self.activation = nn.ELU(alpha=1.0)\n        if sc:\n            self.conv2d = nn.Conv2d(in_channel, out_channel, kernel_size, stride, pad_mode='same', padding=0,\n                                    dilation=dilation, has_bias=True, weight_init=TruncatedNormal(0.05))\n            self.gate_factor = ScConv(in_channel, kernel_size, stride, 0, dilation)\n        else:\n            self.conv2d = nn.Conv2d(in_channel, out_channel, kernel_size, stride, pad_mode='same', padding=0,\n                                    dilation=dilation, has_bias=True, weight_init=TruncatedNormal(0.05))\n            self.gate_factor = DepthSeparableConv(in_channel, out_channel, stride, dilation)\n        self.sigmoid = nn.Sigmoid()\n\n    def construct(self, x):\n        gc_f = self.conv2d(x)\n        gc_g = self.gate_factor(x)\n        x = self.sigmoid(gc_g) * self.activation(gc_f)\n        return x\n",[307],{"type":18,"tag":64,"props":308,"children":309},{"__ignoreMap":7},[310],{"type":24,"value":305},{"type":18,"tag":26,"props":312,"children":313},{},[314],{"type":18,"tag":48,"props":315,"children":316},{},[317],{"type":24,"value":318},"Attention Computing Module (ACM)",{"type":18,"tag":26,"props":320,"children":321},{},[322,324,329,331,335],{"type":24,"value":323},"The attention score is computed based on the region affinity of a high-level feature map (denoted as ",{"type":18,"tag":32,"props":325,"children":326},{},[327],{"type":24,"value":328},"P",{"type":24,"value":330},"). 
",{"type":18,"tag":32,"props":332,"children":333},{},[334],{"type":24,"value":328},{"type":24,"value":336}," is divided into patches of a specific size and the ACM computes the cosine similarity between the patches inside and outside missing regions. The formula is as follows:",{"type":18,"tag":26,"props":338,"children":339},{},[340],{"type":18,"tag":119,"props":341,"children":343},{"alt":7,"src":342},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/65906e4a7c00451686cde19216580506.png",[],{"type":18,"tag":26,"props":345,"children":346},{},[347,352,354,358,360,365,367,371],{"type":18,"tag":32,"props":348,"children":349},{},[350],{"type":24,"value":351},"pi",{"type":24,"value":353}," is the _i_th patch extracted outside the hole region in ",{"type":18,"tag":32,"props":355,"children":356},{},[357],{"type":24,"value":328},{"type":24,"value":359},", and ",{"type":18,"tag":32,"props":361,"children":362},{},[363],{"type":24,"value":364},"pj",{"type":24,"value":366}," is the _j_th patch extracted inside the hole region in ",{"type":18,"tag":32,"props":368,"children":369},{},[370],{"type":24,"value":328},{"type":24,"value":372},".",{"type":18,"tag":26,"props":374,"children":375},{},[376,378,382],{"type":24,"value":377},"Apply Softmax on the similarity score to obtain the attention score of each patch in ",{"type":18,"tag":32,"props":379,"children":380},{},[381],{"type":24,"value":328},{"type":24,"value":383},":",{"type":18,"tag":26,"props":385,"children":386},{},[387],{"type":18,"tag":119,"props":388,"children":390},{"alt":7,"src":389},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/4c2277e74e8848baa113e84e1455377b.png",[],{"type":18,"tag":26,"props":392,"children":393},{},[394,399,401,405,407,412],{"type":18,"tag":32,"props":395,"children":396},{},[397],{"type":24,"value":398},"N",{"type":24,"value":400}," is the number of patches outside the hole region in 
",{"type":18,"tag":32,"props":402,"children":403},{},[404],{"type":24,"value":328},{"type":24,"value":406},". Our framework utilizes a 64 x 64 high-level feature map to compute the attention score. The patch size used for this computation is 3 x 3, and the resulting score is stored in the ",{"type":18,"tag":48,"props":408,"children":409},{},[410],{"type":24,"value":411},"correspondence",{"type":24,"value":413}," tensor.",{"type":18,"tag":59,"props":415,"children":417},{"code":416},"from src.models.compute_attention import downsample, InitConv2d\n\n\nclass ContextualAttention(nn.Cell):\n    \"\"\"\n    Attention score computing module.\n\n    Args:\n        softmax_scale(int): scaled softmax for attention.\n        src(Tensor): input feature to match (foreground).\n        ref(Tensor): input feature for match (background).\n        mask(Tensor): input mask for ref, indicating patches not available.\n\n    Return:\n        out: Foreground area filled with context information\n             (It generally refers to the 64 * 64 feature map used to calculate attention scores).\n        correspondence: Attention score.\n    \"\"\"\n\n    def __init__(self, softmax_scale=10, fuse=True, dtype=mindspore.float32):\n        super(ContextualAttention, self).__init__()\n        self.softmax_scale = softmax_scale\n        self.fuse = fuse\n        self.dtype = dtype\n        self.reducesum = ops.ReduceSum(False)\n        self.unfold1 = nn.Unfold([1, 3, 3, 1], [1, 2, 2, 1], [1, 1, 1, 1], 'same')\n        self.unfold2 = nn.Unfold([1, 3, 3, 1], [1, 1, 1, 1], [1, 1, 1, 1], 'same')\n        self.transpose = ops.Transpose()\n        self.reshape = ops.Reshape()\n        self.pool1 = nn.MaxPool2d(16, 16, 'same', 'NCHW')\n        self.pool2 = nn.MaxPool2d(3, 1, 'same', 'NCHW')\n        self.maximum = ops.Maximum()\n        self.sqrt = ops.Sqrt()\n        self.square = ops.Square()\n        self.eye = ops.Eye()\n        self.reducemax = ops.ReduceMax(True)\n        self.greaterequal = 
ops.GreaterEqual()\n        self.pow = ops.Pow()\n        self.div = ops.Div()\n        self.softmax = nn.Softmax(1)\n        self.cat = ops.Concat(0)\n        self.conv1 = InitConv2d([3, 3, 128, 1024], 1, True)\n        self.conv2 = InitConv2d([3, 3, 1, 1], 1, True)\n        self.disconv1 = InitConv2d([3, 3, 128, 1024], 2, False)\n\n    def construct(self, src, ref, mask, method='SOFT'):\n        \"\"\"compute attention score\"\"\"\n\n        # get shapes\n        shape_src = src.shape\n        batch_size = shape_src[0]\n        nc = shape_src[1]\n        # raw features\n        raw_feats = self.unfold1(ref)\n        raw_feats = self.transpose(raw_feats, (0, 2, 3, 1))\n        raw_feats = self.reshape(raw_feats, (batch_size, -1, 3, 3, nc))\n        raw_feats = self.transpose(raw_feats, (0, 2, 3, 4, 1))\n        split = ops.Split(0, batch_size)\n        raw_feats_lst = split(raw_feats)\n        # resize\n        src = downsample(src)\n        ref = downsample(ref)\n        ss = src.shape\n        rs = ref.shape\n        src_lst = split(src)\n        feats = self.unfold2(ref)\n        feats = self.transpose(feats, (0, 2, 3, 1))\n        feats = self.reshape(feats, (batch_size, -1, 3, 3, nc))\n        feats = self.transpose(feats, (0, 2, 3, 4, 1))\n        feats_lst = split(feats)\n        # process mask\n        mask = self.pool1(mask)\n        mask = self.pool2(mask)\n        mask = 1 - mask\n        mask = self.reshape(mask, (1, -1, 1, 1))\n\n        y_lst, y_up_lst = [], []\n        offsets = []\n        fuse_weight = self.reshape(self.eye(3, 3, mindspore.float32), (3, 3, 1, 1))\n        for x, r, raw_r in zip(src_lst, feats_lst, raw_feats_lst):\n            r = r[0]\n            r = r / self.maximum(self.sqrt(self.reducesum(self.square(r), [0, 1, 2])), 1e-8)\n            r_kernel = self.transpose(r, (3, 2, 0, 1))\n            y = self.conv1(x, r_kernel)\n            if self.fuse:\n                # conv implementation for fuse scores to encourage large patches\n 
               yi = self.reshape(y, (1, 1, ss[2] * ss[3], rs[2] * rs[3]))\n                fuse_weight_kernel = ops.Transpose()(fuse_weight, (3, 2, 0, 1))\n                yi = self.conv2(yi, fuse_weight_kernel)\n                yi = self.transpose(yi, (0, 2, 3, 1))\n                yi = self.reshape(yi, (1, ss[2], ss[3], rs[2], rs[3]))\n                yi = self.transpose(yi, (0, 2, 1, 4, 3))\n                yi = self.reshape(yi, (1, ss[2] * ss[3], rs[2] * rs[3], 1))\n                yi = self.transpose(yi, (0, 3, 1, 2))\n                yi = self.conv2(yi, fuse_weight_kernel)\n                yi = self.transpose(yi, (0, 2, 3, 1))\n                yi = self.reshape(yi, (1, ss[3], ss[2], rs[3], rs[2]))\n                yi = self.transpose(yi, (0, 2, 1, 4, 3))\n                y = yi\n            y = self.reshape(y, (1, ss[2], ss[3], rs[2] * rs[3]))\n            y = self.transpose(y, (0, 3, 1, 2))\n            if method == 'HARD':\n                ym = self.reducemax(y, 1)\n                y = y * mask\n                # Use the cell's ReduceMax op (axis 1, keep_dims=True) instead of Python's built-in max\n                coef = self.greaterequal(y, self.reducemax(y, 1)).astype(self.dtype)\n                y = self.pow(coef * self.div(y, ym + 1e-04), 2)\n            elif method == 'SOFT':\n                y = (self.softmax(y * mask * self.softmax_scale)) * mask\n            y = self.reshape(y, (1, rs[2] * rs[3], ss[2], ss[3]))\n            if self.dtype == mindspore.float32:\n                offset = y.argmax(1)\n                offsets.append(offset)\n            feats = raw_r[0]\n            feats_kernel = self.transpose(feats, (3, 2, 0, 1))\n            y_up = self.disconv1(y, feats_kernel)\n            y_lst.append(y)\n            y_up_lst.append(y_up)\n        out, correspondence = self.cat(y_up_lst), self.cat(y_lst)\n        out = self.reshape(out, (shape_src[0], shape_src[1], shape_src[2], shape_src[3]))\n        return out, 
correspondence\n",[418],{"type":18,"tag":64,"props":419,"children":420},{"__ignoreMap":7},[421],{"type":24,"value":416},{"type":18,"tag":26,"props":423,"children":424},{},[425],{"type":18,"tag":48,"props":426,"children":427},{},[428],{"type":24,"value":429},"Multi-Scale Attention Transfer and Score Sharing (ATM)",{"type":18,"tag":26,"props":431,"children":432},{},[433,435,439,441,446],{"type":24,"value":434},"After the attention scores are computed from the high-level feature map ",{"type":18,"tag":32,"props":436,"children":437},{},[438],{"type":24,"value":328},{"type":24,"value":440},", the missing content in the lower-level feature map (",{"type":18,"tag":32,"props":442,"children":443},{},[444],{"type":24,"value":445},"P^l",{"type":24,"value":447},") can be filled with the weighted contextual patches by using the attention scores.",{"type":18,"tag":26,"props":449,"children":450},{},[451],{"type":18,"tag":119,"props":452,"children":454},{"alt":7,"src":453},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/a4b2da243b0548a1acec78c49ffcd754.png",[],{"type":18,"tag":26,"props":456,"children":457},{},[458,460,465,467,472,474,478],{"type":24,"value":459},"l ∈ {1, 2, 3} corresponds to network layers whose feature map sizes are 64, 128, and 256, respectively. ",{"type":18,"tag":32,"props":461,"children":462},{},[463],{"type":24,"value":464},"P^l_i",{"type":24,"value":466}," is the i-th patch extracted outside the hole region, and ",{"type":18,"tag":32,"props":468,"children":469},{},[470],{"type":24,"value":471},"P^l_j",{"type":24,"value":473}," is the j-th patch extracted inside the hole region. ",{"type":18,"tag":32,"props":475,"children":476},{},[477],{"type":24,"value":398},{"type":24,"value":479}," is the number of patches in the background region. 
Because the size of a feature map varies from layer to layer, the patch size used at each layer should change accordingly.",{"type":18,"tag":26,"props":481,"children":482},{},[483,485,489],{"type":24,"value":484},"In the paper's framework, the same set of attention scores (",{"type":18,"tag":48,"props":486,"children":487},{},[488],{"type":24,"value":411},{"type":24,"value":490},") is applied to different feature maps multiple times to implement attention transfer. The sharing of attention scores reduces network parameters and improves computational efficiency.",{"type":18,"tag":59,"props":492,"children":494},{"code":493},"class ApplyAttention(nn.Cell):\n    \"\"\"\n    Attention transfer module (used for training).\n    (It is generally used for 128 * 128 / 256 * 256 feature maps.)\n\n    Args:\n        shp(list): the shape of input feature map.\n        shp_att(list): the shape of attention score.\n\n    Return:\n        out: Feature map filled by attention transfer module.\n    \"\"\"\n\n    def __init__(self, shp, shp_att):\n        super(ApplyAttention, self).__init__()\n        self.shp = shp\n        self.shp_att = shp_att\n        self.rate = self.shp[2] // self.shp_att[2]\n        self.kernel = self.rate * 2\n        self.batch_size = self.shp[0]\n        self.sz = self.shp[2]\n        self.nc = self.shp[1]\n        self.unfold = nn.Unfold([1, self.kernel, self.kernel, 1], [1, self.rate, self.rate, 1], [1, 1, 1, 1], 'same')\n        self.transpose = ops.Transpose()\n        self.reshape = ops.Reshape()\n        self.split = ops.Split(0, self.batch_size)\n        self.disconv1 = InitConv2d([8, 8, 64, 1024], self.rate, False)\n        self.disconv2 = InitConv2d([16, 16, 32, 1024], self.rate, False)\n        self.concat = ops.Concat(0)\n        self.conv_pl2 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 1),\n            GatedConv2d(64, 64, 3, 1, 2)\n        )\n        self.conv_pl1 = 
nn.SequentialCell(\n            GatedConv2d(32, 32, 3, 1, 1),\n            GatedConv2d(32, 32, 3, 1, 2)\n        )\n\n    def construct(self, x, correspondence):\n        \"\"\"apply attention on training\"\"\"\n\n        raw_feats = self.unfold(x)\n        raw_feats = self.transpose(raw_feats, (0, 2, 3, 1))\n        raw_feats = self.reshape(raw_feats, (self.batch_size, -1, self.kernel, self.kernel, self.nc))\n        raw_feats = self.transpose(raw_feats, (0, 2, 3, 4, 1))\n        raw_feats_lst = self.split(raw_feats)\n        ys = []\n        correspondence = self.transpose(correspondence, (0, 2, 3, 1))\n        att_lst = self.split(correspondence)\n        for feats, att in zip(raw_feats_lst, att_lst):\n            feats_kernel = self.transpose(feats[0], (3, 2, 0, 1))\n            att = self.transpose(att, (0, 3, 1, 2))\n            if self.shp[2] == 128:\n                y1 = self.disconv1(att, feats_kernel)\n                ys.append(y1)\n            elif self.shp[2] == 256:\n                y2 = self.disconv2(att, feats_kernel)\n                ys.append(y2)\n            else:\n                print('Value Error')\n        out = self.concat(ys)\n        if self.shp[2] == 128:\n            out = self.conv_pl2(out)\n        elif self.shp[2] == 256:\n            out = self.conv_pl1(out)\n        else:\n            print('conv error')\n        return out\n",[495],{"type":18,"tag":64,"props":496,"children":497},{"__ignoreMap":7},[498],{"type":24,"value":493},{"type":18,"tag":26,"props":500,"children":501},{},[502],{"type":18,"tag":48,"props":503,"children":504},{},[505],{"type":24,"value":506},"Overall Pipeline of CRA",{"type":18,"tag":26,"props":508,"children":509},{},[510],{"type":24,"value":511},"Take a high-resolution input image and downsample it to 512 x 512 to create a low-resolution image. 
Then, upsample the low-resolution image to produce a blurry image (low-frequency component) with the same size as the original input. The generator obtains the low-resolution image and inpaints it. At the same time, the ACM of the generator computes the attention score.",{"type":18,"tag":26,"props":513,"children":514},{},[515],{"type":24,"value":516},"To obtain the final inpainting result of the hole region, start by computing the contextual residuals of the image by subtracting the blurry low-frequency component from the original input. Next, compute the aggregated residuals in the hole region using the contextual residuals and attention scores through the ATM. Finally, add the aggregated residuals to the upsampled image inpainting result. Note that the region outside the hole should still use the original input. The following figure shows the overall process of the CRA mechanism.",{"type":18,"tag":26,"props":518,"children":519},{},[520],{"type":18,"tag":119,"props":521,"children":523},{"alt":7,"src":522},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/7c7d668711644193a72c0162e6129911.png",[],{"type":18,"tag":26,"props":525,"children":526},{},[527],{"type":18,"tag":48,"props":528,"children":529},{},[530],{"type":24,"value":531},"Generator",{"type":18,"tag":26,"props":533,"children":534},{},[535],{"type":24,"value":536},"The generator adopts a two-stage coarse-to-fine network architecture, in which the coarse network generates a rough inpainting result, and the fine network predicts a finer result. The generator takes the original image and mask as inputs to generate a complete inpainted image. The input and output sizes are 512 x 512. To expand the receptive fields and reduce computation, inputs are downsampled to 256 x 256 before convolution on the coarse network. 
For inputs to the fine network, the input hole region is replaced with the corresponding region of the coarse network's output.",{"type":18,"tag":26,"props":538,"children":539},{},[540],{"type":24,"value":541},"The fine network uses a high-level feature map to compute the contextual attention score and performs attention transfer on multiple lower-level feature maps. The paper also uses dilated convolutions in both the coarse and fine networks to further expand the receptive fields. In addition, to improve computational efficiency, LWGC is applied to all layers of the generator. Batch Normalization (BN) is removed from all convolutional layers, padding uses the 'same' mode, and every convolutional layer uses ELU as its activation function.",{"type":18,"tag":59,"props":543,"children":545},{"code":544},"from src.models.network_module import GatedConv2d, TransposeGatedConv2d\nfrom src.models.compute_attention import ContextualAttention, ApplyAttention\n\n\nclass Coarse(nn.Cell):\n    \"\"\"Build the first stage of generator: coarse network\"\"\"\n\n    def __init__(self):\n        super(Coarse, self).__init__()\n        self.coarse1 = nn.SequentialCell(\n            GatedConv2d(4, 32, 5, 2, 1, sc=True),\n            GatedConv2d(32, 32, 3, 1, 1, sc=True),\n            GatedConv2d(32, 64, 3, 2, 1, sc=True)\n        )\n        self.coarse2 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True)\n        )\n        self.coarse3 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True)\n        )\n        self.coarse4 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 2, sc=True),\n            GatedConv2d(64, 64, 3, 1, 2, sc=True),\n            GatedConv2d(64, 64, 3, 1, 2, 
sc=True),\n            GatedConv2d(64, 64, 3, 1, 2, sc=True),\n            GatedConv2d(64, 64, 3, 1, 2, sc=True)\n        )\n        self.coarse5 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 4, sc=True),\n            GatedConv2d(64, 64, 3, 1, 4, sc=True),\n            GatedConv2d(64, 64, 3, 1, 4, sc=True),\n            GatedConv2d(64, 64, 3, 1, 4, sc=True)\n        )\n        self.coarse6 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 8, sc=True),\n            GatedConv2d(64, 64, 3, 1, 8, sc=True),\n        )\n        self.coarse7 = nn.SequentialCell(\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n            GatedConv2d(64, 64, 3, 1, 1, sc=True),\n        )\n        self.coarse8 = nn.SequentialCell(\n            TransposeGatedConv2d(64, 32, 3, 1, 1, sc=True),\n            GatedConv2d(32, 32, 3, 1, 1, sc=True),\n            TransposeGatedConv2d(32, 3, 3, 1, 1, sc=True),\n        )\n\n    def construct(self, first_in):\n        first_out = self.coarse1(first_in)\n        first_out = self.coarse2(first_out)\n        first_out = self.coarse3(first_out)\n        first_out = self.coarse4(first_out)\n        first_out = self.coarse5(first_out)\n        first_out = self.coarse6(first_out)\n        first_out = self.coarse7(first_out)\n        first_out = self.coarse8(first_out)\n        first_out = ops.clip_by_value(first_out, -1, 1)\n        return first_out\n\n\nclass GatedGenerator(nn.Cell):\n    \"\"\"\n    Build the second stage of generator: refine network and complete generator.\n\n    Args:\n        opt(class): option class.\n\n    Return:\n        first_out: The output of coarse network.\n        second_out: The output of refine network.\n        match: Attention score.\n    \"\"\"\n\n    def __init__(self, opt):\n        super(GatedGenerator, self).__init__()\n        self.coarse = Coarse()\n        self.refinement1 = nn.SequentialCell(\n            GatedConv2d(4, 32, 3, 2, 
1),\n            GatedConv2d(32, 32, 3, 1, 1)\n        )\n        self.refinement2 = nn.SequentialCell(\n            GatedConv2d(32, 64, 3, 2, 1),\n            GatedConv2d(64, 64, 3, 1, 1)\n        )\n        self.refinement3 = nn.SequentialCell(\n            GatedConv2d(64, 128, 3, 2, 1),\n            GatedConv2d(128, 128, 3, 1, 1)\n        )\n        self.refinement4 = GatedConv2d(128, 128, 3, 1, 1)\n        self.refinement5 = nn.SequentialCell(\n            GatedConv2d(128, 128, 3, 1, 2),\n            GatedConv2d(128, 128, 3, 1, 4)\n        )\n        self.refinement6 = nn.SequentialCell(\n            GatedConv2d(128, 128, 3, 1, 8),\n            GatedConv2d(128, 128, 3, 1, 16)\n        )\n        self.refinement7 = nn.SequentialCell(\n            TransposeGatedConv2d(128, 64, 3, 1, 1),\n            GatedConv2d(64, 64, 3, 1, 1)\n        )\n        self.refinement8 = nn.SequentialCell(\n            TransposeGatedConv2d(128, 32, 3, 1, 1),\n            GatedConv2d(32, 32, 3, 1, 1)\n        )\n        self.refinement9 = TransposeGatedConv2d(64, 3, 3, 1, 1)\n        self.conv_att1 = GatedConv2d(128, 128, 3, 1, 1)\n        self.conv_att2 = GatedConv2d(256, 128, 3, 1, 1)\n        self.batch = opt.train_batchsize\n        self.apply_attention1 = ApplyAttention([self.batch, 64, 128, 128], [self.batch, 1024, 32, 32])\n        self.apply_attention2 = ApplyAttention([self.batch, 32, 256, 256], [self.batch, 1024, 32, 32])\n        self.ones = ops.Ones()\n        self.concat = ops.Concat(1)\n        self.bilinear_256 = ops.ResizeBilinear((256, 256))\n        self.bilinear_512 = ops.ResizeBilinear((512, 512))\n        self.reshape = ops.Reshape()\n        self.contextual_attention = ContextualAttention(fuse=True, dtype=mindspore.float32)\n        self.cat = ops.Concat(1)\n        self.method = opt.attention_type\n\n    def construct(self, img, mask):\n        x_in = img.astype(mindspore.float32)\n        shape = x_in.shape\n        mask_batch = self.ones((shape[0], 1, shape[2], 
shape[3]), mindspore.float32)\n        mask_batch = mask_batch * mask\n        first_in = self.concat((x_in, mask_batch))\n        first_in = self.bilinear_256(first_in)\n        first_out = self.coarse(first_in)\n        first_out = self.bilinear_512(first_out)\n        first_out = self.reshape(first_out, (shape[0], shape[1], shape[2], shape[3]))\n        x_coarse = first_out * mask_batch + x_in * (1. - mask_batch)\n        second_in = self.concat([x_coarse, mask_batch])\n        pl1 = self.refinement1(second_in)\n        pl2 = self.refinement2(pl1)\n        second_out = self.refinement3(pl2)\n        second_out = self.refinement4(second_out)\n        second_out = self.refinement5(second_out)\n        pl3 = self.refinement6(second_out)\n        x_hallu = pl3\n        x, match = self.contextual_attention(pl3, pl3, mask, self.method)\n        x = self.conv_att1(x)\n        x = self.cat((x_hallu, x))\n        second_out = self.conv_att2(x)\n        second_out = self.refinement7(second_out)\n        second_out_att = self.apply_attention1(pl2, match)\n        second_out = self.concat([second_out_att, second_out])\n        second_out = self.refinement8(second_out)\n        second_out_att = self.apply_attention2(pl1, match)\n        second_out = self.concat([second_out_att, second_out])\n        second_out = self.refinement9(second_out)\n        second_out = ops.clip_by_value(second_out, -1, 1)\n        return first_out, second_out, match\n",[546],{"type":18,"tag":64,"props":547,"children":548},{"__ignoreMap":7},[549],{"type":24,"value":544},{"type":18,"tag":26,"props":551,"children":552},{},[553],{"type":18,"tag":48,"props":554,"children":555},{},[556],{"type":24,"value":557},"Discriminator",{"type":18,"tag":26,"props":559,"children":560},{},[561,563,568],{"type":24,"value":562},"Discriminator ",{"type":18,"tag":32,"props":564,"children":565},{},[566],{"type":24,"value":567},"D",{"type":24,"value":569}," uses a series of Conv2d and LeakyReLU layers for processing, and 
finally outputs the discrimination result through an nn.Dense layer. The discriminator is implemented as follows:",{"type":18,"tag":59,"props":571,"children":573},{"code":572},"from src.models.network_module import Conv2dLayer\n\n\nclass Discriminator(nn.Cell):\n    \"\"\"Build the complete discriminator\"\"\"\n\n    def __init__(self):\n        super(Discriminator, self).__init__()\n        self.block1 = Conv2dLayer(3, 64, 5, 2, 1)\n        self.block2 = Conv2dLayer(64, 128, 5, 2, 1)\n        self.block3 = Conv2dLayer(128, 256, 5, 2, 1)\n        self.block4 = Conv2dLayer(256, 256, 5, 2, 1)\n        self.block5 = Conv2dLayer(256, 256, 5, 2, 1)\n        self.block6 = Conv2dLayer(256, 256, 5, 2, 1)\n        self.block7 = nn.Dense(16384, 1)\n\n    def construct(self, img):\n        x = img\n        x = self.block1(x)\n        x = self.block2(x)\n        x = self.block3(x)\n        x = self.block4(x)\n        x = self.block5(x)\n        x = self.block6(x)\n        x = x.reshape([x.shape[0], -1])\n        x = self.block7(x)\n        return x\n",[574],{"type":18,"tag":64,"props":575,"children":576},{"__ignoreMap":7},[577],{"type":24,"value":572},{"type":18,"tag":26,"props":579,"children":580},{},[581],{"type":18,"tag":48,"props":582,"children":583},{},[584],{"type":24,"value":585},"Connecting Loss Functions to the Network",{"type":18,"tag":26,"props":587,"children":588},{},[589],{"type":24,"value":590},"MindSpore encapsulates operations such as loss functions and optimizers into cells, which complicates GAN implementation. Unlike a general classification network, a GAN produces two losses: one for the discriminator and one for the generator. If the default cell encapsulation is used directly, the framework cannot connect these losses to the network, and training fails. 
Therefore, we need to customize the WithLossCell class to connect the losses to the network.",{"type":18,"tag":26,"props":592,"children":593},{},[594],{"type":24,"value":595},"For generator losses, we create adversarial loss Ladv and reconstruction loss Lrec, respectively.",{"type":18,"tag":26,"props":597,"children":598},{},[599],{"type":18,"tag":119,"props":600,"children":602},{"alt":7,"src":601},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/d6cafef8b19847a9b88bcf50f7f5d2e6.png",[],{"type":18,"tag":26,"props":604,"children":605},{},[606],{"type":24,"value":607},"Generally, α, α1, and α2 are set to 1.2, and β is set to 0.001.",{"type":18,"tag":59,"props":609,"children":611},{"code":610},"from src.models.cra_utils.utils import gan_wgan_loss\n\n\nclass GenWithLossCell(nn.Cell):\n    \"\"\"\n    Build the generator loss.\n\n    Args:\n        net_g(cell): generator network.\n        net_d(cell): discriminator network.\n        args(class): option class.\n        auto_prefix(bool): whether to automatically generate namespace for cell and its subcells.\n            If set to True, the network parameter name will be prefixed, otherwise it will not.\n\n    Return:\n        loss_g: the loss of generator.\n    \"\"\"\n\n    def __init__(self, net_g, net_d, args, auto_prefix=True):\n        super(GenWithLossCell, self).__init__(auto_prefix=auto_prefix)\n        self.net_g = net_g\n        self.net_d = net_d\n        self.gan_wgan_loss = gan_wgan_loss\n        self.coarse_alpha = args.coarse_alpha\n        self.gan_with_mask = args.gan_with_mask\n        self.gan_loss_alpha = args.gan_loss_alpha\n        self.in_hole_alpha = args.in_hole_alpha\n        self.context_alpha = args.context_alpha\n        self.train_batchsize = args.train_batchsize\n        self.mean = ops.ReduceMean(False)\n        self.abs = ops.Abs()\n        self.concat_0 = ops.Concat(0)\n        self.concat_1 = ops.Concat(1)\n        self.split = ops.Split(0, 2)\n        
self.tile = ops.Tile()\n\n    def construct(self, real, x, mask):\n        x1, x2, _ = self.net_g(x, mask)\n        fake = x2\n        losses = {}\n        fake_patched = fake * mask + real * (1 - mask)\n        fake_patched = fake_patched.astype(mindspore.float32)\n        losses['in_hole_loss'] = self.coarse_alpha * self.mean(self.abs(real - x1) * mask)\n        losses['in_hole_loss'] = losses['in_hole_loss'] + self.mean(self.abs(real - x2) * mask)\n        losses['context_loss'] = self.coarse_alpha * self.mean(self.abs(real - x1) * (1 - mask))\n        losses['context_loss'] = losses['context_loss'] + self.mean(self.abs(real - x2) * (1 - mask))\n        losses['context_loss'] = losses['context_loss'] / self.mean(1 - mask)\n        real_fake = self.concat_0((real, fake_patched))\n        if self.gan_with_mask:\n            real_fake = self.concat_1((real_fake, self.tile(mask, (self.train_batchsize * 2, 1, 1, 1))))\n        d_real_fake = self.net_d(real_fake)\n        d_real, d_fake = self.split(d_real_fake)\n        g_loss, _ = self.gan_wgan_loss(d_real, d_fake)\n        losses['adv_gloss'] = g_loss\n        losses['g_loss'] = self.gan_loss_alpha * losses['adv_gloss']\n        losses['g_loss'] = losses['g_loss'] + self.in_hole_alpha * losses['in_hole_loss']\n        losses['g_loss'] = losses['g_loss'] + self.context_alpha * losses['context_loss']\n        loss_g = losses['g_loss']\n        return loss_g\n",[612],{"type":18,"tag":64,"props":613,"children":614},{"__ignoreMap":7},[615],{"type":24,"value":610},{"type":18,"tag":26,"props":617,"children":618},{},[619],{"type":24,"value":620},"For discriminator losses, we add the WGAN-GP loss to enhance the global consistency of the refined network in the second 
phase.",{"type":18,"tag":26,"props":622,"children":623},{},[624],{"type":18,"tag":119,"props":625,"children":627},{"alt":7,"src":626},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/0c629633f1284ad88c6ccebb9ce78394.png",[],{"type":18,"tag":26,"props":629,"children":630},{},[631,636,638,643,645,650,652,657,658,663,665,670,671,676,677,682],{"type":18,"tag":32,"props":632,"children":633},{},[634],{"type":24,"value":635},"D(.)",{"type":24,"value":637}," is the discriminator output and ",{"type":18,"tag":32,"props":639,"children":640},{},[641],{"type":24,"value":642},"G(.)",{"type":24,"value":644}," is the generator output. ",{"type":18,"tag":32,"props":646,"children":647},{},[648],{"type":24,"value":649},"x",{"type":24,"value":651},", ",{"type":18,"tag":32,"props":653,"children":654},{},[655],{"type":24,"value":656},"x̃",{"type":24,"value":651},{"type":18,"tag":32,"props":659,"children":660},{},[661],{"type":24,"value":662},"x̂",{"type":24,"value":664}," are the real image, the generated image, and interpolations between the two, respectively. ",{"type":18,"tag":32,"props":666,"children":667},{},[668],{"type":24,"value":669},"P_r",{"type":24,"value":651},{"type":18,"tag":32,"props":672,"children":673},{},[674],{"type":24,"value":675},"P_g",{"type":24,"value":651},{"type":18,"tag":32,"props":678,"children":679},{},[680],{"type":24,"value":681},"P_x̂",{"type":24,"value":683}," are the corresponding distributions.",{"type":18,"tag":59,"props":685,"children":687},{"code":686},"from src.models.cra_utils.utils import gan_wgan_loss, random_interpolates, GradientsPenalty\n\n\nclass DisWithLossCell(nn.Cell):\n    \"\"\"\n    Build the discriminator loss.\n\n    Args:\n        net_g(cell): generator network.\n        net_d(cell): discriminator network.\n        args(class): option class.\n        auto_prefix(bool): whether to automatically generate namespace for cell and its subcells.\n            If set to True, the network parameter name will be prefixed, otherwise it will not.\n\n    Return:\n        loss_d: the loss of discriminator.\n    \"\"\"\n\n    def __init__(self, net_g, net_d, args, auto_prefix=True):\n        super(DisWithLossCell, self).__init__(auto_prefix=auto_prefix)\n        self.net_g = net_g\n        self.net_d = net_d\n        self.gan_wgan_loss = gan_wgan_loss\n        self.random_interpolates = random_interpolates\n        self.gradients_penalty = GradientsPenalty(self.net_d)\n        self.gan_with_mask = args.gan_with_mask\n        self.wgan_gp_lambda = args.wgan_gp_lambda\n        self.train_batchsize = args.train_batchsize\n        self.concat_0 = ops.Concat(0)\n        self.concat_1 = ops.Concat(1)\n        self.split = ops.Split(0, 2)\n\n    def construct(self, real, x, mask):\n        _, x2, _ = self.net_g(x, mask)\n        fake = x2\n        losses = {}\n        fake_patched = fake * mask + real * (1 - mask)\n        fake_patched = fake_patched.astype(mindspore.float32)\n        real_fake = self.concat_0((real, fake_patched))\n        if self.gan_with_mask:\n            real_fake = 
self.concat_1((real_fake, ops.Tile()(mask, (self.train_batchsize * 2, 1, 1, 1))))\n        d_real_fake = self.net_d(real_fake)\n        d_real, d_fake = self.split(d_real_fake)\n        _, d_loss = self.gan_wgan_loss(d_real, d_fake)\n        losses['adv_dloss'] = d_loss\n        interps = self.random_interpolates(real, fake_patched)\n        gp_loss = self.gradients_penalty(interps)\n        losses['gp_loss'] = self.wgan_gp_lambda * gp_loss\n        losses['d_loss'] = losses['adv_dloss'] + losses['gp_loss']\n        loss_d = losses['d_loss']\n        return loss_d\n",[688],{"type":18,"tag":64,"props":689,"children":690},{"__ignoreMap":7},[691],{"type":24,"value":686},{"type":18,"tag":26,"props":693,"children":694},{},[695],{"type":24,"value":696},"Set up the connection between loss functions and the network, and define the training network encapsulation class.",{"type":18,"tag":59,"props":698,"children":700},{"code":699},"import mindspore.ops.functional as F\nfrom mindspore.parallel._utils import (_get_device_num, _get_gradients_mean, _get_parallel_mode)\nfrom mindspore.context import ParallelMode\nfrom mindspore.nn.wrap.grad_reducer import DistributedGradReducer\n\n\nclass TrainOneStepD(nn.Cell):\n    \"\"\"Encapsulation class of discriminator network training.\"\"\"\n\n    def __init__(self, d, optimizer, sens=1.0):\n        super(TrainOneStepD, self).__init__(auto_prefix=True)\n        self.optimizer = optimizer\n        self.d = d\n        self.d.net_d.set_grad()\n        self.d.net_d.set_train()\n        self.d.net_g.set_grad(False)\n        self.d.net_g.set_train(False)\n        self.grad = ops.GradOperation(get_by_list=True, sens_param=True)\n        self.sens = sens\n        self.weights = optimizer.parameters\n        self.reducer_flag = False\n        self.fill = ops.Fill()\n        self.dtype = ops.DType()\n        self.shape = ops.Shape()\n        self.grad_reducer = F.identity\n        self.parallel_mode = _get_parallel_mode()\n        if 
self.parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):\n            self.reducer_flag = True\n        if self.reducer_flag:\n            mean = _get_gradients_mean()\n            degree = _get_device_num()\n            self.grad_reducer = DistributedGradReducer(self.weights, mean, degree)\n\n    def construct(self, real, x, mask):\n        weights = self.weights\n        loss_d = self.d(real, x, mask)\n        sens_d = self.fill(self.dtype(loss_d), self.shape(loss_d), self.sens)\n        grads_d = self.grad(self.d, weights)(real, x, mask, sens_d)\n        if self.reducer_flag:\n            grads_d = self.grad_reducer(grads_d)\n        self.optimizer(grads_d)\n        return loss_d\n\n\nclass TrainOneStepG(nn.Cell):\n    \"\"\"Encapsulation class of generator network training.\"\"\"\n\n    def __init__(self, g, optimizer, sens=1.0):\n        super(TrainOneStepG, self).__init__(auto_prefix=True)\n        self.optimizer = optimizer\n        self.g = g\n        self.g.net_g.set_grad()\n        self.g.net_g.set_train()\n        self.g.net_d.set_grad(False)\n        self.g.net_d.set_train(False)\n        self.grad = ops.GradOperation(get_by_list=True, sens_param=True)\n        self.sens = sens\n        self.weights = optimizer.parameters\n        self.reducer_flag = False\n        self.fill = ops.Fill()\n        self.dtype = ops.DType()\n        self.shape = ops.Shape()\n        self.grad_reducer = F.identity\n        self.parallel_mode = _get_parallel_mode()\n        if self.parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL):\n            self.reducer_flag = True\n        if self.reducer_flag:\n            mean = _get_gradients_mean()\n            degree = _get_device_num()\n            self.grad_reducer = DistributedGradReducer(self.weights, mean, degree)\n\n    def construct(self, real, x, mask):\n        weights = self.weights\n        loss_g = self.g(real, x, mask)\n        sens_g = self.fill(self.dtype(loss_g), 
self.shape(loss_g), self.sens)\n        grads_g = self.grad(self.g, weights)(real, x, mask, sens_g)\n        if self.reducer_flag:\n            grads_g = self.grad_reducer(grads_g)\n        self.optimizer(grads_g)\n        return loss_g\n",[701],{"type":18,"tag":64,"props":702,"children":703},{"__ignoreMap":7},[704],{"type":24,"value":699},{"type":18,"tag":26,"props":706,"children":707},{},[708],{"type":18,"tag":48,"props":709,"children":710},{},[711],{"type":24,"value":712},"Optimizer Building",{"type":18,"tag":59,"props":714,"children":716},{"code":715},"net_g = GatedGenerator(config)\nnet_d = Discriminator()\nlr = nn.exponential_decay_lr(config.learning_rate, config.lr_decrease_factor, total_batch * config.epochs, total_batch,\n                             config.lr_decrease_epoch, True)\noptimizer_g = nn.Adam(filter(lambda p: p.requires_grad, net_g.trainable_params()), lr, 0.5, 0.9)\noptimizer_d = nn.Adam(net_d.trainable_params(), lr, 0.5, 0.9)\n",[717],{"type":18,"tag":64,"props":718,"children":719},{"__ignoreMap":7},[720],{"type":24,"value":715},{"type":18,"tag":26,"props":722,"children":723},{},[724,726,731,732,737,739,744,745,750],{"type":24,"value":725},"Here, two independent optimizers are set for the discriminator and generator respectively. The parameters ",{"type":18,"tag":48,"props":727,"children":728},{},[729],{"type":24,"value":730},"beta1",{"type":24,"value":175},{"type":18,"tag":48,"props":733,"children":734},{},[735],{"type":24,"value":736},"beta2",{"type":24,"value":738}," are set to ",{"type":18,"tag":48,"props":740,"children":741},{},[742],{"type":24,"value":743},"0.5",{"type":24,"value":175},{"type":18,"tag":48,"props":746,"children":747},{},[748],{"type":24,"value":749},"0.9",{"type":24,"value":751}," respectively. 
The learning rate is updated automatically using an exponential decay schedule.",{"type":18,"tag":26,"props":753,"children":754},{},[755],{"type":18,"tag":48,"props":756,"children":757},{},[758],{"type":24,"value":759},"Model Training",{"type":18,"tag":26,"props":761,"children":762},{},[763],{"type":24,"value":764},"Training is divided into two parts: discriminator training and generator training. The discriminator is trained to distinguish images produced by the generator from real images. The generator is trained to produce fake images that are as close to real images as possible.",{"type":18,"tag":26,"props":766,"children":767},{},[768],{"type":24,"value":769},"Training process:",{"type":18,"tag":59,"props":771,"children":773},{"code":772},"import os\nimport time\n\nimport cv2\nimport mindspore\nimport mindspore.dataset as ds\nfrom mindspore import context, save_checkpoint, nn\n\nfrom src.config.config import cra_config\nfrom src.models.inpainting_network import GatedGenerator, Discriminator\nfrom src.models.loss import GenWithLossCell, DisWithLossCell\nfrom src.models.train_one_step import TrainOneStepD, TrainOneStepG\n\n# InpaintDataset and random_mask are assumed to be defined in earlier cells of this tutorial.\n\n\ndef trainer(args):\n    \"\"\"Train model.\"\"\"\n\n    # Preprocess the data for training\n    context.set_context(mode=context.GRAPH_MODE, device_target='GPU')\n    dataset_generator = InpaintDataset(args)\n    dataset_size = len(dataset_generator)\n    total_batch = dataset_size // args.train_batchsize\n    dataset = ds.GeneratorDataset(dataset_generator, ['image'])\n    dataset = dataset.batch(args.train_batchsize, drop_remainder=True)\n    dataset = dataset.create_dict_iterator()\n\n    # Network\n    net_g = GatedGenerator(args)\n    net_d = Discriminator()\n    netg_with_loss = GenWithLossCell(net_g, net_d, args)\n    netd_with_loss = DisWithLossCell(net_g, net_d, args)\n    lr = nn.exponential_decay_lr(args.learning_rate, args.lr_decrease_factor, total_batch * 10, total_batch,\n                                 args.lr_decrease_epoch, True)\n    optimizer_g = nn.Adam(filter(lambda p: p.requires_grad, net_g.trainable_params()), lr, 0.5, 0.9)\n    optimizer_d = nn.Adam(net_d.trainable_params(), lr, 0.5, 0.9)\n    train_discriminator = TrainOneStepD(netd_with_loss, optimizer_d)\n    train_generator = TrainOneStepG(netg_with_loss, optimizer_g)\n\n    # Train\n    train_discriminator.set_train()\n    train_generator.set_train()\n    print(\"Starting Training Loop...\")\n    for epoch in range(10):\n        for batch_idx, image in enumerate(dataset):\n            s = time.time()\n            real = image['image']\n            real = real.astype(mindspore.float32)\n            mask, _ = random_mask(args)\n            x = real * (1 - mask)\n            for _ in range(args.dis_iter):\n                netd_loss = train_discriminator(real, x, mask)\n            netg_loss = train_generator(real, x, mask)\n            gap = time.time() - s\n            # Print losses\n            print('epoch{}/{}, batch{}/{}, d_loss is {:.4f}, g_loss is {:.4f}, time is {:.4f}'.format(\n                epoch + 1, args.epochs, batch_idx + 1, total_batch, netd_loss.asnumpy(), netg_loss.asnumpy(), gap))\n            save_checkpoint_path = './ckpt_out'\n            if not os.path.isdir(save_checkpoint_path):\n                os.makedirs(save_checkpoint_path)\n            # Save checkpoint\n            gen_name = 'generator_epoch%d_batch%d.ckpt' % (epoch + 1, batch_idx + 1)\n            dis_name = 'discriminator_epoch%d_batch%d.ckpt' % (epoch + 1, batch_idx + 1)\n            gen_name = os.path.join(save_checkpoint_path, gen_name)\n            dis_name = os.path.join(save_checkpoint_path, dis_name)\n            if (batch_idx + 1) == total_batch:\n                save_checkpoint(train_generator, gen_name)\n                save_checkpoint(train_discriminator, 
dis_name)\ntrainer(cra_config)\n",[774],{"type":18,"tag":64,"props":775,"children":776},{"__ignoreMap":7},[777],{"type":24,"value":772},{"type":18,"tag":26,"props":779,"children":780},{},[781],{"type":18,"tag":48,"props":782,"children":783},{},[784],{"type":24,"value":785},"Model Inference",{"type":18,"tag":26,"props":787,"children":788},{},[789],{"type":24,"value":790},"After the GAN training is complete, we can use it to predict the inpainting result of a low-resolution image. However, to generate a complete high-resolution inpainted image, some postprocessing operations need to be performed, which are specifically: obtain the image's contextual residual information; generate aggregated residuals of missing content by using the high-frequency residual and attention mechanism; upsample the image generated by the GAN; add the aggregated residuals to the large and blurry image to obtain a clear inpainted image; and resize the inpainted image to a size the same as that of the original image.",{"type":18,"tag":59,"props":792,"children":794},{"code":793},"import glob\nimport cv2\nimport numpy as np\n\n\ndef sort(str_lst):\n    \"\"\"Return the sorted list in ascending order.\"\"\"\n\n    return [s for s in sorted(str_lst)]\n\n\ndef read_imgs_masks(args):\n    \"\"\"Sort the image and mask directories in order and return it.\"\"\"\n\n    paths_img = glob.glob(args.image_dir + '/*.*[g|G]')\n    paths_img = sort(paths_img)\n    paths_mask = glob.glob(args.mask_dir + '/*.*[g|G]')\n    paths_mask = sort(paths_mask)\n    return paths_img, paths_mask\n\n\ndef get_input(path_img, path_mask):\n    \"\"\"Read and process the image and mask through the given path.\"\"\"\n\n    image = cv2.imread(path_img)\n    mask = cv2.imread(path_mask)\n    image = np.expand_dims(image, 0)\n    mask = np.expand_dims(mask, 0)\n    return image[0], mask[0]\nfrom mindspore import nn, ops\n\nfrom src.models.inpainting_network import GatedGenerator\nfrom src.models.compute_attention import 
ApplyAttention2\n\n\ndef post_processing(large_img, small_img, low_base, small_mask, corres, args):\n    \"\"\"Subtracting the large blurry image from the raw input to compute contextual residuals,\n     and calculate aggregated residuals through attention transfer module.\n     Adding the aggregated residuals to the up-sampled generator inpainted result.\"\"\"\n\n    high_raw = large_img\n    low_raw = small_img\n    mask = 1 - small_mask\n    low_raw = nn.ResizeBilinear()(low_raw, scale_factor=args.times)\n    to_shape = list(ops.Shape()(mask))[2:]\n    to_shape[0], to_shape[1] = int(to_shape[0] * args.times), int(to_shape[1] * args.times)\n    resize = ops.ResizeNearestNeighbor((to_shape[0], to_shape[1]))\n    mask = resize(mask)\n    residual1 = (high_raw - low_raw) * mask\n    residual = ApplyAttention2([1, 3, 4096, 4096], [1, 1024, 32, 32])(residual1, corres)\n    low_base = nn.ResizeBilinear()(low_base, scale_factor=args.times)\n    x = low_base + residual\n    x = x.clip(-1, 1)\n    x = (x + 1.) 
* 127.5\n    return x, low_raw, low_base, residual\nfrom scipy import signal\n\nimport mindspore\nfrom mindspore import Tensor\n\n\ndef gaussian_kernel(size, std):\n    \"\"\"Return a gaussian kernel.\"\"\"\n\n    k = signal.gaussian(size, std)\n    kk = np.matmul(k[:, np.newaxis], [k])\n    return kk / np.sum(kk)\n\n\ndef resize_back(raw_img, large_output, small_mask):\n    \"\"\"Process the test output result in the format of [1, 3,4096,4096] to the same size as the original input image.\"\"\"\n\n    raw_shp = raw_img.shape\n    raw_size_output = nn.ResizeBilinear()(large_output, size=(raw_shp[2], raw_shp[3]))\n    raw_size_output = raw_size_output.astype(mindspore.float32)\n    gauss_kernel = gaussian_kernel(7, 1.)\n    gauss_kernel = Tensor(gauss_kernel)\n    gauss_kernel = gauss_kernel.astype(mindspore.float32)\n    gauss_kernel = ops.ExpandDims()(gauss_kernel, 2)\n    gauss_kernel = ops.ExpandDims()(gauss_kernel, 3)\n    a, b, c, d = ops.Shape()(gauss_kernel)\n    gauss_kernel = ops.Transpose()(gauss_kernel, (3, 2, 0, 1))\n    conv = nn.Conv2d(c, d, (a, b), 1, pad_mode='same', padding=0, weight_init=gauss_kernel, data_format='NCHW')\n    mask = conv(small_mask[:, 0:1, :, :])\n    mask = nn.ResizeBilinear()(mask, size=(raw_shp[2], raw_shp[3]))\n    mask = mask.astype(mindspore.float32)\n    raw_size_output = raw_size_output * mask + raw_img * (1 - mask)\n    raw_size_output = ops.Transpose()(raw_size_output, (0, 2, 3, 1))\n    raw_size_output = raw_size_output.astype(mindspore.uint8)\n    return raw_size_output\ndef build_inference_graph(real, mask, model_gen):\n    \"\"\"Input real and mask to generator and output the results.\"\"\"\n\n    mask = mask[0:1, 0:1, :, :]\n    x = real * (1. 
- mask)\n    _, x2, corres = model_gen(x, mask)\n    fake = x2\n    fake_patched = fake * mask + x * (1 - mask)\n    return x2, fake_patched, corres\n\n\ndef build_inference_net(raw_img_ph, raw_mask_ph, model_gen, args):\n    \"\"\"\n    Complete CRA network testing model, including image preprocessing, generator generation and output,\n        and image post-processing operations.\n\n    Args:\n        raw_img_ph(Tensor): image read from folder.\n            It is processed into the format of [1,3,512,512], the data type is float32, and normalized.\n        raw_mask_ph(Tensor): mask read from folder.\n            It is processed into the format of [1,3,512,512], the data type is float32, and normalized.\n        model_gen(cell): generation network.\n        args(class): option class.\n\n    Return:\n        raw_size_output: Large test output results.\n        raw_img_ph: Image read from folder.\n        raw_mask_ph: Mask read from folder.\n    \"\"\"\n\n    # Process input image\n    raw_img = ops.ExpandDims()(raw_img_ph, 0)\n    raw_img = raw_img.astype(mindspore.float32)\n    raw_img = ops.Transpose()(raw_img, (0, 3, 1, 2))\n    resize = ops.ResizeNearestNeighbor((args.times * args.input_size, args.times * args.input_size))\n    large_img = resize(raw_img)\n    large_img = ops.Reshape()(large_img, (1, 3, args.times * args.input_size, args.times * args.input_size))\n    large_img = large_img / 127.5 - 1\n    net = nn.Unfold([1, args.times, args.times, 1], [1, args.times, args.times, 1], [1, 1, 1, 1], 'same')\n    small_img = net(large_img)\n    small_img = ops.Transpose()(small_img, (0, 2, 3, 1))\n    small_img = ops.Reshape()(small_img, (1, args.input_size, args.input_size, args.times, args.times, 3))\n    small_img = ops.ReduceMean(False)(small_img, axis=(3, 4))\n    small_img = ops.Transpose()(small_img, (0, 3, 1, 2))\n    # Process input mask\n    raw_mask = ops.ExpandDims()(raw_mask_ph, 0)\n    raw_mask = raw_mask.astype(mindspore.float32)\n    raw_mask = 
ops.Transpose()(raw_mask, (0, 3, 1, 2))\n    resize = ops.ResizeNearestNeighbor((args.input_size, args.input_size))\n    small_mask = resize(raw_mask)\n    small_mask = ops.Reshape()(small_mask, (1, 3, args.input_size, args.input_size))\n    small_mask = 1 - small_mask / 255\n    # Input image and mask to generator\n    x2, _, corres = build_inference_graph(real=small_img, mask=small_mask, model_gen=model_gen)\n    # Post processing\n    large_output, _, _, _ = post_processing(large_img, small_img, x2, small_mask, corres, args)\n    # Resize back\n    raw_size_output = resize_back(raw_img, large_output, small_mask)\n    return raw_size_output, raw_img_ph, raw_mask_ph\n",[795],{"type":18,"tag":64,"props":796,"children":797},{"__ignoreMap":7},[798],{"type":24,"value":793},{"type":18,"tag":26,"props":800,"children":801},{},[802],{"type":24,"value":803},"The inference code is as follows:",{"type":18,"tag":59,"props":805,"children":807},{"code":806},"import os\nimport time\nimport argparse\nimport progressbar\n\nfrom mindspore import context, load_checkpoint, load_param_into_net\n\n\ndef parse_args():\n    \"\"\"Parse parameters.\"\"\"\n\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--image_dir', default='./test/images', type=str, help='The directory of images to be tested.')\n    parser.add_argument('--mask_dir', default='./test/masks', type=str, help='The directory of masks.')\n    parser.add_argument('--output_dir', default='./output', type=str, help='Where to write testing output.')\n    parser.add_argument('--checkpoint_dir', default='./ckpt_out/generator_epoch10_batch4.ckpt', type=str,\n                        help='The directory of loading checkpoint.')\n    parser.add_argument('--attention_type', default='SOFT', type=str, help='compute attention type.')\n    parser.add_argument('--train_batchsize', default=1, type=int, help='Batch size for testing.')\n    parser.add_argument('--input_size', default=512, type=int, help='The image size of the 
input network in the test.')\n    parser.add_argument('--times', default=8, type=int, help='The scaling factor of the input image.')\n    return parser.parse_args(args=[])\n\n\n# Set up the test data\ncra_config = parse_args()\nimg_paths, mask_paths = read_imgs_masks(cra_config)\nos.makedirs(cra_config.output_dir, exist_ok=True)\ntotal_time = 0\nbar = progressbar.ProgressBar(maxval=len(img_paths), widgets=[progressbar.Bar('=', '[', ']'), ' ',\n                                                              progressbar.Percentage()])\nbar.start()\n# Load the network and the checkpoint file\ngen = GatedGenerator(cra_config)\nparam_dict = load_checkpoint(cra_config.checkpoint_dir)\nload_param_into_net(gen, param_dict)\n# Run the test; masks are reused cyclically if there are fewer masks than images\nfor i, img_path in enumerate(img_paths):\n    rint = i % len(mask_paths)\n    bar.update(i + 1)\n    img_test, mask_test = get_input(img_path, mask_paths[rint])\n    s = time.time()\n    input_img_ph = Tensor(img_test)\n    input_mask_ph = Tensor(255 - mask_test)\n    outputs, input_img_ph, input_mask_ph = build_inference_net(input_img_ph, input_mask_ph, gen, cra_config)\n    res = outputs[0]\n    res = res.asnumpy()\n    total_time += time.time() - s\n    img_hole = img_test * (1 - mask_test / 255) + mask_test\n    # Save the original image, the image with holes, and the result side by side\n    res = np.concatenate([img_test, img_hole, res], axis=1)\n    cv2.imwrite(os.path.join(cra_config.output_dir, str(i) + '.jpg'), res)\nbar.finish()\nprint('test finish')\nprint('average time per image', total_time / len(img_paths))\n",[808],{"type":18,"tag":64,"props":809,"children":810},{"__ignoreMap":7},[811],{"type":24,"value":806},{"type":18,"tag":26,"props":813,"children":814},{},[815],{"type":24,"value":816},"The inference result is displayed as follows. 
The first image is the original complete image, the second image is the image to be inpainted that contains holes, and the third image is the inpainting result.",{"type":18,"tag":26,"props":818,"children":819},{},[820],{"type":18,"tag":119,"props":821,"children":823},{"alt":7,"src":822},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/07/03/dd6cf81abf044a569200e0cc5c40c50b.png",[],{"type":18,"tag":26,"props":825,"children":826},{},[827],{"type":18,"tag":48,"props":828,"children":829},{},[830],{"type":24,"value":831},"References",{"type":18,"tag":26,"props":833,"children":834},{},[835],{"type":24,"value":836},"[1] Z. Yi, Q. Tang, S. Azizi, D. Jang and Z. Xu. Contextual Residual Aggregation for Ultra High-Resolution Image Inpainting[J]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 7505-7514.",{"title":7,"searchDepth":838,"depth":838,"links":839},4,[],"markdown","content:technology-blogs:en:2598.md","content","technology-blogs/en/2598.md","technology-blogs/en/2598","md",1776506106744]