[{"data":1,"prerenderedAt":633},["ShallowReactive",2],{"content-query-WF2TDYIgRR":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":627,"_id":628,"_source":629,"_file":630,"_stem":631,"_extension":632},"/technology-blogs/en/1836","en",false,"","MindSpore Golden Stick: Compression Algorithm Set of the SOTA Model","However, the improvement of device computing power, memory, and battery power still cannot meet the requirements of deploying neural networks. To solve this problem, model compression algorithms emerge.","2022-08-26","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/6cf984958c5a40169f3aa872a8f20b0a.png","technology-blogs","Practices",{"type":15,"children":16,"toc":624},"root",[17,25,35,40,48,53,68,76,81,88,93,98,103,108,113,118,123,128,133,138,146,151,159,164,169,177,182,187,192,197,204,209,214,222,227,232,239,247,255,260,265,270,275,282,287,294,299,304,311,319,326,331,336,341,348,353,358,389,394,399,404,411,416,423,428,435,461,469,474,479,500,508,513,518,523,528,533,544,555,566,571,579,584,589,594,599,604,609,614,619],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"mindspore-golden-stick-compression-algorithm-set-of-the-sota-model",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":18,"tag":30,"props":31,"children":32},"strong",{},[33],{"type":24,"value":34},"1. Why Do We Need to Compress Models?",{"type":18,"tag":26,"props":36,"children":37},{},[38],{"type":24,"value":39},"Recent years have witnessed great achievements that deep neural networks (DNNs) made in many fields such as computer vision (CV) and natural language processing (NLP), due to the maturity of computing power and DNN technologies and the explosive growth of data. The network scale quickly expands and the parameter quantity explosively grows. 
However, improvements in device computing power, memory, and battery capacity still cannot keep pace with the demands of deploying neural networks. Model compression algorithms emerged to solve this problem.",{"type":18,"tag":26,"props":41,"children":42},{},[43],{"type":18,"tag":44,"props":45,"children":47},"img",{"alt":7,"src":46},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/4dbdcd2e0b164feea70e69e364076a2e.png",[],{"type":18,"tag":26,"props":49,"children":50},{},[51],{"type":24,"value":52},"Figure 1. Comparing Moore's Law with the computing power required for model training shows that device computing power grows far more slowly than model computing demands. Model deployment faces similar difficulties.",{"type":18,"tag":26,"props":54,"children":55},{},[56,58,66],{"type":24,"value":57},"(Figure source: UC Berkeley: ",{"type":18,"tag":59,"props":60,"children":64},"a",{"href":61,"rel":62},"https://medium.com/riselab/ai-and-memory-wall-2cb4265cb0b8",[63],"nofollow",[65],{"type":24,"value":61},{"type":24,"value":67},")",{"type":18,"tag":26,"props":69,"children":70},{},[71],{"type":18,"tag":30,"props":72,"children":73},{},[74],{"type":24,"value":75},"2. MindSpore Golden Stick",{"type":18,"tag":26,"props":77,"children":78},{},[79],{"type":24,"value":80},"The MindSpore Golden Stick, jointly developed by the Huawei Noah's Ark Laboratory and the MindSpore team, is a model compression algorithm set. 
It provides a collection of algorithms, such as pruning and quantization, that simplify DNN deployment on devices by reducing the number of model parameters, and it offers easy-to-use interfaces that lower the cost of applying model compression algorithms.",{"type":18,"tag":26,"props":82,"children":83},{},[84],{"type":18,"tag":44,"props":85,"children":87},{"alt":7,"src":86},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/279c85f868b3433fb3d9175a6ad778dc.png",[],{"type":18,"tag":26,"props":89,"children":90},{},[91],{"type":24,"value":92},"Figure 2. Overall architecture of the MindSpore Golden Stick",{"type":18,"tag":26,"props":94,"children":95},{},[96],{"type":24,"value":97},"1",{"type":18,"tag":26,"props":99,"children":100},{},[101],{"type":24,"value":102},"The MindSpore Rewrite module at the bottom layer provides the capability of modifying the front-end network. Using its interfaces, algorithm developers can add, delete, query, and modify nodes and topology relationships on the MindSpore front-end network according to specific rules.",{"type":18,"tag":26,"props":104,"children":105},{},[106],{"type":24,"value":107},"2",{"type":18,"tag":26,"props":109,"children":110},{},[111],{"type":24,"value":112},"Based on the basic capabilities of MindSpore Rewrite, the MindSpore Golden Stick provides various algorithms, such as the simulated quantization aware training (SimQAT) algorithm, the searching for low-bit weights (SLB) algorithm, and the scientific control for reliable neural network pruning (SCOP) algorithm.",{"type":18,"tag":26,"props":114,"children":115},{},[116],{"type":24,"value":117},"3",{"type":18,"tag":26,"props":119,"children":120},{},[121],{"type":24,"value":122},"On top of these basic quantization and pruning algorithms, the MindSpore Golden Stick is also planning higher-level technologies, such as algorithm combination, AutoML for model compression and acceleration on mobile devices (AMC), neural 
architecture search (NAS), and hardware-aware automated quantization (HAQ).",{"type":18,"tag":26,"props":124,"children":125},{},[126],{"type":24,"value":127},"4",{"type":18,"tag":26,"props":129,"children":130},{},[131],{"type":24,"value":132},"To help developers analyze and debug algorithms, the MindSpore Golden Stick intends to provide related tools, such as a visualization tool, a layer-by-layer analysis tool, and a compression-effect analysis tool.",{"type":18,"tag":26,"props":134,"children":135},{},[136],{"type":24,"value":137},"(The third and fourth capabilities are being developed.)",{"type":18,"tag":26,"props":139,"children":140},{},[141],{"type":18,"tag":30,"props":142,"children":143},{},[144],{"type":24,"value":145},"Unified Algorithm Interfaces",{"type":18,"tag":26,"props":147,"children":148},{},[149],{"type":24,"value":150},"Model compression algorithms vary widely across scenarios, which makes them difficult to learn. The MindSpore Golden Stick streamlines and abstracts the algorithm application process and provides a set of unified algorithm interfaces to minimize the learning cost and facilitate the development of advanced technologies.",{"type":18,"tag":26,"props":152,"children":153},{},[154],{"type":18,"tag":30,"props":155,"children":156},{},[157],{"type":24,"value":158},"Network Rewrite Capability",{"type":18,"tag":26,"props":160,"children":161},{},[162],{"type":24,"value":163},"Model compression algorithms are usually designed to optimize a specific network structure. 
For example, the SimQAT algorithm usually inserts pseudo-quantization nodes into the Conv2d or Conv2d + BatchNorm2d structure in the network.",{"type":18,"tag":26,"props":165,"children":166},{},[167],{"type":24,"value":168},"The MindSpore Golden Stick provides the pattern-based capability of modifying front-end networks, helping algorithm developers formulate graph modification rules that implement algorithm logic once instead of reimplementing it on each specific network, thereby improving algorithm integration efficiency.",{"type":18,"tag":26,"props":170,"children":171},{},[172],{"type":18,"tag":30,"props":173,"children":174},{},[175],{"type":24,"value":176},"SimQAT Algorithm",{"type":18,"tag":26,"props":178,"children":179},{},[180],{"type":24,"value":181},"Network quantization is the process of converting floating-point computing into low-bit fixed-point computing. It can effectively reduce the network computing workload, parameter size, and memory consumption, but often causes precision loss.",{"type":18,"tag":26,"props":183,"children":184},{},[185],{"type":24,"value":186},"In other words, quantization approximates 32-bit floating-point weights with a limited (relatively small) set of fixed-point discrete values (usually INT8), at a small cost in inference precision.",{"type":18,"tag":26,"props":188,"children":189},{},[190],{"type":24,"value":191},"Going from 32-bit to 8-bit, for example, reduces the network size and memory usage during deployment and accelerates inference, though the network input and output remain floating-point.",{"type":18,"tag":26,"props":193,"children":194},{},[195],{"type":24,"value":196},"Although quantization introduces noise and therefore precision loss, neural networks are not very sensitive to noise. As long as the quantization error is kept within a certain range, the impact on the precision of a high-level task can be minimized. 
Compared with the original network, the quantized network performs INT8 computing instead of FP32 computing during inference, greatly improving performance.",{"type":18,"tag":26,"props":198,"children":199},{},[200],{"type":18,"tag":44,"props":201,"children":203},{"alt":7,"src":202},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/c66881977337462f9578c3a933b84f93.png",[],{"type":18,"tag":26,"props":205,"children":206},{},[207],{"type":24,"value":208},"Figure 3. Compared with the FP32 data type, low-precision data types such as FP16 and INT8 occupy less space. As shown in the figure, using low-precision data types reduces storage space and transmission time. In addition, low-bit computing performs better: INT8 computing is three times or more faster than FP32 and has obvious advantages in power consumption.",{"type":18,"tag":26,"props":210,"children":211},{},[212],{"type":24,"value":213},"The SimQAT algorithm uses pseudo-quantization nodes to simulate the loss caused by quantization during training and updates network parameters via backpropagation so that they better adapt to that loss. For details, see [1].",{"type":18,"tag":26,"props":215,"children":216},{},[217],{"type":18,"tag":30,"props":218,"children":219},{},[220],{"type":24,"value":221},"Effect",{"type":18,"tag":26,"props":223,"children":224},{},[225],{"type":24,"value":226},"Currently, SimQAT supports INT8 quantization. The following table lists accuracy data for common models.",{"type":18,"tag":26,"props":228,"children":229},{},[230],{"type":24,"value":231},"Table. 
Model accuracy after SimQAT is performed",{"type":18,"tag":26,"props":233,"children":234},{},[235],{"type":18,"tag":44,"props":236,"children":238},{"alt":7,"src":237},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/06db6e03e84e43fda96b614f5339b34a.png",[],{"type":18,"tag":26,"props":240,"children":241},{},[242],{"type":18,"tag":30,"props":243,"children":244},{},[245],{"type":24,"value":246},"SCOP to Reduce Model Power Consumption by 50%",{"type":18,"tag":26,"props":248,"children":249},{},[250],{"type":18,"tag":30,"props":251,"children":252},{},[253],{"type":24,"value":254},"Principles",{"type":18,"tag":26,"props":256,"children":257},{},[258],{"type":24,"value":259},"Neural network pruning is a popular approach that removes some parameters from a neural network to reduce the parameter quantity and computing workload. There are two pruning types: unstructured pruning and structured pruning. Take the convolutional neural network (CNN) as an example. Unstructured pruning removes individual weights from convolution kernels. Although it can achieve a high compression ratio, the actual acceleration depends on special hardware design, and it is difficult to obtain benefits on the Ascend, GPU, and CPU platforms. Structured pruning removes complete convolution kernels from the CNN without damaging the network topology, so it can directly accelerate model inference without specific software and hardware adaptation.",{"type":18,"tag":26,"props":261,"children":262},{},[263],{"type":24,"value":264},"Discovering redundant convolution kernels is a key step in structured pruning and is usually performed via one of two methods. The first method requires no training data but relies on predefined assumptions to determine the importance of convolution kernels. 
For example, a typical assumption is that a convolution kernel with a small norm is unimportant, so cutting off some kernels with small norms does not affect network performance too much.",{"type":18,"tag":26,"props":266,"children":267},{},[268],{"type":24,"value":269},"The second method is data-driven: the importance of convolution kernels is learned from training data. For example, an additional control coefficient is introduced for each convolution kernel, and the importance of different kernels is measured by learning their control coefficients; a kernel with a small control coefficient is considered unimportant.",{"type":18,"tag":26,"props":271,"children":272},{},[273],{"type":24,"value":274},"SCOP, developed by the Noah's Ark Laboratory, is a data-driven method that detects redundant convolution kernels. By constructing knockoff features that follow the same distribution as the real features, it performs a controlled experiment that removes confounding factors from pruning decisions, thereby improving the reliability of pruning results. Real data and knockoff data are fed into the network at the same time to generate real features and knockoff features, respectively. If the knockoff feature corresponding to a convolution kernel suppresses the real feature, the kernel is considered redundant and should be deleted.",{"type":18,"tag":26,"props":276,"children":277},{},[278],{"type":18,"tag":44,"props":279,"children":281},{"alt":7,"src":280},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/95587a6683d248129e481e4911cfcf75.png",[],{"type":18,"tag":26,"props":283,"children":284},{},[285],{"type":24,"value":286},"Figure 4. 
SCOP principles",{"type":18,"tag":26,"props":288,"children":289},{},[290],{"type":18,"tag":30,"props":291,"children":292},{},[293],{"type":24,"value":221},{"type":18,"tag":26,"props":295,"children":296},{},[297],{"type":24,"value":298},"We applied the SCOP pruning algorithm to ResNet-50 and evaluated it on the CIFAR-10 dataset. The following table lists the experimental results. In this task, at a pruning rate of 45%, the pruned model has far fewer parameters than the original model while the accuracy loss stays within 0.5%.",{"type":18,"tag":26,"props":300,"children":301},{},[302],{"type":24,"value":303},"Table. Experimental results",{"type":18,"tag":26,"props":305,"children":306},{},[307],{"type":18,"tag":44,"props":308,"children":310},{"alt":7,"src":309},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/41ea0624937c4b60b2c76f4f3fc8d5c2.png",[],{"type":18,"tag":26,"props":312,"children":313},{},[314],{"type":18,"tag":30,"props":315,"children":316},{},[317],{"type":24,"value":318},"SLB to Compress the Model by 8-32 Times",{"type":18,"tag":26,"props":320,"children":321},{},[322],{"type":18,"tag":30,"props":323,"children":324},{},[325],{"type":24,"value":254},{"type":18,"tag":26,"props":327,"children":328},{},[329],{"type":24,"value":330},"To calculate gradients, conventional quantization methods usually use a straight-through estimator (STE)[1] or a self-designed estimator[2]. However, quantization functions are non-differentiable, so these estimators introduce errors into the gradient calculation and ultimately degrade inference accuracy. Therefore, a quantization method that avoids this inaccurate gradient estimation is required.",{"type":18,"tag":26,"props":332,"children":333},{},[334],{"type":24,"value":335},"SLB[3] is a weight quantization algorithm developed by the Huawei Noah's Ark Laboratory. 
It provides a low-bit quantization algorithm based on weight search to avoid inaccurate gradient estimation.",{"type":18,"tag":26,"props":337,"children":338},{},[339],{"type":24,"value":340},"For a low-bit network, few effective solutions exist for quantizing the network weights directly. The quantization can instead be implemented through weight search, that is, the quantization process is converted into a weight search process. A group of candidate quantized weights is preset for the network, and a probability matrix is defined to represent the probability that each candidate is retained. In the training phase, the network weights are quantized by optimizing this probability matrix.",{"type":18,"tag":26,"props":342,"children":343},{},[344],{"type":18,"tag":44,"props":345,"children":347},{"alt":7,"src":346},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/afae0b25441c46d089e4d8e5bfe9f77e.png",[],{"type":18,"tag":26,"props":349,"children":350},{},[351],{"type":24,"value":352},"Figure 4. Conventional quantization algorithm vs. SLB quantization algorithm",{"type":18,"tag":26,"props":354,"children":355},{},[356],{"type":24,"value":357},"The left part of the figure shows the binary quantization result of a conventional quantization algorithm: during training, the floating-point weights are updated through inaccurate gradients and then quantized through the sigmoid function. The right part shows the binary quantization result of the SLB algorithm. 
A continuous relaxation strategy is used to search for discrete weights: the probability distribution over the discrete weights is optimized during training, and the discrete weights are finally selected according to this probability to realize quantization.",{"type":18,"tag":26,"props":359,"children":360},{},[361,363,368,370,375,377,381,383,387],{"type":24,"value":362},"The value of each red point in the left part is obtained through the sigmoid function, indicating the probability that the weight is quantized to ",{"type":18,"tag":30,"props":364,"children":365},{},[366],{"type":24,"value":367},"-1",{"type":24,"value":369},". The value of each blue point is also obtained through the sigmoid function, but it indicates the probability that the weight is quantized to ",{"type":18,"tag":30,"props":371,"children":372},{},[373],{"type":24,"value":374},"+1",{"type":24,"value":376},". Inaccurate gradient updates in conventional quantization algorithms affect the update of the floating-point weights, resulting in a large deviation in the probability. In the right part, the values of the red and blue points are instead obtained through the softmax function, and they likewise indicate the probabilities that the weight is quantized to ",{"type":18,"tag":30,"props":378,"children":379},{},[380],{"type":24,"value":367},{"type":24,"value":382}," or ",{"type":18,"tag":30,"props":384,"children":385},{},[386],{"type":24,"value":374},{"type":24,"value":388},". Because inaccurate gradient updates are avoided, the probability is more accurate.",{"type":18,"tag":26,"props":390,"children":391},{},[392],{"type":24,"value":393},"In classification tasks, the softmax distribution is used to calculate the probability that the output belongs to each class. 
SLB likewise uses a softmax distribution to calculate the probability that a weight is quantized to each candidate quantized weight, and finally selects the candidate with the maximum probability as the quantization result.",{"type":18,"tag":26,"props":395,"children":396},{},[397],{"type":24,"value":398},"To improve the confidence of the quantization result, SLB introduces a temperature factor. As the temperature factor is gradually adjusted, the softmax distribution becomes steeper and approaches a one-hot distribution, thereby maximizing the confidence of the quantization result and reducing the error.",{"type":18,"tag":26,"props":400,"children":401},{},[402],{"type":24,"value":403},"The formula on the left is a standard softmax function, and the formula on the right is the softmax function after the temperature factor is introduced in the SLB algorithm.",{"type":18,"tag":26,"props":405,"children":406},{},[407],{"type":18,"tag":44,"props":408,"children":410},{"alt":7,"src":409},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/65b69a9a1ac2415ebddd376066a34d31.png",[],{"type":18,"tag":26,"props":412,"children":413},{},[414],{"type":24,"value":415},"Figure 5. The change in the softmax distribution as the temperature factor is gradually adjusted. The rightmost panel shows the one-hot distribution.",{"type":18,"tag":26,"props":417,"children":418},{},[419],{"type":18,"tag":30,"props":420,"children":421},{},[422],{"type":24,"value":221},{"type":18,"tag":26,"props":424,"children":425},{},[426],{"type":24,"value":427},"We quantized the ResNet-18 network with SLB and evaluated it on the CIFAR-10 dataset. As the following figure shows, in this task the top-1 accuracy after 4-bit weight quantization matches that of the full-precision model, and the top-1 accuracy loss after 1-bit weight quantization is within 0.6%. 
SLB quantization greatly reduces model parameters, making it easier to deploy models on devices with limited resources.",{"type":18,"tag":26,"props":429,"children":430},{},[431],{"type":18,"tag":44,"props":432,"children":434},{"alt":7,"src":433},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/92d6fcfa02674ce395feb7094f632503.png",[],{"type":18,"tag":26,"props":436,"children":437},{},[438,440,445,447,452,454,459],{"type":24,"value":439},"Figure 6. Quantization results of ResNet-18 on CIFAR-10. W32 indicates the full-precision model. ",{"type":18,"tag":30,"props":441,"children":442},{},[443],{"type":24,"value":444},"W4",{"type":24,"value":446}," indicates that the weight is 4 bits, ",{"type":18,"tag":30,"props":448,"children":449},{},[450],{"type":24,"value":451},"W2",{"type":24,"value":453}," indicates that the weight is 2 bits, and ",{"type":18,"tag":30,"props":455,"children":456},{},[457],{"type":24,"value":458},"W1",{"type":24,"value":460}," indicates that the weight is 1 bit.",{"type":18,"tag":26,"props":462,"children":463},{},[464],{"type":18,"tag":30,"props":465,"children":466},{},[467],{"type":24,"value":468},"3. Summary and Prospects",{"type":18,"tag":26,"props":470,"children":471},{},[472],{"type":24,"value":473},"The MindSpore Golden Stick is not only a model compression algorithm set but also a platform. It builds a bridge between algorithm users and developers by providing unified algorithm application interfaces and capabilities of modifying network definitions, maximizing the commercial value of algorithms.",{"type":18,"tag":26,"props":475,"children":476},{},[477],{"type":24,"value":478},"In later versions, the MindSpore Golden Stick will, on the one hand, provide more excellent algorithms to facilitate neural network deployment on devices, including technologies such as post-training quantization for the Vision Transformer (ViT)[5], knowledge distillation[6][8], and GhostNet[7]. 
On the other hand, it will improve the capabilities of modifying network definitions and debugging networks. We sincerely welcome every algorithm developer to contribute to the MindSpore open source community.",{"type":18,"tag":26,"props":480,"children":481},{},[482,484,490,492,498],{"type":24,"value":483},"For details about the MindSpore Golden Stick, visit ",{"type":18,"tag":59,"props":485,"children":488},{"href":486,"rel":487},"https://gitee.com/mindspore/golden-stick",[63],[489],{"type":24,"value":486},{"type":24,"value":491},". For detailed documentation of the MindSpore Golden Stick, visit ",{"type":18,"tag":59,"props":493,"children":496},{"href":494,"rel":495},"https://www.mindspore.cn/golden_stick/docs/en/r0.1/index.html",[63],[497],{"type":24,"value":494},{"type":24,"value":499},".",{"type":18,"tag":26,"props":501,"children":502},{},[503],{"type":18,"tag":30,"props":504,"children":505},{},[506],{"type":24,"value":507},"4. Introduction to the Noah's Ark Laboratory",{"type":18,"tag":26,"props":509,"children":510},{},[511],{"type":24,"value":512},"The core algorithms of the MindSpore Golden Stick are developed by the Huawei Noah's Ark Laboratory, a research center that focuses on AI algorithms to develop AI engines with efficient data processing and high energy utilization. 
The Laboratory has many R&D branches across Asia, Europe, and North America,",{"type":18,"tag":26,"props":514,"children":515},{},[516],{"type":24,"value":517},"and aims to innovate AI and data mining technologies, thus providing better products and services that benefit the company and society.",{"type":18,"tag":26,"props":519,"children":520},{},[521],{"type":24,"value":522},"As a world-class lab, the Noah's Ark Laboratory is committed to developing advanced AI technologies and designing fully intelligent processes that change service patterns and people's daily lives, under the vision of \"bring digital to every person, home and organization for a fully connected, intelligent world\".",{"type":18,"tag":26,"props":524,"children":525},{},[526],{"type":24,"value":527},"Since its foundation in 2012, the Laboratory has made many achievements in computer vision, speech and natural language processing, recommendation systems and search engines, decision-making inference, and basic AI theory.",{"type":18,"tag":26,"props":529,"children":530},{},[531],{"type":24,"value":532},"The following lists the recent achievements of the Laboratory:",{"type":18,"tag":26,"props":534,"children":535},{},[536,538],{"type":24,"value":537},"1. Overview of the Industry's First Transformer in Computer Vision: ",{"type":18,"tag":59,"props":539,"children":542},{"href":540,"rel":541},"https://ieeexplore.ieee.org/document/9716741",[63],[543],{"type":24,"value":540},{"type":18,"tag":26,"props":545,"children":546},{},[547,549],{"type":24,"value":548},"2. Introduction to the Dialogue Generation Model PanGu-Bot: ",{"type":18,"tag":59,"props":550,"children":553},{"href":551,"rel":552},"https://arxiv.org/abs/2203.17090",[63],[554],{"type":24,"value":551},{"type":18,"tag":26,"props":556,"children":557},{},[558,560],{"type":24,"value":559},"3. 
Introduction to the Efficient Code Generation Model PanGu-Coder: ",{"type":18,"tag":59,"props":561,"children":564},{"href":562,"rel":563},"https://arxiv.org/pdf/2207.11280.pdf",[63],[565],{"type":24,"value":562},{"type":18,"tag":26,"props":567,"children":568},{},[569],{"type":24,"value":570},"4. Noah's Paper Collection at the 2022 Computer Vision and Pattern Recognition Conference (CVPR)",{"type":18,"tag":26,"props":572,"children":573},{},[574],{"type":18,"tag":30,"props":575,"children":576},{},[577],{"type":24,"value":578},"References",{"type":18,"tag":26,"props":580,"children":581},{},[582],{"type":24,"value":583},"[1] Bengio, Yoshua, Nicholas Léonard, and Aaron Courville. \"Estimating or propagating gradients through stochastic neurons for conditional computation.\" 2013.",{"type":18,"tag":26,"props":585,"children":586},{},[587],{"type":24,"value":588},"[2] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. \"DARTS: Differentiable architecture search.\" ICLR, 2019.",{"type":18,"tag":26,"props":590,"children":591},{},[592],{"type":24,"value":593},"[3] Yang, Zhaohui, et al. \"Searching for low-bit weights in quantized neural networks.\" NeurIPS, 2020.",{"type":18,"tag":26,"props":595,"children":596},{},[597],{"type":24,"value":598},"",{"type":18,"tag":26,"props":600,"children":601},{},[602],{"type":24,"value":603},"[4] Tang, Yehui, et al. \"SCOP: Scientific control for reliable neural network pruning.\" NeurIPS, 2020: 10936-10947.",{"type":18,"tag":26,"props":605,"children":606},{},[607],{"type":24,"value":608},"[5] Liu, Zhenhua, et al. \"Post-training quantization for vision transformer.\" Advances in Neural Information Processing Systems 34 (2021): 28092-28103.",{"type":18,"tag":26,"props":610,"children":611},{},[612],{"type":24,"value":613},"[6] Xu, Yixing, et al. 
\"Kernel based progressive distillation for adder neural networks.\" Advances in Neural Information Processing Systems 33 (2020): 12322-12333.",{"type":18,"tag":26,"props":615,"children":616},{},[617],{"type":24,"value":618},"[7] Han, Kai, et al. \"Ghostnet: More features from cheap operations.\" Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.",{"type":18,"tag":26,"props":620,"children":621},{},[622],{"type":24,"value":623},"[8] Chen, Hanting, et al. \"Data-free learning of student networks.\" Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.",{"title":7,"searchDepth":625,"depth":625,"links":626},4,[],"markdown","content:technology-blogs:en:1836.md","content","technology-blogs/en/1836.md","technology-blogs/en/1836","md",1776506105170]