[{"data":1,"prerenderedAt":231},["ShallowReactive",2],{"content-query-qSS89F7FAw":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":225,"_id":226,"_source":227,"_file":228,"_stem":229,"_extension":230},"/technology-blogs/en/1839","en",false,"","A Graph Neural Architecture Search System Under the Scalable Paradigm (PaSca Paper Interpretation)","A graph neural architecture search system under the scalable paradigm (PaSca paper interpretation)","2022-06-17","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/ff045f9d6ea04829b273c9aabaafb109.png","technology-blogs","Influencers",{"type":15,"children":16,"toc":222},"root",[17,25,31,36,49,54,59,71,80,113,121,129,155,162,170,175,182,187,194,202,212],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"a-graph-neural-architecture-search-system-under-the-scalable-paradigm-pasca-paper-interpretation",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":24,"value":30},"June 17, 2022",{"type":18,"tag":26,"props":32,"children":33},{},[34],{"type":24,"value":35},"Author: Yu Fan",{"type":18,"tag":26,"props":37,"children":38},{},[39,41],{"type":24,"value":40},"Article source: ",{"type":18,"tag":42,"props":43,"children":47},"a",{"href":44,"rel":45},"https://zhuanlan.zhihu.com/p/525292195",[46],"nofollow",[48],{"type":24,"value":44},{"type":18,"tag":26,"props":50,"children":51},{},[52],{"type":24,"value":53},"Most graph neural network (GNN) pipelines can be described in terms of the neural message passing (NMP) framework, which is based on the core idea of recursive neighborhood aggregation and transformation. The exponential growth of neighborhood size corresponds to an exponential I/O overhead, and the speedup ratio is far lower than the linear ratio, undermining overall performance. The situation gets worse as layers increase. 
To address this challenge, some studies attempt to deconstruct the message passing mechanism: they decouple the graph-structure-based aggregation computation from the trainable message-update (prediction) step and design a single GNN network around this split, such as APPNP. Such carefully designed architectures preserve model accuracy but offer no general solution. This year, the Best Student Paper of the ACM Web Conference (WWW) presents a novel scalable paradigm and a multi-objective search algorithm. Out of 150,000 different designs, the best-found instance, PaSca-V3, outperforms the SOTA models by 0.4% in predictive accuracy on the authors' large industry dataset while achieving up to 28.3x training speedups.",{"type":18,"tag":26,"props":55,"children":56},{},[57],{"type":24,"value":58},"Paper: PaSca: a Graph Neural Architecture Search System under the Scalable Paradigm",{"type":18,"tag":26,"props":60,"children":61},{},[62,64],{"type":24,"value":63},"link: ",{"type":18,"tag":42,"props":65,"children":68},{"href":66,"rel":67},"https://link.zhihu.com/?target=https%3A//arxiv.org/pdf/2203.00638.pdf",[46],[69],{"type":24,"value":70},"https://arxiv.org/pdf/2203.00638.pdf",{"type":18,"tag":26,"props":72,"children":73},{},[74],{"type":18,"tag":75,"props":76,"children":77},"strong",{},[78],{"type":24,"value":79},"Scalable GNN Paradigm",{"type":18,"tag":26,"props":81,"children":82},{},[83,85,90,92,97,99,104,106,111],{"type":24,"value":84},"The paper divides the message-passing process into three phases: ",{"type":18,"tag":75,"props":86,"children":87},{},[88],{"type":24,"value":89},"pre-processing, model training, and post-processing",{"type":24,"value":91},". ",{"type":18,"tag":75,"props":93,"children":94},{},[95],{"type":24,"value":96},"Pre-processing",{"type":24,"value":98}," aggregates features based on the graph structure, gathering information from neighboring nodes that are not limited to 1-hop neighbors. 
Instead, neighborhood features over a larger range can be aggregated in multiple manners, such as random walk, PageRank, and adaptive schemes, striking a balance between the information from lower-order and higher-order neighbors. ",{"type":18,"tag":75,"props":100,"children":101},{},[102],{"type":24,"value":103},"Model training",{"type":24,"value":105}," combines and transforms the pre-processed neighborhood features to generate a feature vector for each node. Alternatively, the aggregated features from the last step may be used directly, without the multi-step combination of neighborhood features. Here, a multi-layer perceptron (MLP) is used for the transformation. ",{"type":18,"tag":75,"props":107,"children":108},{},[109],{"type":24,"value":110},"Post-processing",{"type":24,"value":112}," is similar to label propagation: the features produced by the trained model are aggregated over the neighborhood again to generate the prediction features. Unlike the message passing mechanism, this paradigm makes feature aggregation and parameter training independent of each other, providing high scalability and acceleration during training. In addition, unlike many GNN networks that use only the last aggregated features, the feature-combination mechanism can fuse messages from both low-order and extended neighborhoods. 
The paper's experiments also show that post-processing can improve model performance.",{"type":18,"tag":26,"props":114,"children":115},{},[116],{"type":18,"tag":117,"props":118,"children":120},"img",{"alt":7,"src":119},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/4d93e2a5d03743e8910f0e5217a47448.png",[],{"type":18,"tag":26,"props":122,"children":123},{},[124],{"type":18,"tag":75,"props":125,"children":126},{},[127],{"type":24,"value":128},"Architecture Search System",{"type":18,"tag":26,"props":130,"children":131},{},[132,134,139,141,146,148,153],{"type":24,"value":133},"The scalable GNN search space consists of the definitions of the paradigm's steps and the range of algorithms selectable at each step. The whole search system is divided into two parts: ",{"type":18,"tag":75,"props":135,"children":136},{},[137],{"type":24,"value":138},"search engine and evaluation engine",{"type":24,"value":140},". The ",{"type":18,"tag":75,"props":142,"children":143},{},[144],{"type":24,"value":145},"search",{"type":24,"value":147}," part uses a multi-objective search algorithm based on Bayesian optimization. The suggestion server models the relationship between each architecture instance and its objective values based on the historical observation data, randomly samples a number of new instances, and suggests to the evaluation engine the best one, i.e., the instance that maximizes the expected hypervolume improvement (EHVI). The evaluation engine evaluates the suggested architecture and feeds the result and observation indicators back to the search engine to update the historical observation data. To accelerate the search process, the ",{"type":18,"tag":75,"props":149,"children":150},{},[151],{"type":24,"value":152},"evaluation engine",{"type":24,"value":154}," is divided into two parts: a graph data aggregator and a neural architecture trainer. The engine also performs distributed storage and computing. The only difference between pre-processing and post-processing is the features of the input nodes. 
Graph data is divided into batches and sent to workers for parallel computing. The calculation results of each step are stored on the workers in a distributed manner, and the result of one hop serves as the input to the next. This distributed parallel computing greatly reduces calculation time. The neural architecture trainer follows the same parameter-server-based distributed training used by other deep learning methods.",{"type":18,"tag":26,"props":156,"children":157},{},[158],{"type":18,"tag":117,"props":159,"children":161},{"alt":7,"src":160},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/2b51c3bab6f04e359906bb22b0e83d59.png",[],{"type":18,"tag":26,"props":163,"children":164},{},[165],{"type":18,"tag":75,"props":166,"children":167},{},[168],{"type":24,"value":169},"Result",{"type":18,"tag":26,"props":171,"children":172},{},[173],{"type":24,"value":174},"The experiments show that the GNN models found under the scalable paradigm have good scalability in distributed training: as the number of workers increases, they achieve nearly ideal linear speedup, making them well suited to large-scale graph training.",{"type":18,"tag":26,"props":176,"children":177},{},[178],{"type":18,"tag":117,"props":179,"children":181},{"alt":7,"src":180},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/b7dd0b20045a4a2c8e1a0558f0ac4421.png",[],{"type":18,"tag":26,"props":183,"children":184},{},[185],{"type":24,"value":186},"In addition, the accuracy of the searched model is slightly better than that of the best baseline, and its scalability is also better. As the model deepens, the number of aggregation steps increases. 
Due to over-smoothing, the accuracy of models such as GCN decreases once there are more than two layers, while the accuracy of the scalable searched model increases within a certain depth range and then plateaus.",{"type":18,"tag":26,"props":188,"children":189},{},[190],{"type":18,"tag":117,"props":191,"children":193},{"alt":7,"src":192},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/29/bf60600c9abc4342a1db468ad01ac394.png",[],{"type":18,"tag":26,"props":195,"children":196},{},[197],{"type":18,"tag":75,"props":198,"children":199},{},[200],{"type":24,"value":201},"Conclusion",{"type":18,"tag":26,"props":203,"children":204},{},[205,210],{"type":18,"tag":75,"props":206,"children":207},{},[208],{"type":24,"value":209},"Advantages:",{"type":24,"value":211}," This paper proposes a highly scalable GNN paradigm by decoupling the interdependent message-update steps, namely neighborhood feature aggregation and feature transformation. A model architecture search system is designed for the paradigm to perform a multi-objective search for scalable GNN architectures. Compared with existing manually designed architectures, the result has higher accuracy, good distributed computing acceleration, and better model scalability.",{"type":18,"tag":26,"props":213,"children":214},{},[215,220],{"type":18,"tag":75,"props":216,"children":217},{},[218],{"type":24,"value":219},"Limits:",{"type":24,"value":221}," The model search in this paper required 2,000 evaluations. Although the specific time consumption is not mentioned, the process consumes a lot of resources. In addition, the new model adds many layers yet yields only a slight accuracy improvement over the SOTA models. Pre-processing steps such as graph partitioning, together with neighborhood feature aggregation performed without parameter training, may reduce accuracy. 
Designing models under the new paradigm is therefore not easy.",{"title":7,"searchDepth":223,"depth":223,"links":224},4,[],"markdown","content:technology-blogs:en:1839.md","content","technology-blogs/en/1839.md","technology-blogs/en/1839","md",1776506105289]