[{"data":1,"prerenderedAt":175},["ShallowReactive",2],{"content-query-0r5pEWBIqy":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":169,"_id":170,"_source":171,"_file":172,"_stem":173,"_extension":174},"/technology-blogs/en/1401","en",false,"","Introduction to the Dataflow and Spatial Computing Architectures","This article briefly introduces these tow architectures from the perspectives of hardware and software.","2022-03-16","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/3d01cf9fbc324c189d14b5253266e1c3.png","technology-blogs","Influencers",{"type":15,"children":16,"toc":166},"root",[17,25,31,39,44,51,56,61,68,73,80,85,92,97,102,109,114,123,128,133,141,146,151,156,161],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"introduction-to-the-dataflow-and-spatial-computing-architectures",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":24,"value":30},"In computer engineering, computer architecture is a set of rules and methods that describe the functionality, organization, and implementation of computer systems, of which the dataflow architecture and the spatial computing architecture have become a widely discussed research topic in system design. This article briefly introduces these two architectures from the perspectives of hardware and software.",{"type":18,"tag":26,"props":32,"children":33},{},[34],{"type":18,"tag":35,"props":36,"children":38},"img",{"alt":7,"src":37},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/3702daec0a6f40c69b6cee62638aa973.png",[],{"type":18,"tag":26,"props":40,"children":41},{},[42],{"type":24,"value":43},"Traditional CPU architectures adopt a control flow design, whereas the well-known field programmable gate array (FPGA) is created based on the dataflow architecture. 
Their main difference lies in instruction execution. The control flow architecture is essentially a time-sequential architecture: instructions are fetched and executed one after another in time. In a dataflow architecture, by contrast, instructions are executed in both temporal and spatial dimensions: data is continuously transferred in the temporal dimension, while many small processing units execute simultaneously in the spatial dimension.",{"type":18,"tag":26,"props":45,"children":46},{},[47],{"type":18,"tag":35,"props":48,"children":50},{"alt":7,"src":49},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/bf186d8c67cf4d3a94a2977b58492e1d.png",[],{"type":18,"tag":26,"props":52,"children":53},{},[54],{"type":24,"value":55},"(Control flow architecture) (Dataflow architecture)",{"type":18,"tag":26,"props":57,"children":58},{},[59],{"type":24,"value":60},"To illustrate, think of the control flow architecture as an all-round expert capable of manufacturing a car single-handedly, whereas the dataflow architecture is a team of assemblers, each responsible for just one stage of the manufacturing process. In reality, the control flow architecture is stacked to build multi-core and multi-CPU systems, whereas the dataflow architecture can be combined into higher dimensions.",{"type":18,"tag":26,"props":62,"children":63},{},[64],{"type":18,"tag":35,"props":65,"children":67},{"alt":7,"src":66},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/235ad2be8caa4d19a76981a3fa7dfd81.png",[],{"type":18,"tag":26,"props":69,"children":70},{},[71],{"type":24,"value":72},"In addition, from a programming viewpoint, the control flow architecture is well abstracted: all code is converted into instructions by a compiler, and the instructions are placed in a storage unit and executed by a processor. 
In contrast, the dataflow architecture requires a compiler or a related tool to compile code into hardware processing units, an approach associated with FPGAs and reconfigurable computing (see the following picture). Spatial computing even has a software programming standard called OpenSPL. Although the dataflow and spatial computing architectures offer the advantages listed below, their classic implementations are highly specialized and cannot perform complicated operations.",{"type":18,"tag":26,"props":74,"children":75},{},[76],{"type":18,"tag":35,"props":77,"children":79},{"alt":7,"src":78},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/ba49d5bbfe674a568f8bcb685eb6bff4.png",[],{"type":18,"tag":26,"props":81,"children":82},{},[83],{"type":24,"value":84},"(Units for multiplication and addition operations are generated on the hardware.)",{"type":18,"tag":26,"props":86,"children":87},{},[88],{"type":18,"tag":35,"props":89,"children":91},{"alt":7,"src":90},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/91d664f2d28e4f1cb05c514382713269.png",[],{"type":18,"tag":26,"props":93,"children":94},{},[95],{"type":24,"value":96},"(Advantages)",{"type":18,"tag":26,"props":98,"children":99},{},[100],{"type":24,"value":101},"In practice, the control flow architecture and the dataflow/spatial computing architecture are combined for application development. The dedicated dataflow core is replaced with a general-purpose core (CPU+SIMD), so applications no longer need to be compiled into hardware units and can run on the control flow architecture. Many such cores are then stacked, and the deployment of applications at the top layer follows the pipeline structure of the dataflow architecture. Take Tenstorrent as an example: a large number of Tensix cores are stacked on each chip, and these Tensix cores can be regarded as dataflow cores. 
However, Tensix cores are essentially general-purpose computing units: each Tensix core contains scalar, vector, and matrix computing units.",{"type":18,"tag":26,"props":103,"children":104},{},[105],{"type":18,"tag":35,"props":106,"children":108},{"alt":7,"src":107},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/03/31/ecc3fe87d8cb4cfda27d1abaf7770645.png",[],{"type":18,"tag":26,"props":110,"children":111},{},[112],{"type":24,"value":113},"From a software perspective, any cluster system can be regarded as a dataflow system, and each cluster node can be regarded as a dataflow core. A cluster application can be developed in the form of a pipeline, for example, the stream processing systems of big data. To ensure efficient running of cluster systems, the key problem to tackle is scheduling, which is divided into two phases: task orchestration and optimization, and task scheduling and execution.",{"type":18,"tag":26,"props":115,"children":116},{},[117],{"type":18,"tag":118,"props":119,"children":120},"strong",{},[121],{"type":24,"value":122},"Task orchestration and optimization",{"type":18,"tag":26,"props":124,"children":125},{},[126],{"type":24,"value":127},"Static orchestration and optimization: Before a task is executed, compilation and optimization methods are used for global orchestration based on cost models or rules. For example, SQL in big data systems is usually optimized based on a cost model, whereas static graphs of an AI framework are more suited to rule-based optimization.",{"type":18,"tag":26,"props":129,"children":130},{},[131],{"type":24,"value":132},"Dynamic orchestration and optimization: No orchestration is done in advance. A task is orchestrated while it is being executed. 
Typical examples are dynamic graphs of an AI framework and traditional HPC programming models (MPI/OpenMP).",{"type":18,"tag":26,"props":134,"children":135},{},[136],{"type":18,"tag":118,"props":137,"children":138},{},[139],{"type":24,"value":140},"Task scheduling and execution",{"type":18,"tag":26,"props":142,"children":143},{},[144],{"type":24,"value":145},"Centralized scheduling: A master node dynamically assigns a node for orchestrated task execution through various scheduling policies based on available computing resources, data locality, etc. Spark is a typical framework that uses the centralized scheduling policy.",{"type":18,"tag":26,"props":147,"children":148},{},[149],{"type":24,"value":150},"Decentralized scheduling: No centralized master node is used to schedule tasks. Simply put, tasks are mapped to resources before being executed. Each node independently schedules and executes tasks, and tasks interact with each other through messages or shared memory. The traditional MPI programming model is a typical decentralized scheduling model, and most AI frameworks also adopt this scheduling policy.",{"type":18,"tag":26,"props":152,"children":153},{},[154],{"type":24,"value":155},"Variant (a combination of centralized and decentralized scheduling): tree-like recursive scheduling, for example, the Ray framework from the RISELab at UC Berkeley.",{"type":18,"tag":26,"props":157,"children":158},{},[159],{"type":24,"value":160},"Compared with conventional IT or cloud-based cluster systems, a dataflow hardware system is more tightly coupled: multiple cores in a die communicate through shared memory or inter-core IPC, and high-speed interconnects link cores, chips, and nodes. All of this requires very fine-grained task division.",{"type":18,"tag":26,"props":162,"children":163},{},[164],{"type":24,"value":165},"To better utilize the performance of current dataflow hardware architectures, software needs to plan an appropriate dataflow graph. 
Specifically, static orchestration and optimization together with decentralized scheduling are required, with graphs executed through events and pipelines formed by execution units. However, static programming faces two challenges: the program cannot be too large, and it cannot contain too many dynamic structures. To ensure application flexibility, we sometimes need to take dynamic planning into account, and how to handle architectures such as Tenstorrent in this way remains a challenging task.",{"title":7,"searchDepth":167,"depth":167,"links":168},4,[],"markdown","content:technology-blogs:en:1401.md","content","technology-blogs/en/1401.md","technology-blogs/en/1401","md",1776506102757]