MindSpore Technical Forum: Will Foundation Models Lead the Future of AI?
Foundation models are models trained on broad data (generally using self-supervision at scale) that can be adapted to a wide range of downstream tasks. Built on standard ideas in transfer learning and on recent advances in deep learning and computer systems, these models demonstrate capabilities that can substantially improve performance across countless downstream applications and processes. Given this potential, an increasing number of AI systems are expected to be built and released on top of foundation models.
On 9th April, MindSpore and PaperWeekly jointly held the first MindSpore Technical Forum, which saw distinguished guests and esteemed scholars share their insights into the current and future state of AI foundation models. The following is a short review of speeches given at the forum.
Mr. Zhou Bin, General Manager of the MindSpore R&D Department, explained how cutting-edge achievements built on MindSpore have advanced AI development, pointing to the PCL-L, Huawei Cloud NLP, and Zidong Taichu models, the Peng Cheng-Shennong bioinformatics platform, and the Wuhan.LuojiaNet AI framework. He also expressed his hopes for the future development of MindSpore-based foundation models.
Mr. Wang Jinqiao, Professor from the Institute of Automation, Chinese Academy of Sciences (CAS), introduced the key technologies used in the Zidong Taichu model. The model uses encoders to map the vision, text, and speech modalities into a unified semantic space, and then learns the semantic associations and unified representations of these modalities through the multi-head self-attention mechanism. This facilitates and balances the model's cross-modal understanding and generation capabilities. On this basis, a multi-layer, self-supervised learning framework is created to handle diverse downstream tasks.
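The multi-head self-attention mechanism described above can be sketched as follows. This is a minimal NumPy illustration, not the Zidong Taichu implementation: the random projection weights, dimensions, and token counts are all chosen for demonstration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    # x: (seq_len, d_model) -- token embeddings from any modality,
    # assumed already projected into the shared semantic space.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Random weights stand in for learned Q/K/V projections.
    wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                  for _ in range(3))
    q, k, v = x @ wq, x @ wk, x @ wv
    # Split into heads: (num_heads, seq_len, d_head).
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v
    # Concatenate heads back to (seq_len, d_model).
    return out.transpose(1, 0, 2).reshape(seq_len, d_model)

rng = np.random.default_rng(0)
# Hypothetical example: image patches and text tokens concatenated in the
# shared space, so attention can learn cross-modal associations.
image_tokens = rng.standard_normal((4, 32))
text_tokens = rng.standard_normal((6, 32))
fused = multi_head_self_attention(np.vstack([image_tokens, text_tokens]),
                                  num_heads=4, rng=rng)
```

Because the two modalities share one sequence, every attention head can weigh image positions against text positions, which is the core idea behind learning a unified cross-modal representation.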
Mr. Zhang Mi, Research Associate from Wuhan University, introduced a machine learning framework for intelligent remote sensing. Mr. Zhang explained how intelligent processing of spatial information in remote sensing images requires both a large-scale library of remote sensing sample images and a dedicated machine learning framework. For this, Wuhan University released Wuhan.LuojiaSet and Wuhan.LuojiaNet. The former is the world's largest remote sensing dataset, containing more than 5 million samples; the latter is the world's first AI framework dedicated to remote sensing images, supporting flexible creation of scale channels, adaptive selection of data channels, and multi-level joint optimization.
Mr. Chen Jie, Associate Professor from Peking University, introduced the Peng Cheng-Shennong bioinformatics platform and shared his research on data-intensive precision computing in life sciences. The platform is designed to apply powerful AI to biomedical exploration, based on the large-scale computing clusters of Peng Cheng Cloud Brain II and pre-trained NLP foundation models. Clinical research and experimentation for new drugs typically takes six to nine years, but by using new AI algorithms to mine data, the platform can deliver accurate results in a much shorter time, accelerating R&D and facilitating cutting-edge research in precision medicine.
Dr. Peng Zhiliang of CAS presented the Conformer model. In convolutional neural networks (CNNs), the convolution operator readily extracts local features but struggles to capture global representations. Conversely, the Vision Transformer (ViT) uses the self-attention module to capture long-distance feature dependencies, but in doing so degrades local feature details. In this regard, Conformer combines convolution operations with the self-attention mechanism, delivering superb generalization capabilities on downstream tasks.
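The local-versus-global trade-off behind Conformer can be illustrated with a toy hybrid block. This is a conceptual NumPy sketch, not the Conformer architecture itself: the additive fusion, kernel, and shapes are simplifying assumptions for illustration.

```python
import numpy as np

def local_conv1d(x, kernel):
    # Depthwise 1-D convolution over the sequence: each output position
    # mixes only a small neighborhood -- strong on local detail,
    # blind to long-range structure.
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([sum(kernel[j] * xp[i + j] for j in range(k))
                     for i in range(len(x))])

def global_attention(x):
    # Single-head self-attention: every position attends to every other
    # position, capturing long-distance dependencies.
    scores = x @ x.T / np.sqrt(x.shape[1])
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = e / e.sum(axis=1, keepdims=True)
    return weights @ x

def hybrid_block(x, kernel):
    # Fuse the two branches so the output carries both local and
    # global information (additive fusion is an assumption here).
    return local_conv1d(x, kernel) + global_attention(x)

x = np.random.default_rng(1).standard_normal((8, 16))
y = hybrid_block(x, kernel=np.array([0.25, 0.5, 0.25]))
```

The convolution branch alone would miss dependencies between distant positions; the attention branch alone smooths over fine local structure; combining them is the intuition the Conformer design builds on.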
Mr. Su Teng, an expert in distributed and parallel computing from Huawei, introduced best practices for MindSpore-based foundation models built for the AI industry. MindSpore is a leading AI framework in terms of the parallel dimensions it supports, model partitioning, and trainable parameter scale. Models trained on MindSpore use an AI compiler to implement multi-dimensional hybrid parallelism, supporting hybrid parallel algorithms across seven dimensions, such as data parallelism and data slicing preprocessing. These models slash the amount of concurrent code needed for development by 80% and the time needed for system tuning by 60%, allowing models with hundreds of billions of parameters to be trained on a single server. The Zidong Taichu model and the Wuhan.LuojiaNet framework both run on MindSpore, and these projects' achievements show the potential for AI foundation models to fuel the development of the AI industry.
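Data parallelism, one of the parallel dimensions mentioned above, can be sketched in a few lines. This is a single-process NumPy illustration of the idea only (shard the batch, compute gradients independently, average them); it is not MindSpore's implementation, and the linear model and learning rate are arbitrary choices.

```python
import numpy as np

def local_gradient(w, xb, yb):
    # Mean-squared-error gradient for a linear model on one data shard.
    pred = xb @ w
    return 2 * xb.T @ (pred - yb) / len(xb)

def data_parallel_step(w, x, y, num_workers, lr=0.1):
    # Data parallelism: split the batch across workers (on separate
    # devices in a real system), compute gradients independently,
    # then all-reduce by averaging before the weight update.
    xs = np.array_split(x, num_workers)
    ys = np.array_split(y, num_workers)
    grads = [local_gradient(w, xb, yb) for xb, yb in zip(xs, ys)]
    return w - lr * np.mean(grads, axis=0)

rng = np.random.default_rng(2)
x = rng.standard_normal((64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = x @ true_w
w = np.zeros(3)
for _ in range(200):
    w = data_parallel_step(w, x, y, num_workers=4)
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient, so the parallel run converges to the same weights as single-device training; real frameworks layer model, pipeline, and other parallel dimensions on top of this.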
Foundation models are an important and emerging trend. Despite recent best practices, these models still face great challenges, such as emergent behavior and homogenization. We hope to hear from leading voices on foundation models and welcome future cooperation. To learn more, visit the MindSpore official website, Gitee, and GitHub.