From Perception to Cognition: "Zidong.Taichu" Keeps Exploring New Possibilities of General AI
At the Huawei Ascend AI Developer Summit on May 6, Wang Jinqiao, executive deputy director of the Zidong.Taichu Foundation Model Research Center at the Institute of Automation, Chinese Academy of Sciences (CASIA), and president of the Wuhan AI Research, announced that CASIA and the Wuhan AI Research have jointly built Zidong.Taichu 2.0, a full-modal foundation model, on Huawei's full-stack Ascend AI software and hardware platform. The model implements unified representation and learning across modalities such as text, image, speech, video, 3D point cloud, and sensor signals, ushering in the era of general AI.
Advanced Layout with Multimodal Technologies
From infancy, we develop our cognition through multimodal information such as real-world images, sounds, and words. By breaking through the barriers between image, text, and sound modalities, general foundation models play a crucial role in enabling AI to integrate into the real world.
Since 2019, CASIA has explored multimodal foundation models, building on its applications of single-modal foundation models for speech, text, and image. In July 2021, the Institute officially released Zidong.Taichu, the world's first multimodal foundation model with hundreds of billions of parameters, fueling the development of general AI with multimodal technologies.
Unlike many text-only language models, Zidong.Taichu has placed multimodal technologies at its core from the start of R&D, applying richer data types, including images, speech, and text, to cross-modal unified representation and learning. In this way, it achieves unified representation and mutual generation across the three modalities (image, text, and speech), bringing the understanding and generative capabilities of AI foundation models closer to those of humans. This lays an innovative foundation for multimodal AI industry applications and is key to realizing general AI.
From Perception to Cognition, a New Level of Digital IoT
The world has entered the digital IoT era. In applying Zidong.Taichu 1.0, we found that, beyond the massive speech, image, and text information on the Internet, massive IoT data from multiple sensors, 3D point clouds, and videos must also be processed to improve productivity. In line with this trend, CASIA is further advancing the technologies and applications of Zidong.Taichu by studying the system architecture and basic algorithms of full-modal foundation models.
Besides text, images, audio, and video, Zidong.Taichu 2.0 can process richer data types, such as 3D point clouds and sensor signals. It bridges the gaps between perception, cognition, and decision-making by optimizing converged cognition of speech, video, and text along with common-sense computing, shifting AI from perception to cognition with more powerful general capabilities.
Building a Full-Stack Base for General AI in China
A full-stack platform developed in China can accelerate innovation in the AI field. Supported by CASIA-developed algorithms, the Ascend AI platform, and the computing power of the Wuhan AI Computing Center, Zidong.Taichu was designed from the outset to be controllable, reliable, and available, greatly promoting the development of China's full-stack basic software and hardware.
To date, the multimodal AI alliance established under the leadership of CASIA has attracted nearly 70 members from all walks of life. Centering on the technology and application of multimodal foundation models, Zidong.Taichu has demonstrated strong potential in dozens of scenarios, such as sign language teaching, legal consultation, transportation, medical robotics, and medical image interpretation. At the Huawei Ascend AI Developer Summit, the Ascend- and MindSpore-based 3.8B image-text-speech multimodal model and the Zidong.Taichu foundation model service platform were opened up.
As the Internet evolves from limited modalities to full-modal IoT, CASIA will continue to advance the Zidong.Taichu foundation model through independent innovation across the entire chain, from basic theories to key technologies and application ecosystems, to build a proprietary base for general AI. By focusing on building an open ecosystem for the multimodal industry on independent and controllable basic software and hardware, CASIA hopes to explore new possibilities of general AI.