Zidong Taichu 2.0: A Great Leap in Exploring General AI
At the 2023 Artificial Intelligence Framework Ecosystem Summit, Xu Bo, director of the Institute of Automation, Chinese Academy of Sciences (CASIA), announced the launch of the full-modality foundation model Zidong Taichu 2.0 and demonstrated its capabilities, including music understanding and generation, 3D scene navigation, signal understanding, and multimodal dialogue.

(Xu Bo at the 2023 Artificial Intelligence Framework Ecosystem Summit)
Zidong Taichu 2.0 is an upgrade of Zidong Taichu 1.0, a multimodal foundation model with hundreds of billions of parameters. The new version adds modalities such as video, signals, and 3D point clouds to the existing audio, image, and text, and achieves breakthroughs in key technologies for unified multimodal association, such as cognitive enhancement, giving it strong capabilities in comprehension, generation, and cross-modal association.
Cognitive Capability Is Key to Full Modality
In 2019, the Chinese Academy of Sciences (CAS) began developing multimodal foundation models, building on its research and application of single-modal foundation models in the audio, text, and image fields. In 2021, CAS officially released Zidong Taichu 1.0, advancing AI from a single specialization and capability toward multiple specializations and capabilities.
According to Xu Bo, multimodal capability is essential to how humans learn and interact, and to achieving higher levels of intelligence. Zidong Taichu has therefore prioritized a multimodal technical approach from the very beginning.
"As Zidong Taichu 1.0 sees continued application, many new requirements are emerging. For example, industrial intelligence requires processing parameters such as temperature, humidity, pressure, and liquid level measurements. Medical scenarios require processing structured data from medical examinations alongside heterogeneous medical image data. Only when both structured and unstructured data are cognitively collected and analyzed can we truly progress toward an intelligent society, perceiving and transforming the world in broader and more advanced ways," Xu Bo stated.
Zidong Taichu 2.0 is therefore fully upgraded around cognitive capability. Architecturally, it provides full-modality access to both structured and unstructured data, and it achieves breakthroughs in multimodal grouped cognitive encoding/decoding and cognition-enhanced multimodal association, greatly improving its multimodal cognitive capability.
Integrating Resources to Explore the Industrialization of General AI
At the conference, Xu Bo demonstrated Zidong Taichu's ability to illustrate Beethoven's story through the Moonlight Sonata, to position itself accurately in three-dimensional scenes, and to analyze scenarios by combining images and voice.
Compared with Zidong Taichu 1.0, version 2.0 improves decision-making and judgment capabilities, achieving a leap from perception and cognition to decision-making. In practical application scenarios, this means it can create greater value for industry.
Xu Bo stated that this full-modality foundation model has made numerous pioneering explorations in areas such as neurosurgery navigation, legal consultation, multimodal medical diagnosis, and traffic violation image reading. In medical scenarios in particular, the neurosurgery robot MicroNeuro, equipped with Zidong Taichu, can integrate multimodal information such as visual and tactile data in real time during surgery, helping doctors make real-time inferences and judgments about the surgical situation. In addition, in cooperation with Peking Union Medical College Hospital (PUMCH), Zidong Taichu's strong logical inference ability is being used to assist in the diagnosis and treatment of rare diseases.
It is also noteworthy that the Zidong Taichu foundation model is supported by algorithms developed by CASIA, Ascend AI hardware, the MindSpore AI framework, and computing power provided by the Wuhan AI Computing Center. Xu Bo said: "We've been developing an open service platform based on our technical research into foundation models. Our goal is to integrate resources from industry, education, and research to build multimodal AI industry applications and explore the path toward the industrialization of general AI."
In the near future, CAS plans to continue exploring the integration of brain-inspired and game intelligence with the full-modality Zidong Taichu foundation model. The goal is to achieve self-evolving general AI and explore its value across various fields, contributing to the rapid growth of China's digital economy.