[{"data":1,"prerenderedAt":284},["ShallowReactive",2],{"content-query-wKAyEdnDW4":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"body":13,"_type":278,"_id":279,"_source":280,"_file":281,"_stem":282,"_extension":283},"/technology-blogs/en/3065","en",false,"","MindSpore-based FLAG3D Evaluation - Natural Language-guided 3D Fitness Activity Dataset","Authors: Liu Aoyang, Liu Jinpeng, Jiang Haonan","2024-02-29","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/04/19/05a903e8fb674096a97e479b102ea1e3.png","technology-blogs",{"type":14,"children":15,"toc":275},"root",[16,24,30,35,40,45,50,62,67,76,89,94,103,111,116,121,129,137,142,147,155,163,168,176,181,186,191,196,201,206,213,218,223,231,239,244,249,257,265,270],{"type":17,"tag":18,"props":19,"children":21},"element","h1",{"id":20},"mindspore-based-flag3d-evaluation-natural-language-guided-3d-fitness-activity-dataset",[22],{"type":23,"value":8},"text",{"type":17,"tag":25,"props":26,"children":27},"p",{},[28],{"type":23,"value":29},"Paper Title",{"type":17,"tag":25,"props":31,"children":32},{},[33],{"type":23,"value":34},"FLAG3D: A 3D Fitness Activity Dataset with Language Instruction",{"type":17,"tag":25,"props":36,"children":37},{},[38],{"type":23,"value":39},"Source",{"type":17,"tag":25,"props":41,"children":42},{},[43],{"type":23,"value":44},"CVPR 
2023",{"type":17,"tag":25,"props":46,"children":47},{},[48],{"type":23,"value":49},"URL",{"type":17,"tag":25,"props":51,"children":52},{},[53],{"type":17,"tag":54,"props":55,"children":59},"a",{"href":56,"rel":57},"https://openaccess.thecvf.com/content/CVPR2023/html/Tang%5C_FLAG3D%5C_A%5C_3D%5C_Fitness%5C_Activity%5C_Dataset%5C_With%5C_Language%5C_Instruction%5C_CVPR%5C_2023%5C_paper.html",[58],"nofollow",[60],{"type":23,"value":61},"https://openaccess.thecvf.com/content/CVPR2023/html/Tang\\_FLAG3D\\_A\\_3D\\_Fitness\\_Activity\\_Dataset\\_With\\_Language\\_Instruction\\_CVPR\\_2023\\_paper.html",{"type":17,"tag":25,"props":63,"children":64},{},[65],{"type":23,"value":66},"Dataset and Code",{"type":17,"tag":25,"props":68,"children":69},{},[70],{"type":17,"tag":54,"props":71,"children":74},{"href":72,"rel":73},"https://andytang15.github.io/FLAG3D",[58],[75],{"type":23,"value":72},{"type":17,"tag":25,"props":77,"children":78},{},[79,81,87],{"type":23,"value":80},"As an open-source AI framework, MindSpore supports ultra-large-scale AI pre-training and brings researchers and developers an excellent experience of device-edge-cloud synergy, simplified development, ultimate performance, and security and reliability. Since it was open-sourced on March 28, 2020, MindSpore has been downloaded more than 6.57 million times. It has also been the subject of thousands of papers presented at premier AI conferences. Furthermore, it has a large developer community and has been adopted in over 290 top universities and over 5000 commercial applications. Widely used in scenarios such as AI computing centers, finance, smart manufacturing, cloud, wireless, datacom, energy, \"1+8+",{"type":17,"tag":82,"props":83,"children":84},"em",{},[85],{"type":23,"value":86},"N",{"type":23,"value":88},"\" consumer devices, and smart automobiles, MindSpore has emerged as the leading open-source software on Gitee. 
Here in this open-source community, you are welcome to contribute development kits, modeling ideas, industry innovations and applications, algorithm innovations, academic collaborations, AI book collaborations, and your own application cases in fields such as cloud, device, edge, and security.",{"type":17,"tag":25,"props":90,"children":91},{},[92],{"type":23,"value":93},"Thanks to support from the scientific, industrial, and academic communities, MindSpore-based papers accounted for 7% of all papers based on AI frameworks as of 2023, ranking No. 2 in the world for two consecutive years. The MindSpore community supports the analysis of top-tier conference papers and promotes original AI achievements. In this blog, I'd like to share a paper from the team led by Prof. Tang Yansong at Tsinghua Shenzhen International Graduate School.",{"type":17,"tag":25,"props":95,"children":96},{},[97],{"type":17,"tag":98,"props":99,"children":100},"strong",{},[101],{"type":23,"value":102},"01",{"type":17,"tag":25,"props":104,"children":105},{},[106],{"type":17,"tag":98,"props":107,"children":108},{},[109],{"type":23,"value":110},"Background",{"type":17,"tag":25,"props":112,"children":113},{},[114],{"type":23,"value":115},"With the development of science and technology, people pay increasing attention to their health, and fitness has become more and more popular. Against this background, it is particularly important to build an intelligent system that can sense, understand, and analyze fitness actions. The core technologies of such a system are human action recognition, human mesh recovery, and human action generation. Human action recognition recognizes the actions of a target person in a video sequence. Human mesh recovery models a person in a video in three dimensions. Human action generation produces a desired action sequence through a generative model. These technologies are widely used in video Q&A, virtual humans, the metaverse, VR, and more. 
A comprehensive and future-oriented intelligent fitness system is under construction.",{"type":17,"tag":25,"props":117,"children":118},{},[119],{"type":23,"value":120},"However, these fields lack fitness-related action datasets, as well as datasets that cover complex actions requiring high accuracy across diverse generalization scenarios. To this end, the researchers propose FLAG3D, a new dataset that contains 180,000 action sequences covering 60 actions. In addition, the dataset provides rich language annotations and three-dimensional motion-capture data.",{"type":17,"tag":25,"props":122,"children":123},{},[124],{"type":17,"tag":98,"props":125,"children":126},{},[127],{"type":23,"value":128},"02",{"type":17,"tag":25,"props":130,"children":131},{},[132],{"type":17,"tag":98,"props":133,"children":134},{},[135],{"type":23,"value":136},"Team Introduction",{"type":17,"tag":25,"props":138,"children":139},{},[140],{"type":23,"value":141},"Dr. Tang Yansong, the first author of this paper, is an assistant professor at Tsinghua Shenzhen International Graduate School. His research focuses on AI and computer vision, and he has published more than 30 papers in authoritative international journals and conferences, including more than 20 papers as first or corresponding author in IEEE journals such as TPAMI and CCF-A conferences such as CVPR. According to Google Scholar, his papers have been cited more than 1,800 times.",{"type":17,"tag":25,"props":143,"children":144},{},[145],{"type":23,"value":146},"Li Xiu, the lead corresponding author, is a professor at Tsinghua Shenzhen International Graduate School whose research focuses on intelligent systems, data mining, and pattern recognition. He has published more than 100 academic papers in important academic journals and conferences. 
His papers have garnered more than 500 citations in the Web of Science Core Collection and more than 7,000 citations on Google Scholar, and he has obtained 7 invention patents and 5 software copyrights in China.",{"type":17,"tag":25,"props":148,"children":149},{},[150],{"type":17,"tag":98,"props":151,"children":152},{},[153],{"type":23,"value":154},"03",{"type":17,"tag":25,"props":156,"children":157},{},[158],{"type":17,"tag":98,"props":159,"children":160},{},[161],{"type":23,"value":162},"Introduction to the Paper",{"type":17,"tag":25,"props":164,"children":165},{},[166],{"type":23,"value":167},"With the growing popularity of working out at the gym, fitness activity analysis has become a new research topic in computer vision. Various new tasks and algorithms have been proposed, while demand keeps increasing for data resources with high quality, fine-grained labels, and diversified sources. In this paper, the authors propose FLAG3D, a large-scale 3D fitness activity dataset with language instruction, which contains 180,000 sequences across 60 categories.",{"type":17,"tag":25,"props":169,"children":170},{},[171],{"type":17,"tag":172,"props":173,"children":175},"img",{"alt":7,"src":174},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/04/19/696350a6cd794a8796f7abbb621c5988.png",[],{"type":17,"tag":25,"props":177,"children":178},{},[179],{"type":23,"value":180},"Figure 1 FLAG3D dataset overview",{"type":17,"tag":25,"props":182,"children":183},{},[184],{"type":23,"value":185},"FLAG3D features the following advantages:",{"type":17,"tag":25,"props":187,"children":188},{},[189],{"type":23,"value":190},"(1) Accurate and dense three-dimensional human poses captured by an advanced MoCap system can handle complex activities and large movements.",{"type":17,"tag":25,"props":192,"children":193},{},[194],{"type":23,"value":195},"(2) Detailed and professional language instructions can describe the steps and poses needed to complete a specific 
activity.",{"type":17,"tag":25,"props":197,"children":198},{},[199],{"type":23,"value":200},"(3) The MoCap system, rendering software, and low-cost smartphones in natural environments can provide diverse video resources.",{"type":17,"tag":25,"props":202,"children":203},{},[204],{"type":23,"value":205},"Through extensive experiments and in-depth analysis, FLAG3D has demonstrated its research value in different fields, including cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation.",{"type":17,"tag":25,"props":207,"children":208},{},[209],{"type":17,"tag":172,"props":210,"children":212},{"alt":7,"src":211},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/04/19/800fd7ebaa3344dea7b707023a839c38.png",[],{"type":17,"tag":25,"props":214,"children":215},{},[216],{"type":23,"value":217},"Table 1 Comparison between FLAG3D and other datasets",{"type":17,"tag":25,"props":219,"children":220},{},[221],{"type":23,"value":222},"In this blog, the MindSpore implementation can be divided into three parts: dataset sampling built with GeneratorDataset, model construction with the nn and ops APIs under Cell, and training powered by value_and_grad. In the MindSpore framework, each API is clearly defined and easy to call. The optimizer and the computational graph used in the training phase are efficiently designed and highly usable. 
In addition, the documentation on the official website is clearly written and provides ample examples for reference.",{"type":17,"tag":25,"props":224,"children":225},{},[226],{"type":17,"tag":98,"props":227,"children":228},{},[229],{"type":23,"value":230},"04",{"type":17,"tag":25,"props":232,"children":233},{},[234],{"type":17,"tag":98,"props":235,"children":236},{},[237],{"type":23,"value":238},"Experimental Results",{"type":17,"tag":25,"props":240,"children":241},{},[242],{"type":23,"value":243},"In the MindSpore framework, the authors use mainstream skeleton-based action recognition algorithms, such as 2s-AGCN and PoseC3D, to evaluate the FLAG3D dataset. The accuracy of 2s-AGCN on FLAG3D is 81.5% (out-domain) and 98.6% (in-domain), while the accuracy of PoseC3D on FLAG3D (out-domain) is 79.9%. These results show that traditional methods achieve good results in the in-domain scenario but are still insufficient in the out-domain setting.",{"type":17,"tag":25,"props":245,"children":246},{},[247],{"type":23,"value":248},"Unlike some other frameworks, MindSpore generates the backward graph only when grad is called, so no manual setting is required during inference. In addition, MindSpore supports more parameters and operations through its API design and dataset construction, offering higher flexibility. These features make MindSpore a powerful and convenient deep learning tool.",{"type":17,"tag":25,"props":250,"children":251},{},[252],{"type":17,"tag":98,"props":253,"children":254},{},[255],{"type":23,"value":256},"05",{"type":17,"tag":25,"props":258,"children":259},{},[260],{"type":17,"tag":98,"props":261,"children":262},{},[263],{"type":23,"value":264},"Summary",{"type":17,"tag":25,"props":266,"children":267},{},[268],{"type":23,"value":269},"In this paper, the authors propose FLAG3D, a three-dimensional fitness action dataset that outperforms previous datasets in terms of skeleton accuracy, language description granularity, and source richness. 
Qualitative and quantitative experimental results show that FLAG3D poses new challenges to cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation.",{"type":17,"tag":25,"props":271,"children":272},{},[273],{"type":23,"value":274},"By focusing on flexibility and usability in its design, MindSpore has emerged as a powerful and user-friendly deep learning tool for developers. The MindSpore community is also developing rapidly, with frequent releases that feature easy-to-use APIs. As a deep learning framework, MindSpore is poised to play a more significant role in the future, and with the continuous development of the framework and its community, more innovations and application cases are on the horizon. MindSpore hopes to grow with developers and users to build an active and friendly community where learning is shared through rich documentation and examples.",{"title":7,"searchDepth":276,"depth":276,"links":277},4,[],"markdown","content:technology-blogs:en:3065.md","content","technology-blogs/en/3065.md","technology-blogs/en/3065","md",1776506110219]