[{"data":1,"prerenderedAt":348},["ShallowReactive",2],{"content-query-u1XIB6fy1W":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"body":13,"_type":342,"_id":343,"_source":344,"_file":345,"_stem":346,"_extension":347},"/technology-blogs/en/2922","en",false,"","CycMuNet+: Spatial-Temporal Video Super-Resolution Based on MindSpore","CycMuNet+ show significant superiority in spatial, temporal, and spatial-temporal video super-resolution tasks.","2023-10-20","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/eeac357249594722995b0b867b7280a2.png","technology-blogs",{"type":14,"children":15,"toc":339},"root",[16,24,30,39,44,52,57,65,76,84,93,106,111,116,124,129,137,142,147,152,160,165,170,175,180,185,193,200,205,210,217,222,227,235,240,245,252,257,262,269,274,279,286,291,296,301,306,311,319,324,329,334],{"type":17,"tag":18,"props":19,"children":21},"element","h1",{"id":20},"cycmunet-spatial-temporal-video-super-resolution-based-on-mindspore",[22],{"type":23,"value":8},"text",{"type":17,"tag":25,"props":26,"children":27},"p",{},[28],{"type":23,"value":29},"Author: Li Ruifeng | Source: Zhihu",{"type":17,"tag":25,"props":31,"children":32},{},[33],{"type":17,"tag":34,"props":35,"children":36},"strong",{},[37],{"type":23,"value":38},"Paper Title",{"type":17,"tag":25,"props":40,"children":41},{},[42],{"type":23,"value":43},"CycMuNet+: Cycle-Projected Mutual Learning for Spatial-Temporal Video Super-Resolution",{"type":17,"tag":25,"props":45,"children":46},{},[47],{"type":17,"tag":34,"props":48,"children":49},{},[50],{"type":23,"value":51},"Source of the Paper:",{"type":17,"tag":25,"props":53,"children":54},{},[55],{"type":23,"value":56},"CVPR2022/TPAMI",{"type":17,"tag":25,"props":58,"children":59},{},[60],{"type":17,"tag":34,"props":61,"children":62},{},[63],{"type":23,"value":64},"Paper 
URL",{"type":17,"tag":25,"props":66,"children":67},{},[68],{"type":17,"tag":69,"props":70,"children":74},"a",{"href":71,"rel":72},"https://openaccess.thecvf.com/content/CVPR2022/papers/Hu_Spatial-Temporal_Space_Hand-in-Hand_Spatial-Temporal_Video_Super-Resolution_via_Cycle-Projected_Mutual_Learning_CVPR_2022_paper.pdf",[73],"nofollow",[75],{"type":23,"value":71},{"type":17,"tag":25,"props":77,"children":78},{},[79],{"type":17,"tag":34,"props":80,"children":81},{},[82],{"type":23,"value":83},"Code URL",{"type":17,"tag":25,"props":85,"children":86},{},[87],{"type":17,"tag":69,"props":88,"children":91},{"href":89,"rel":90},"https://github.com/tongyuantongyu/cycmunet/tree/main/mindspore",[73],[92],{"type":23,"value":89},{"type":17,"tag":25,"props":94,"children":95},{},[96,98,104],{"type":23,"value":97},"As an open-source AI framework, MindSpore offers a simplified, secure, reliable, and high-performance development process for device-edge-cloud collaboration and ultra-large-scale AI pre-training for the industry-university-research ecosystem. Since it was open sourced on March 28, 2020, it has garnered over 5 million downloads and has been the subject of hundreds of papers presented at premier AI conferences. Furthermore, MindSpore has a large community of developers and has been introduced in over 100 universities and 5000 commercial apps. Being widely used in scenarios such as AI computing centers, finance, smart manufacturing, cloud, wireless, datacom, energy, \"1+8+",{"type":17,"tag":99,"props":100,"children":101},"em",{},[102],{"type":23,"value":103},"N",{"type":23,"value":105},"\" consumer, and smart automobiles, MindSpore has emerged as one of the leading open-source software on Gitee. 
The MindSpore community extends a warm welcome to all who wish to contribute to open-source development kits, models, industrial applications, algorithm innovations, academic collaborations, AI-themed book writing, and application cases across the cloud, device, edge, and security.",{"type":17,"tag":25,"props":107,"children":108},{},[109],{"type":23,"value":110},"Thanks to the support from scientific, industrial, and academic circles, MindSpore-based papers account for 7% of all papers about AI frameworks in 2023, ranking No. 2 globally for two consecutive years. The MindSpore community is thrilled to share and interpret top-level conference papers and looks forward to collaborating with experts from industry, academia, and research institutions, so as to yield proprietary AI outcomes and innovate AI applications. In this blog, I'd like to share a paper from the team led by Prof. Wang Zheng at the School of Computer Science, Wuhan University.",{"type":17,"tag":25,"props":112,"children":113},{},[114],{"type":23,"value":115},"MindSpore aims to achieve three goals: easy development, efficient execution, and all-scenario coverage. The development of MindSpore has been characterized by rapid improvements over successive iterations, with its API design becoming more complete, reasonable, and powerful. To augment its convenience and power, several kits based on MindSpore have been developed. 
One such example is MindSpore Insight, which can present model architectures as graphs and dynamically monitor the changes of metrics and parameters during model execution, thereby simplifying the development process.",{"type":17,"tag":25,"props":117,"children":118},{},[119],{"type":17,"tag":34,"props":120,"children":121},{},[122],{"type":23,"value":123},"01 Research Background",{"type":17,"tag":25,"props":125,"children":126},{},[127],{"type":23,"value":128},"With the rise of short videos and live streaming, people are spending more time on online videos than on social networks and demanding higher-quality videos. As two major factors of video quality, the frame rate and resolution of a video have a significant influence on subjective visual perception. Videos with high frame rates (HFR) and high resolutions, on the other hand, place greater demands on network bandwidth and device capabilities. Most video data collected in the real world has a low frame rate and poor quality. Therefore, the industry is in search of a framework that can increase the frame rate and resolution of a given video. Academia is also researching ways to properly integrate spatial and temporal video super-resolution technologies.",{"type":17,"tag":25,"props":130,"children":131},{},[132],{"type":17,"tag":133,"props":134,"children":136},"img",{"alt":7,"src":135},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/11a2e589756340ee9b72eac5d029ad5e.png",[],{"type":17,"tag":25,"props":138,"children":139},{},[140],{"type":23,"value":141},"Figure 1: Different schemes for spatial-temporal video super-resolution (ST-VSR)",{"type":17,"tag":25,"props":143,"children":144},{},[145],{"type":23,"value":146},"Current advanced ST-VSR methods can be roughly divided into two categories: two-stage-based and one-stage-based methods. 
The two-stage-based methods decompose ST-VSR into two sequential sub-tasks: spatial video super-resolution (S-VSR) and temporal video super-resolution (T-VSR). However, the two-stage-based methods fail to mutually explore the coupled correlations between S-VSR and T-VSR. The one-stage-based methods integrate S-VSR and T-VSR into a unified framework, but they only focus on the unilateral relationship between the two tasks, consequently leading to obvious artifacts in reconstructed results.",{"type":17,"tag":25,"props":148,"children":149},{},[150],{"type":23,"value":151},"To solve the preceding problems, we propose a one-stage-based cycle-projected mutual learning network (CycMuNet) for mutual learning of T-VSR and S-VSR tasks and thorough utilization of spatial and temporal information. In the proposed network, we have designed up-and-down projection units, which effectively utilize the temporal correlation to reconstruct the spatial details and refine temporal prediction via updated spatial information. Through iterations, temporal and spatial information can be fully utilized mutually. The results on benchmark datasets show that the proposed network achieves impressive performance in S-VSR, T-VSR, and ST-VSR tasks.",{"type":17,"tag":25,"props":153,"children":154},{},[155],{"type":17,"tag":34,"props":156,"children":157},{},[158],{"type":23,"value":159},"02 Team Introduction",{"type":17,"tag":25,"props":161,"children":162},{},[163],{"type":23,"value":164},"Hu Mengshun: third-year PhD student at Wuhan University. Hu has published works in TPAMI, TCSVT, CVPR, AAAI, ACM MM, and other high-level journals and conferences, covering ST-VSR and video frame interpolation.",{"type":17,"tag":25,"props":166,"children":167},{},[168],{"type":23,"value":169},"Jiang Kui: associate professor at Harbin Institute of Technology. 
Jiang has published more than 30 papers in top journals and conferences, such as TPAMI, TIP, CVPR, ICCV, AAAI, and ACM MM, covering image and video processing.",{"type":17,"tag":25,"props":171,"children":172},{},[173],{"type":23,"value":174},"Wang Zheng: professor at the National Engineering Research Center for Multimedia Software, Wuhan University. Wang has published more than 70 papers in top journals and conferences, such as TPAMI, TIP, CVPR, ICCV, AAAI, and ACM MM, won the PCM 2014 Best Paper Award, was nominated for the ICME 2021 Best Paper Award, and is listed among Stanford's top 2% most highly cited scientists for 2022. Wang's main research directions are multimedia content analysis and social security governance.",{"type":17,"tag":25,"props":176,"children":177},{},[178],{"type":23,"value":179},"Bai Xiang: professor and doctoral supervisor at the School of Artificial Intelligence and Automation, Huazhong University of Science and Technology.",{"type":17,"tag":25,"props":181,"children":182},{},[183],{"type":23,"value":184},"Hu Ruimin: dean of the School of Cyber Engineering, Xidian University.",{"type":17,"tag":25,"props":186,"children":187},{},[188],{"type":17,"tag":34,"props":189,"children":190},{},[191],{"type":23,"value":192},"03 Introduction to the Paper",{"type":17,"tag":25,"props":194,"children":195},{},[196],{"type":17,"tag":133,"props":197,"children":199},{"alt":7,"src":198},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/0f50bbe9fa7c4aa19a004b667f70db7e.png",[],{"type":17,"tag":25,"props":201,"children":202},{},[203],{"type":23,"value":204},"Figure 2: Architecture of the proposed ST-VSR network CycMuNet",{"type":17,"tag":25,"props":206,"children":207},{},[208],{"type":23,"value":209},"As shown in Figure 2, two low-resolution (LR) input frames are used to generate three high-resolution (HR) frames and an LR intermediate frame. 
Specifically, CycMuNet uses a feature extractor (FE) to extract representations from the LR input frames and obtain an initialized intermediate representation using a feature temporal interpolation network (FTI-Net). Then, CycMuNet adopts mutual learning to exploit the mutual information between S-VSR and T-VSR via an up-projection unit (UPU) and a down-projection unit (DPU), eliminating cross-space errors. Finally, CycMuNet uses a reconstruction network (R) to reconstruct the corresponding HR frames and the LR intermediate frame.",{"type":17,"tag":25,"props":211,"children":212},{},[213],{"type":17,"tag":133,"props":214,"children":216},{"alt":7,"src":215},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/9caf3f33eba74214af3d735b99bcb627.png",[],{"type":17,"tag":25,"props":218,"children":219},{},[220],{"type":23,"value":221},"Figure 3: Structures of the UPU and DPU",{"type":17,"tag":25,"props":223,"children":224},{},[225],{"type":23,"value":226},"The core idea of this paper is cycle-projected mutual learning, which ensures thorough mutual learning of the S-VSR and T-VSR tasks, thereby exploiting the spatial and temporal information. As shown in Figure 3, the core idea is implemented by iterative UPUs and DPUs. The function of the UPU is to use the temporal context information of high-frame-rate video to promote the reconstruction of spatial detail information. Each UPU includes two scale-up modules and a scale-down module. The first scale-up module obtains rough HR features. Then, the scale-down module performs back-projection to the LR space to learn cross-space errors. Finally, the second scale-up module performs back-projection to the HR space to refine the rough HR features. The function of the DPU is to use the updated spatial information to refine the prediction of the temporal information. 
The process is similar to that of the scale-up module.",{"type":17,"tag":25,"props":228,"children":229},{},[230],{"type":17,"tag":34,"props":231,"children":232},{},[233],{"type":23,"value":234},"04 Experiment Results",{"type":17,"tag":25,"props":236,"children":237},{},[238],{"type":23,"value":239},"To verify the effectiveness of the proposed network, CycMuNet is compared with state-of-the-art ST-VSR, S-VSR, and T-VSR methods. The results are as follows:",{"type":17,"tag":25,"props":241,"children":242},{},[243],{"type":23,"value":244},"(1) ST-VSR: As shown in Table 1, one-stage-based methods show significant superiority over two-stage-based methods in exploring the unilateral relationship between S-VSR and T-VSR tasks. CycMuNet uses iterative UPUs and DPUs to exploit the relationship between the two tasks to further improve performance.",{"type":17,"tag":25,"props":246,"children":247},{},[248],{"type":17,"tag":133,"props":249,"children":251},{"alt":7,"src":250},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/a8ba21c52bfb410c9e3b5f9880c2a2d3.png",[],{"type":17,"tag":25,"props":253,"children":254},{},[255],{"type":23,"value":256},"Table 1: Comparisons of ST-VSR methods",{"type":17,"tag":25,"props":258,"children":259},{},[260],{"type":23,"value":261},"(2) S-VSR: As shown in Table 2, CycMuNet achieves comparable results with fewer parameters and less training data, thanks to the UPUs that use the context information of the HFR video for S-VSR.",{"type":17,"tag":25,"props":263,"children":264},{},[265],{"type":17,"tag":133,"props":266,"children":268},{"alt":7,"src":267},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/2d79ebcf241742a58609f282724960a7.png",[],{"type":17,"tag":25,"props":270,"children":271},{},[272],{"type":23,"value":273},"Table 2: Comparisons of S-VSR methods",{"type":17,"tag":25,"props":275,"children":276},{},[277],{"type":23,"value":278},"(3) T-VSR: As shown in Table 3, CycMuNet achieves better performance, 
thanks to the DPUs that use HR information for LR intermediate frame generation.",{"type":17,"tag":25,"props":280,"children":281},{},[282],{"type":17,"tag":133,"props":283,"children":285},{"alt":7,"src":284},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/12/29/bc51d5a4c350486db0421dc89cd7fbee.png",[],{"type":17,"tag":25,"props":287,"children":288},{},[289],{"type":23,"value":290},"Table 3: Comparisons of T-VSR methods",{"type":17,"tag":25,"props":292,"children":293},{},[294],{"type":23,"value":295},"The CycMuNet model is now available on the MindSpore framework and supports training and inference on Ascend hardware. The development and usage of CycMuNet on MindSpore differ from those on PyTorch in the following aspects:",{"type":17,"tag":25,"props":297,"children":298},{},[299],{"type":23,"value":300},"(1) Development: It is challenging to manage the storage locations of tensors when coding in PyTorch because you may forget to move an input tensor to GPU memory. When testing the code on CPUs or other accelerators, you will need to manually change the storage locations. Many networks that are implemented using PyTorch, which is supposed to be a cross-platform framework, can only run in CUDA environments. MindSpore's design fundamentally solves this problem. You only need to set the target device in the top-level code. The rest of the code can run on various devices without any modification. This design allows us to write code once to support different devices, while avoiding ecological fragmentation between network implementations developed on different hardware.",{"type":17,"tag":25,"props":302,"children":303},{},[304],{"type":23,"value":305},"(2) Training: MindSpore does not require you to manually compose steps such as data generation, forward calculation, backpropagation, and weight update in each step. 
Instead, the information required for training is aggregated into a Model class, and MindSpore performs these operations during training. This design is not immediately intuitive, but it enables MindSpore to control and optimize the entire training process.",{"type":17,"tag":25,"props":307,"children":308},{},[309],{"type":23,"value":310},"(3) Deployment: MindSpore's static graph design brings a great advantage over PyTorch: the weights and structure of the trained model can be easily saved together, separated from the structure definition code in Python. This feature is helpful for model deployment in production environments, where you want a simplified operating environment without an extra Python interpreter. MindSpore can easily export models in MindIR and ONNX formats, while networks in PyTorch need to be rewritten to support TorchScript before they can be exported in the ONNX format; otherwise, the export result may be incorrect. Therefore, MindSpore's requirements on the code style are actually an advantage, paving the way for subsequent deployment.",{"type":17,"tag":25,"props":312,"children":313},{},[314],{"type":17,"tag":34,"props":315,"children":316},{},[317],{"type":23,"value":318},"05 Summary and Prospects",{"type":17,"tag":25,"props":320,"children":321},{},[322],{"type":23,"value":323},"(1) The paper proposes a novel one-stage-based cycle-projected mutual learning network for ST-VSR that exploits the mutual learning of S-VSR and T-VSR tasks to fully explore the video spatial-temporal information.",{"type":17,"tag":25,"props":325,"children":326},{},[327],{"type":23,"value":328},"(2) UPUs and DPUs are used to implement mutual learning of the two tasks. 
The DPU uses rich spatial information to refine temporal prediction, while the UPU uses temporal correlations to refine texture and details.",{"type":17,"tag":25,"props":330,"children":331},{},[332],{"type":23,"value":333},"(3) Experiments show that CycMuNet achieves significant superiority in S-VSR, T-VSR, and ST-VSR tasks.",{"type":17,"tag":25,"props":335,"children":336},{},[337],{"type":23,"value":338},"Acknowledgement: This research result has been funded by the Huawei MindSpore Academic Award Fund of the Chinese Association for Artificial Intelligence.",{"title":7,"searchDepth":340,"depth":340,"links":341},4,[],"markdown","content:technology-blogs:en:2922.md","content","technology-blogs/en/2922.md","technology-blogs/en/2922","md",1776506107915]