[{"data":1,"prerenderedAt":255},["ShallowReactive",2],{"content-query-wlOSCLO5Jz":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"body":13,"_type":249,"_id":250,"_source":251,"_file":252,"_stem":253,"_extension":254},"/technology-blogs/en/2976","en",false,"","OpenDMC: the First Deep Learning Open-Source Video Compression Algorithm Library Based on MindSpore, Supporting Cross-Platform Environments and Multiple Evaluation Indicators","Author: Li Ruifeng | Source: Zhihu","2023-11-24","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/02/04/0f80cdf9c62f4205be10a1a6dfd63c1a.png","technology-blogs",{"type":14,"children":15,"toc":246},"root",[16,24,30,35,40,45,50,61,66,75,88,93,98,107,115,120,128,136,141,149,157,162,170,175,180,188,196,201,208,213,220,225,233,241],{"type":17,"tag":18,"props":19,"children":21},"element","h1",{"id":20},"opendmc-the-first-deep-learning-open-source-video-compression-algorithm-library-based-on-mindspore-supporting-cross-platform-environments-and-multiple-evaluation-indicators",[22],{"type":23,"value":8},"text",{"type":17,"tag":25,"props":26,"children":27},"p",{},[28],{"type":23,"value":29},"Paper Title",{"type":17,"tag":25,"props":31,"children":32},{},[33],{"type":23,"value":34},"OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression",{"type":17,"tag":25,"props":36,"children":37},{},[38],{"type":23,"value":39},"Paper Source",{"type":17,"tag":25,"props":41,"children":42},{},[43],{"type":23,"value":44},"ACM MultiMedia",{"type":17,"tag":25,"props":46,"children":47},{},[48],{"type":23,"value":49},"Paper URL",{"type":17,"tag":25,"props":51,"children":52},{},[53],{"type":17,"tag":54,"props":55,"children":59},"a",{"href":56,"rel":57},"https://www.acmmm2023.org/open-source-program/",[58],"nofollow",[60],{"type":23,"value":56},{"type":17,"tag":25,"props":62,"children":63},{},[64],{"type":23,"value":65},"Code 
URL",{"type":17,"tag":25,"props":67,"children":68},{},[69],{"type":17,"tag":54,"props":70,"children":73},{"href":71,"rel":72},"https://openi.pcl.ac.cn/OpenDMC/OpenDMC",[58],[74],{"type":23,"value":71},{"type":17,"tag":25,"props":76,"children":77},{},[78,80,86],{"type":23,"value":79},"As an open-source AI framework, MindSpore supports ultra-large-scale AI pre-training and delivers device-edge-cloud synergy, simplified development, ultimate performance, and security and reliability for researchers and developers. Since it was open-sourced on March 28, 2020, MindSpore has been downloaded more than 6 million times. It has also been the subject of hundreds of papers presented at premier AI conferences. Furthermore, it has a large community of developers and has been adopted by over 100 top universities and in more than 5,000 commercial applications. Widely used in scenarios such as AI computing centers, finance, smart manufacturing, cloud, wireless, datacom, energy, \"1+8+",{"type":17,"tag":81,"props":82,"children":83},"em",{},[84],{"type":23,"value":85},"N",{"type":23,"value":87},"\" consumers, and smart automobiles, MindSpore has emerged as the leading open-source software on Gitee. The MindSpore community warmly welcomes anyone who wishes to contribute to open-source development kits, models, industrial applications, algorithm innovations, academic collaborations, AI-themed books, and application cases across cloud, device, edge, and security.",{"type":17,"tag":25,"props":89,"children":90},{},[91],{"type":23,"value":92},"Thanks to support from the scientific, industrial, and academic communities, MindSpore-based papers accounted for 7% of all papers about AI frameworks in 2023, ranking No. 2 globally for two consecutive years. 
The MindSpore community is delighted to share and interpret papers from top conferences and looks forward to collaborating with experts from industry, academia, and research institutions to produce original AI outcomes and innovative AI applications. In this blog, I'd like to share a paper from the team led by Prof. Gao Wei at Peking University.",{"type":17,"tag":25,"props":94,"children":95},{},[96],{"type":23,"value":97},"MindSpore aims to achieve three goals: easy development, efficient execution, and all-scenario coverage. MindSpore has improved rapidly with successive iterations, and its API design has become more complete, coherent, and powerful. To further improve convenience and capability, several kits based on MindSpore have been developed. One example is MindSpore Insight, which presents model architectures as graphs and dynamically monitors changes in metrics and parameters during model execution, thereby simplifying the development process.",{"type":17,"tag":25,"props":99,"children":100},{},[101],{"type":17,"tag":102,"props":103,"children":104},"strong",{},[105],{"type":23,"value":106},"01",{"type":17,"tag":25,"props":108,"children":109},{},[110],{"type":17,"tag":102,"props":111,"children":112},{},[113],{"type":23,"value":114},"Research Background",{"type":17,"tag":25,"props":116,"children":117},{},[118],{"type":23,"value":119},"Video streaming has become an indispensable part of daily life. Video applications serve billions of people on the Internet, creating huge demand for efficient video transmission and storage. 
Although many excellent video coding algorithms exist, no algorithm library effectively classifies and organizes them, evaluates their performance under different standards, and implements them on multiple platforms (in particular, newly emerging platforms such as MindSpore).",{"type":17,"tag":25,"props":121,"children":122},{},[123],{"type":17,"tag":102,"props":124,"children":125},{},[126],{"type":23,"value":127},"02",{"type":17,"tag":25,"props":129,"children":130},{},[131],{"type":17,"tag":102,"props":132,"children":133},{},[134],{"type":23,"value":135},"Team Introduction",{"type":17,"tag":25,"props":137,"children":138},{},[139],{"type":23,"value":140},"Gao Wei: Assistant Professor/Researcher/PhD Supervisor at the School of Electronic and Computer Engineering, Peking University; IEEE/CCF/CSIG Senior Member. His team has published more than 100 papers in high-level international journals (IEEE TPAMI, TIP, TCSVT, TMM, TNNLS, TCYB, TGRS, etc.) and conferences (CVPR, ECCV, AAAI, ACM MM, DCC, etc.), applied for or been granted more than 80 US/China/PCT patents, and submitted more than 40 technical proposals while actively participating in the formulation of standards for multimedia and AI technologies. Two papers were selected as ESI Highly Cited Papers, and four papers won Top Paper Awards. 
Gao Wei received the 2021 IEEE Multimedia Rising Star award for his achievements in 3D immersive media research, the 2022 CCF Excellent Open Source Graphic Software Nomination Award, the 2021 CCF-Tencent Rhinoceros Bird Excellent Patent Award, and the CCF-Tencent Rhinoceros Bird Fund in 2020 and 2019.",{"type":17,"tag":25,"props":142,"children":143},{},[144],{"type":17,"tag":102,"props":145,"children":146},{},[147],{"type":23,"value":148},"03",{"type":17,"tag":25,"props":150,"children":151},{},[152],{"type":17,"tag":102,"props":153,"children":154},{},[155],{"type":23,"value":156},"Introduction to the Paper",{"type":17,"tag":25,"props":158,"children":159},{},[160],{"type":23,"value":161},"OpenDMC is the first open-source deep learning algorithm library designed specifically for video compression tasks. Excellent compression libraries such as CompressAI focus mainly on image compression rather than video compression algorithms; OpenDMC fills this gap. OpenDMC supports multiple platforms, including MindSpore, and multiple typical video compression algorithms, including DVC, DCVC, SSFVC, and DVC-P. In addition, it proposes several classification criteria to organize these algorithms: residual-based or condition-based coding; objective or perceptual supervision; and bi-directional or uni-directional spatiotemporal modeling. 
In OpenDMC, multiple indicators are used to evaluate algorithms, covering Rate-Distortion (RD) performance, running time, and GPU memory usage, as shown in the following figure.",{"type":17,"tag":166,"props":167,"children":169},"img",{"alt":7,"src":168},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/02/04/75257d9e8de54a788e75478de60a104d.png",[],{"type":17,"tag":25,"props":171,"children":172},{},[173],{"type":23,"value":174},"Figure 1 Cross-platform algorithms supported by OpenDMC and related evaluation indicators",{"type":17,"tag":25,"props":176,"children":177},{},[178],{"type":23,"value":179},"DVC is one of the earliest works on deep video coding. It uses an encoder-decoder optical-flow convolutional network to estimate inter-frame motion and then compresses the residuals to save bit rate. The reference frame is warped to the target frame through the predicted optical flow to obtain the residual, which can then be better quantized using non-linear neural networks. SSFVC proposes scale-space flow estimation and scale-space warping techniques. A scale field is added as a third dimension to the conventional 2-channel flow field, improving the handling of difficult cases and ensuring more graceful degradation when flow-based prediction is not possible. DCVC utilizes learnable high-dimensional temporal contextual features as conditions for frame compression. To address the spatial discontinuity caused by motion compensation, DCVC applies a context refinement module to generate the final contextual features. These contextual features are then used as conditional inputs to both the encoder and decoder within a parallel and concatenated architecture. DVC-P proposes a deep video compression framework with perceptual optimization. It points out that optimizing video compression solely for PSNR does not always enhance perceptual quality. 
Specifically, inspired by generative adversarial networks, DVC-P adds a discriminator network and a mixed loss to the DVC framework.",{"type":17,"tag":25,"props":181,"children":182},{},[183],{"type":17,"tag":102,"props":184,"children":185},{},[186],{"type":23,"value":187},"04",{"type":17,"tag":25,"props":189,"children":190},{},[191],{"type":17,"tag":102,"props":192,"children":193},{},[194],{"type":23,"value":195},"Experimental Results",{"type":17,"tag":25,"props":197,"children":198},{},[199],{"type":23,"value":200},"The environment setup, training, and inference processes of the experiments in this paper are all implemented under the MindSpore framework. With detailed documentation, a large community, and an efficient underlying implementation, MindSpore makes it easy to set up the experiment environment. In addition, MindSpore achieves the same model performance and inference time as other deep learning frameworks such as PyTorch and TensorFlow. The test results are shown below. As shown in Table 1, DCVC, SSFVC, and DVC-P achieve a greater performance improvement in terms of BD-MSSSIM than in terms of BD-PSNR. As shown in Figure 2, efficiency is measured by running time and GPU memory occupation, where the running time is summed over all frames in all sequences of the UVG dataset. The fastest algorithm is SSFVC with scale-space warping. 
DCVC takes the longest running time, mainly because the auto-regressive context model adopted in DCVC significantly increases the time complexity.",{"type":17,"tag":25,"props":202,"children":203},{},[204],{"type":17,"tag":166,"props":205,"children":207},{"alt":7,"src":206},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/02/04/596473567ba64b3ea778bae7597eaa6e.png",[],{"type":17,"tag":25,"props":209,"children":210},{},[211],{"type":23,"value":212},"Table 1 BD-PSNR and BD-MSSSIM comparisons with DVC in different test sequences for each algorithm",{"type":17,"tag":25,"props":214,"children":215},{},[216],{"type":17,"tag":166,"props":217,"children":219},{"alt":7,"src":218},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/02/04/dbcaad7837ec4d2c8186436189e687a9.png",[],{"type":17,"tag":25,"props":221,"children":222},{},[223],{"type":23,"value":224},"Figure 2 Efficiency indicators of cross-platform algorithms supported by OpenDMC",{"type":17,"tag":25,"props":226,"children":227},{},[228],{"type":17,"tag":102,"props":229,"children":230},{},[231],{"type":23,"value":232},"05",{"type":17,"tag":25,"props":234,"children":235},{},[236],{"type":17,"tag":102,"props":237,"children":238},{},[239],{"type":23,"value":240},"Summary and Prospects",{"type":17,"tag":25,"props":242,"children":243},{},[244],{"type":23,"value":245},"This blog introduces OpenDMC, the first deep learning-based open-source library of video compression algorithms, which can be deployed on multiple platforms. It first briefly describes the algorithms in the library and their classification. Then, the performance of representative deep learning-based video compression algorithms is tested, and each model is analyzed in detail, including RD performance, time complexity, and space complexity. All related code has been open-sourced. 
Thanks to MindSpore's detailed documentation and comprehensive community support, the experiments in the paper can be easily reproduced. We expect OpenDMC to provide code support for developers in multiple communities, including MindSpore, to enrich the video compression ecosystem and contribute to more excellent open-source work.",{"title":7,"searchDepth":247,"depth":247,"links":248},4,[],"markdown","content:technology-blogs:en:2976.md","content","technology-blogs/en/2976.md","technology-blogs/en/2976","md",1776506108551]