[{"data":1,"prerenderedAt":458},["ShallowReactive",2],{"content-query-QGEoc1g2ae":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":452,"_id":453,"_source":454,"_file":455,"_stem":456,"_extension":457},"/technology-blogs/en/3282","en",false,"","MindSpore-based Order-preserving Consistency Regularization to Improve Cross-domain Task Performance","This blog introduces a novel regularization method, order-preserving consistency regularization (OCR) that can efficiently solve the problem of domain shifts.","2024-08-01","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2025/01/17/2f1cb25403724845aaeebbf28c20fc23.png","technology-blogs","Practices",{"type":15,"children":16,"toc":449},"root",[17,25,31,36,41,46,51,56,67,72,81,86,91,96,101,106,111,116,121,126,139,144,152,157,214,221,226,233,238,245,256,263,294,330,337,381,386,391,396,403,410,415,422,427,434,439,444],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"mindspore-based-order-preserving-consistency-regularization-to-improve-cross-domain-task-performance",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":24,"value":30},"Author: Li Ruifeng | Source: Zhihu",{"type":18,"tag":26,"props":32,"children":33},{},[34],{"type":24,"value":35},"Paper Title",{"type":18,"tag":26,"props":37,"children":38},{},[39],{"type":24,"value":40},"Order-preserving Consistency Regularization for Domain Adaptation and Generalization",{"type":18,"tag":26,"props":42,"children":43},{},[44],{"type":24,"value":45},"Source",{"type":18,"tag":26,"props":47,"children":48},{},[49],{"type":24,"value":50},"ICCV2023",{"type":18,"tag":26,"props":52,"children":53},{},[54],{"type":24,"value":55},"Paper 
URL",{"type":18,"tag":26,"props":57,"children":58},{},[59],{"type":18,"tag":60,"props":61,"children":65},"a",{"href":62,"rel":63},"https://openaccess.thecvf.com/content/ICCV2023/html/Jing_Order-preserving_Consistency_Regularization_for_Domain_Adaptation_and_Generalization_ICCV_2023_paper.html",[64],"nofollow",[66],{"type":24,"value":62},{"type":18,"tag":26,"props":68,"children":69},{},[70],{"type":24,"value":71},"Code URL",{"type":18,"tag":26,"props":73,"children":74},{},[75],{"type":18,"tag":60,"props":76,"children":79},{"href":77,"rel":78},"https://github.com/Tongzhou-uestc/ocr-mindspore",[64],[80],{"type":24,"value":77},{"type":18,"tag":26,"props":82,"children":83},{},[84],{"type":24,"value":85},"As an open-source AI framework, MindSpore supports ultra-large-scale AI pre-training and brings excellent experience of device-edge-cloud synergy, simplified development, ultimate performance, and security and reliability for researchers and developers. To date, more than 1000 papers about MindSpore have been published by universities and scientific research institutions at top AI conferences. In this blog, I'd like to share the paper of the team led by Pro. Li Jingjing, School of Computer Science and Engineering, University of Electronic Science and Technology of China.",{"type":18,"tag":26,"props":87,"children":88},{},[89],{"type":24,"value":90},"01 Research Background",{"type":18,"tag":26,"props":92,"children":93},{},[94],{"type":24,"value":95},"In the digital era, deep learning models show great potential in computer vision tasks, especially when training and test datasets follow the same distribution. However, in a real-world scenario, domain shifts between the training and test datasets often occur in a model, which reduces expected performance of the model and affects reliability of model deployment. 
For example, in safety-critical applications such as medical image recognition and autonomous driving, a failing model may lead to serious consequences.",{"type":18,"tag":26,"props":97,"children":98},{},[99],{"type":24,"value":100},"Domain shifts are usually caused by domain-specific attributes, such as illumination, background, and shooting angle. Although these attributes are irrelevant to tasks, they can cause shifts in the data distribution. To solve this problem, researchers have used data augmentation and consistency regularization techniques to make models less sensitive to these domain-specific attributes. Data augmentation integrates domain-specific information by changing data, while consistency regularization forces a model to remain invariant to domain shifts by imposing the same representations or predictions on a single image before and after perturbation.",{"type":18,"tag":26,"props":102,"children":103},{},[104],{"type":24,"value":105},"Although some progress has been made, the existing methods either impose overly strict constraints on model training or fail to maintain the order of classification probabilities. For example, some methods may simply require a model to produce exactly the same representations for two views of one image, or focus only on the consistency of the maximum classification probabilities and ignore the order of the other categories. Such constraints are either too strict or reduce the discriminability of the model.",{"type":18,"tag":26,"props":107,"children":108},{},[109],{"type":24,"value":110},"To solve these problems, this paper proposes a novel regularization method, order-preserving consistency regularization (OCR). OCR makes a model robust to task-irrelevant transformations by keeping the predicted category ranking consistent between an image and its augmented views. 
A series of experiments demonstrates that this method achieves clear advantages on multiple cross-domain tasks, especially in terms of robustness against adversarial attacks.",{"type":18,"tag":26,"props":112,"children":113},{},[114],{"type":24,"value":115},"02 Team Introduction",{"type":18,"tag":26,"props":117,"children":118},{},[119],{"type":24,"value":120},"Li Jingjing, researcher and doctoral advisor in the School of Computer Science and Engineering, University of Electronic Science and Technology of China, has published more than 70 articles in JCR journals and CCF-A conferences, such as TPAMI, TIP, TKDE, CVPR, and MM. His research results have been highly cited and selected as ACM MM best paper candidates, ESI hot papers, and among the top 100 most internationally influential academic papers of 2019. In addition, he has won several provincial and national awards for his outstanding academic research.",{"type":18,"tag":26,"props":122,"children":123},{},[124],{"type":24,"value":125},"03 Introduction to the Paper",{"type":18,"tag":26,"props":127,"children":128},{},[129,131,137],{"type":24,"value":130},"Problem formulation: Assume that a training dataset Dtrain = {x ∈ Xtrain, y ∈ Ytrain} is used in computer vision challenges, such as image classification or semantic segmentation, and Xtrain and Ytrain are the image set and label set, respectively. The objective is to establish the relationship between the data Xtrain and the ground-truth label Ytrain. In the classical Empirical Risk Minimization (ERM), the training objective is to select a hypothesis h:X→Y from a predefined hypothesis space ",{"type":18,"tag":132,"props":133,"children":134},"em",{},[135],{"type":24,"value":136},"H",{"type":24,"value":138}," to minimize the empirical risk on Dtrain: infh∈H E(x,y)∼Dtrain [L(h(x), y)]. 
However, model performance may deteriorate when the test dataset Dtest is used since there may be domain shifts between the training dataset Dtrain and the test dataset Dtest, that is, P(Xtrain) ≠ P(Xtest). For example, samples of the same category often have different appearances in the training dataset and test dataset. These changes may be caused by factors such as lighting conditions, camera angles, and backgrounds. These accidental attributes, although unrelated to the task at hand, cause domain shifts. To achieve a good generalization effect, a model needs to be trained so that it is invariant to these domain-specific attributes.",{"type":18,"tag":26,"props":140,"children":141},{},[142],{"type":24,"value":143},"3.1 Model Architecture",{"type":18,"tag":26,"props":145,"children":146},{},[147],{"type":18,"tag":148,"props":149,"children":151},"img",{"alt":7,"src":150},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/60beebe9116d41b1b5df481ff906600f.png",[],{"type":18,"tag":26,"props":153,"children":154},{},[155],{"type":24,"value":156},"3.2 Method Overview",{"type":18,"tag":26,"props":158,"children":159},{},[160,162,167,169,173,175,180,182,186,188,193,195,200,202,206,208,212],{"type":24,"value":161},"OCR consists of three steps: data augmentation, residual component separation, and residual entropy maximization. Data augmentation is a common technique that increases the diversity of samples and helps improve the generalization of models. Given a sample xo, we can use transformations ",{"type":18,"tag":132,"props":163,"children":164},{},[165],{"type":24,"value":166},"N",{"type":24,"value":168}," to obtain its augmented version xa = ",{"type":18,"tag":132,"props":170,"children":171},{},[172],{"type":24,"value":166},{"type":24,"value":174},"(xo). 
Specifically, we divide the hypothesis ",{"type":18,"tag":132,"props":176,"children":177},{},[178],{"type":24,"value":179},"h",{"type":24,"value":181}," into two parts, i.e., ",{"type":18,"tag":132,"props":183,"children":184},{},[185],{"type":24,"value":179},{"type":24,"value":187}," = ",{"type":18,"tag":132,"props":189,"children":190},{},[191],{"type":24,"value":192},"F",{"type":24,"value":194}," ∘ ",{"type":18,"tag":132,"props":196,"children":197},{},[198],{"type":24,"value":199},"G",{"type":24,"value":201},", where ",{"type":18,"tag":132,"props":203,"children":204},{},[205],{"type":24,"value":199},{"type":24,"value":207}," is the backbone model and ",{"type":18,"tag":132,"props":209,"children":210},{},[211],{"type":24,"value":192},{"type":24,"value":213}," is the classifier. In this paper, xo and xa are input into G to obtain two different representations of the same sample:",{"type":18,"tag":26,"props":215,"children":216},{},[217],{"type":18,"tag":148,"props":218,"children":220},{"alt":7,"src":219},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/d0812367bc984e1189fd6a34e2a99ef9.png",[],{"type":18,"tag":26,"props":222,"children":223},{},[224],{"type":24,"value":225},"The residual component in this paper is defined as the change in the augmented representation relative to the original representation. An intuitive way to separate the residual component is to subtract the original representation from the augmented representation. 
To more flexibly control the proportion of the residual component, a linear relation is considered:",{"type":18,"tag":26,"props":227,"children":228},{},[229],{"type":18,"tag":148,"props":230,"children":232},{"alt":7,"src":231},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/1f5d9891b9464618b36ad64a3e0cf6f9.png",[],{"type":18,"tag":26,"props":234,"children":235},{},[236],{"type":24,"value":237},"where _z_n is the residual component, and λ∈(0,1) represents the proportion of _z_o in _z_a. From the perspective of Occam's razor, linearity is a good inductive bias, which is also used in mixup. In addition, the relation is an invertible operation so that _z_n can be easily inferred from _z_o and _z_a:",{"type":18,"tag":26,"props":239,"children":240},{},[241],{"type":18,"tag":148,"props":242,"children":244},{"alt":7,"src":243},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/18ab621d3e7c4426848edf701932c1b5.png",[],{"type":18,"tag":26,"props":246,"children":247},{},[248,250,254],{"type":24,"value":249},"With the residual component _z_n, the uncertainty of _z_n's prediction can be maximized so that it does not contain too much classification-related information. As the entropy can be considered as a measure of prediction uncertainty, the conditional entropy ",{"type":18,"tag":148,"props":251,"children":253},{"alt":7,"src":252},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/ff02e2b356594a6e99fcb42ba83a7429.png",[],{"type":24,"value":255},"can be maximized to enlarge the uncertainty of _z_n's prediction. 
Therefore, the goal of this paper is as follows:",{"type":18,"tag":26,"props":257,"children":258},{},[259],{"type":18,"tag":148,"props":260,"children":262},{"alt":7,"src":261},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/c64786af50d443a6a2bd52578117d129.png",[],{"type":18,"tag":26,"props":264,"children":265},{},[266,268,272,274,279,281,286,288,292],{"type":24,"value":267},"where ",{"type":18,"tag":148,"props":269,"children":271},{"alt":7,"src":270},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/1dce08feec304b09a86687050e2cb868.png",[],{"type":24,"value":273}," is the prediction of _z_n, ",{"type":18,"tag":132,"props":275,"children":276},{},[277],{"type":24,"value":278},"B",{"type":24,"value":280}," is the batch size, ",{"type":18,"tag":132,"props":282,"children":283},{},[284],{"type":24,"value":285},"C",{"type":24,"value":287}," is the number of categories, and ",{"type":18,"tag":132,"props":289,"children":290},{},[291],{"type":24,"value":136},{"type":24,"value":293}," is the entropy. By optimizing the above objective, _z_n is regularized to have equal probability of being classified into each category.",{"type":18,"tag":26,"props":295,"children":296},{},[297,299,304,306,310,312,316,318,322,324,328],{"type":24,"value":298},"During the training process, ",{"type":18,"tag":132,"props":300,"children":301},{},[302],{"type":24,"value":303},"λ",{"type":24,"value":305}," is used to control the proportion of the residual component and the original representation in the augmented representation. ",{"type":18,"tag":132,"props":307,"children":308},{},[309],{"type":24,"value":303},{"type":24,"value":311}," should dynamically change to match the model training process. At the beginning of training, the model is sensitive to domain-specific attributes, so the difference between _z_o and _z_a is large. 
Then, ",{"type":18,"tag":132,"props":313,"children":314},{},[315],{"type":24,"value":303},{"type":24,"value":317}," should be a small value so that the proportion of _z_o in _z_a is lower. As training goes on, the model gradually becomes insensitive to domain-specific attributes, and _z_o in _z_a would be similar. Accordingly, ",{"type":18,"tag":132,"props":319,"children":320},{},[321],{"type":24,"value":303},{"type":24,"value":323}," should be increased to a larger value. Inspired by Ganin et al., this paper uses an annealing strategy of ",{"type":18,"tag":132,"props":325,"children":326},{},[327],{"type":24,"value":303},{"type":24,"value":329},":",{"type":18,"tag":26,"props":331,"children":332},{},[333],{"type":18,"tag":148,"props":334,"children":336},{"alt":7,"src":335},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/1fdaa2e4978844d9a327543bb48c6225.png",[],{"type":18,"tag":26,"props":338,"children":339},{},[340,341,346,348,353,355,360,362,367,369,373,375,379],{"type":24,"value":267},{"type":18,"tag":132,"props":342,"children":343},{},[344],{"type":24,"value":345},"α",{"type":24,"value":347}," = 10, ",{"type":18,"tag":132,"props":349,"children":350},{},[351],{"type":24,"value":352},"β",{"type":24,"value":354}," = 0.75, ",{"type":18,"tag":132,"props":356,"children":357},{},[358],{"type":24,"value":359},"t",{"type":24,"value":361}," is the current number of iterations, and ",{"type":18,"tag":132,"props":363,"children":364},{},[365],{"type":24,"value":366},"T",{"type":24,"value":368}," is the total number of iterations. _λ_0 is the initial value of ",{"type":18,"tag":132,"props":370,"children":371},{},[372],{"type":24,"value":303},{"type":24,"value":374},". 
In this way, ",{"type":18,"tag":132,"props":376,"children":377},{},[378],{"type":24,"value":303},{"type":24,"value":380}," is more likely to be sampled to a smaller value at the beginning of training, and then gradually becomes larger as the training proceeds.",{"type":18,"tag":26,"props":382,"children":383},{},[384],{"type":24,"value":385},"04 Experiment Results",{"type":18,"tag":26,"props":387,"children":388},{},[389],{"type":24,"value":390},"This paper tests the OCR performance in multiple cross-domain tasks, such as domain adaptive classification, test-time adaptation, domain generalization classification, and adversarial attacks.",{"type":18,"tag":26,"props":392,"children":393},{},[394],{"type":24,"value":395},"4.1 Domain Adaptation",{"type":18,"tag":26,"props":397,"children":398},{},[399],{"type":18,"tag":148,"props":400,"children":402},{"alt":7,"src":401},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/5c183a7fbee347189b156d5b30f74918.png",[],{"type":18,"tag":26,"props":404,"children":405},{},[406],{"type":18,"tag":148,"props":407,"children":409},{"alt":7,"src":408},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/cfa60c252df8432a90e3db41741b66a8.png",[],{"type":18,"tag":26,"props":411,"children":412},{},[413],{"type":24,"value":414},"4.3 Domain Generalization Classification",{"type":18,"tag":26,"props":416,"children":417},{},[418],{"type":18,"tag":148,"props":419,"children":421},{"alt":7,"src":420},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/48f7e6e786354356ad13318836af7512.png",[],{"type":18,"tag":26,"props":423,"children":424},{},[425],{"type":24,"value":426},"4.4 Adversarial 
Attack",{"type":18,"tag":26,"props":428,"children":429},{},[430],{"type":18,"tag":148,"props":431,"children":433},{"alt":7,"src":432},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/08/31/70f3efb524044abd9ceac275ae7b85c5.png",[],{"type":18,"tag":26,"props":435,"children":436},{},[437],{"type":24,"value":438},"05 Summary and Prospects",{"type":18,"tag":26,"props":440,"children":441},{},[442],{"type":24,"value":443},"In this paper, an order-preserving consistency regularization method is proposed to solve domain shifts in cross-domain tasks. Through data augmentation, residual component separation, and residual entropy maximization, OCR can effectively improve the robustness of models to domain-specific attributes. The experimental results show that OCR has achieved significant performance in multiple cross-domain tasks, due to its superb effectiveness and generalization. MindSpore combines functional and object-oriented programming paradigms to leverage the strengths of both styles, offering flexible and efficient support for AI model training.",{"type":18,"tag":26,"props":445,"children":446},{},[447],{"type":24,"value":448},"With its efficient computing performance and flexible programming APIs, MindSpore, as an emerging AI framework, provides robust support for the development and training of deep learning models. As AI technologies continue to evolve, MindSpore has introduced innovations in optimization algorithms, hardware acceleration, and distributed computing, enhancing its competitiveness in both scientific research and industrial applications. In the future, MindSpore is expected to play a significant role in cutting-edge fields, aiding researchers and developers in implementing efficient and intelligent solutions.",{"title":7,"searchDepth":450,"depth":450,"links":451},4,[],"markdown","content:technology-blogs:en:3282.md","content","technology-blogs/en/3282.md","technology-blogs/en/3282","md",1776506110927]