[{"data":1,"prerenderedAt":566},["ShallowReactive",2],{"content-query-evRWsITDQq":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":10,"date":11,"cover":12,"type":13,"category":14,"body":15,"_type":560,"_id":561,"_source":562,"_file":563,"_stem":564,"_extension":565},"/technology-blogs/en/1765","en",false,"",[9],"Paper Interpretation","In the future, we will try to explore trick packages for other challenging long-tailed tasks, such as detection and segmentation.","2022-06-29","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/fd158be0427e47eb83d4b55d89590a72.png","technology-blogs","Influencers",{"type":16,"children":17,"toc":557},"root",[18,32,38,47,52,57,65,75,80,93,98,106,111,116,124,132,137,148,156,164,176,183,190,198,209,214,219,224,229,234,239,246,251,256,261,266,271,278,283,291,301,306,311,316,321,326,331,339,344,349,354,359,364,369,374,379,384,389,396,404,414,421,431,438,454,461,471,478,483,490,495,502,507,515,525,530,538,543,548,553],{"type":19,"tag":20,"props":21,"children":23},"element","h1",{"id":22},"paper-interpretation-summary-of-aaai-long-tailed-visual-recognition-tricks",[24,30],{"type":19,"tag":25,"props":26,"children":27},"span",{},[28],{"type":29,"value":9},"text",{"type":29,"value":31}," Summary of AAAI Long - Tailed Visual Recognition Tricks",{"type":19,"tag":33,"props":34,"children":35},"p",{},[36],{"type":29,"value":37},"June 29, 2022",{"type":19,"tag":33,"props":39,"children":40},{},[41],{"type":19,"tag":42,"props":43,"children":44},"strong",{},[45],{"type":29,"value":46},"1. Research Background",{"type":19,"tag":33,"props":48,"children":49},{},[50],{"type":29,"value":51},"In recent years, thanks to complex paradigms like meta learning, visual recognition on long-tailed distributions has been developing in full swing. 
In addition to these complicated methods, simple refinements to the training process also play an important role in improving performance.",{"type":19,"tag":33,"props":53,"children":54},{},[55],{"type":29,"value":56},"These refinements (also called tricks), such as adjusting the data distribution or loss-function weights, are simple but effective. However, different tricks may conflict with each other. If they are not properly used, the recognition accuracy may fall short of our expectations. Unfortunately, the existing papers do not provide any guidance on using these tricks.",{"type":19,"tag":33,"props":58,"children":59},{},[60],{"type":19,"tag":42,"props":61,"children":62},{},[63],{"type":29,"value":64},"2. Author Introduction",{"type":19,"tag":33,"props":66,"children":67},{},[68,73],{"type":19,"tag":42,"props":69,"children":70},{},[71],{"type":29,"value":72},"Wei Xiushen",{"type":29,"value":74},": PhD, professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, key member of the PCA Lab, and student entrepreneurship mentor at Nanjing University.",{"type":19,"tag":33,"props":76,"children":77},{},[78],{"type":29,"value":79},"His research focuses on computer vision and machine learning, and he has published more than 40 papers in top international journals and conferences. 
Dr. Wei's work has been cited nearly 2,000 times on Google Scholar, and he has won four world championships in authoritative international computer vision competitions (including iNaturalist).",{"type":19,"tag":33,"props":81,"children":82},{},[83,85,91],{"type":29,"value":84},"He has given tutorials on fine-grained image analysis at international conferences such as CVPR and ICME, and has published the book ",{"type":19,"tag":86,"props":87,"children":88},"em",{},[89],{"type":29,"value":90},"Analytical Deep Learning: Convolutional Neural Network Principles and Visual Practice",{"type":29,"value":92},".",{"type":19,"tag":33,"props":94,"children":95},{},[96],{"type":29,"value":97},"Dr. Wei is also a CVPR 2017 Best PC Member, program committee chairman of international conferences including ICCV, IJCAI, ACM Multimedia, and ACCV, chairman of the ACCV 2022 Tutorial, senior program committee member of IJCAI 2021, program committee member of CCF-A conferences, and reviewer for international journals such as IEEE TPAMI, TIP, TNNLS, MLJ, and TMM.",{"type":19,"tag":33,"props":99,"children":100},{},[101],{"type":19,"tag":42,"props":102,"children":103},{},[104],{"type":29,"value":105},"3. Paper Description",{"type":19,"tag":33,"props":107,"children":108},{},[109],{"type":29,"value":110},"This paper collects existing long-tailed visual recognition tricks and summarizes their effective combinations. Through systematic experiments, it also gives detailed experimental guidance.",{"type":19,"tag":33,"props":112,"children":113},{},[114],{"type":29,"value":115},"In addition, a long-tailed visual data augmentation method based on class activation maps (CAMs) is proposed in this paper. This method can be combined with re-sampling tricks to obtain good results. 
A scientific combination of these tricks surpasses the most advanced methods on four long-tailed benchmark datasets, including ImageNet-LT and iNaturalist 2018.",{"type":19,"tag":33,"props":117,"children":118},{},[119],{"type":19,"tag":120,"props":121,"children":123},"img",{"alt":7,"src":122},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/1f32150208c54a6a9b8eef9e87a7c819.png",[],{"type":19,"tag":33,"props":125,"children":126},{},[127],{"type":19,"tag":42,"props":128,"children":129},{},[130],{"type":29,"value":131},"4. Link",{"type":19,"tag":33,"props":133,"children":134},{},[135],{"type":29,"value":136},"Paper:",{"type":19,"tag":33,"props":138,"children":139},{},[140],{"type":19,"tag":141,"props":142,"children":146},"a",{"href":143,"rel":144},"https://ojs.aaai.org/index.php/",[145],"nofollow",[147],{"type":29,"value":143},{"type":19,"tag":33,"props":149,"children":150},{},[151],{"type":19,"tag":42,"props":152,"children":153},{},[154],{"type":29,"value":155},"5. Key Points of the Algorithm Framework",{"type":19,"tag":33,"props":157,"children":158},{},[159],{"type":19,"tag":42,"props":160,"children":161},{},[162],{"type":29,"value":163},"Re-weighting Method",{"type":19,"tag":33,"props":165,"children":166},{},[167,169,174],{"type":29,"value":168},"The ",{"type":19,"tag":42,"props":170,"children":171},{},[172],{"type":29,"value":173},"cost-sensitive re-weighting method",{"type":29,"value":175}," is commonly used in long-tailed recognition. Such methods guide the network to focus more on minority classes by assigning different weights to different classes. 
This paper compares a baseline loss and four re-weighted losses:",{"type":19,"tag":33,"props":177,"children":178},{},[179],{"type":19,"tag":120,"props":180,"children":182},{"alt":7,"src":181},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/823fe234e99d40faad75774aec3df3e0.png",[],{"type":19,"tag":33,"props":184,"children":185},{},[186],{"type":19,"tag":120,"props":187,"children":189},{"alt":7,"src":188},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/ff4f646549de48cfbb3ac0ed0079c532.png",[],{"type":19,"tag":33,"props":191,"children":192},{},[193],{"type":19,"tag":42,"props":194,"children":195},{},[196],{"type":29,"value":197},"Re-sampling Method",{"type":19,"tag":33,"props":199,"children":200},{},[201,202,207],{"type":29,"value":168},{"type":19,"tag":42,"props":203,"children":204},{},[205],{"type":29,"value":206},"re-sampling method",{"type":29,"value":208}," tries to re-sample the data to obtain an evenly distributed dataset.",{"type":19,"tag":33,"props":210,"children":211},{},[212],{"type":29,"value":213},"• Random over-sampling",{"type":19,"tag":33,"props":215,"children":216},{},[217],{"type":29,"value":218},"Copies training images randomly sampled from the minority class. This method is effective in most scenarios, but may cause overfitting.",{"type":19,"tag":33,"props":220,"children":221},{},[222],{"type":29,"value":223},"• Random under-sampling",{"type":19,"tag":33,"props":225,"children":226},{},[227],{"type":29,"value":228},"Randomly deletes training images from the head class until all classes become balanced. 
In some cases, under-sampling is more effective than over-sampling.",{"type":19,"tag":33,"props":230,"children":231},{},[232],{"type":29,"value":233},"• Class-balanced sampling",{"type":19,"tag":33,"props":235,"children":236},{},[237],{"type":29,"value":238},"The probability that each class is selected is as follows:",{"type":19,"tag":33,"props":240,"children":241},{},[242],{"type":19,"tag":120,"props":243,"children":245},{"alt":7,"src":244},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/0c6207fab7bf45e5b48cd0e7577c5694.png",[],{"type":19,"tag":33,"props":247,"children":248},{},[249],{"type":29,"value":250},"When q = 0, evenly sample each class first, and then randomly select a sample from the selected class.",{"type":19,"tag":33,"props":252,"children":253},{},[254],{"type":29,"value":255},"• Square-root sampling",{"type":19,"tag":33,"props":257,"children":258},{},[259],{"type":29,"value":260},"Set q to 0.5, and construct a sampling set between the original distribution and the balanced distribution.",{"type":19,"tag":33,"props":262,"children":263},{},[264],{"type":29,"value":265},"• Progressively-balanced sampling",{"type":19,"tag":33,"props":267,"children":268},{},[269],{"type":29,"value":270},"The sampling probability of each class is gradually changed from the unbalanced distribution to the balanced distribution. 
The sampling probability of class j is as follows:",{"type":19,"tag":33,"props":272,"children":273},{},[274],{"type":19,"tag":120,"props":275,"children":277},{"alt":7,"src":276},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/dd38418763c94d3ab00b1009d4509283.png",[],{"type":19,"tag":33,"props":279,"children":280},{},[281],{"type":29,"value":282},"Here, t indicates the index of the current epoch, and T indicates the total number of epochs.",{"type":19,"tag":33,"props":284,"children":285},{},[286],{"type":19,"tag":42,"props":287,"children":288},{},[289],{"type":29,"value":290},"Mixup Training",{"type":19,"tag":33,"props":292,"children":293},{},[294,299],{"type":19,"tag":42,"props":295,"children":296},{},[297],{"type":29,"value":298},"Mixup training",{"type":29,"value":300}," can be regarded as a data augmentation trick aimed at regularizing convolutional networks. We find it works well in long-tailed recognition, especially when combined with re-sampling.",{"type":19,"tag":33,"props":302,"children":303},{},[304],{"type":29,"value":305},"• Input mixup",{"type":19,"tag":33,"props":307,"children":308},{},[309],{"type":29,"value":310},"The two original input images are linearly combined at the pixel level. 
After the combination, the label of the mixed image is likewise a linear combination of the original labels.",{"type":19,"tag":33,"props":312,"children":313},{},[314],{"type":29,"value":315},"• Manifold Mixup (MM)",{"type":19,"tag":33,"props":317,"children":318},{},[319],{"type":29,"value":320},"Performs mixup on the feature maps of some network layers to encourage the neural network to predict hidden representations more conservatively.",{"type":19,"tag":33,"props":322,"children":323},{},[324],{"type":29,"value":325},"• Fine-tuning after mixup training",{"type":19,"tag":33,"props":327,"children":328},{},[329],{"type":29,"value":330},"Fine-tunes the model for several epochs after input mixup training.",{"type":19,"tag":33,"props":332,"children":333},{},[334],{"type":19,"tag":42,"props":335,"children":336},{},[337],{"type":29,"value":338},"Two-Phase Training",{"type":19,"tag":33,"props":340,"children":341},{},[342],{"type":29,"value":343},"The training process is divided into two stages: unbalanced training and balanced fine-tuning. 
This section focuses on different methods of balanced fine-tuning and proposes a sampling method based on CAM.",{"type":19,"tag":33,"props":345,"children":346},{},[347],{"type":29,"value":348},"• Deferred rebalancing by re-sampling (DRS)",{"type":19,"tag":33,"props":350,"children":351},{},[352],{"type":29,"value":353},"Uses the normal training method first, and then the balanced sampling method for fine-tuning in the second phase.",{"type":19,"tag":33,"props":355,"children":356},{},[357],{"type":29,"value":358},"• Deferred rebalancing by re-weighting (DRW)",{"type":19,"tag":33,"props":360,"children":361},{},[362],{"type":29,"value":363},"Uses the re-weighting method in the second phase.",{"type":19,"tag":33,"props":365,"children":366},{},[367],{"type":29,"value":368},"• CAM-Based Sampling (CAM-BS)",{"type":19,"tag":33,"props":370,"children":371},{},[372],{"type":29,"value":373},"The re-sampling method used in DRS only copies or deletes randomly selected samples from the original dataset to generate a balanced subset, so the improvement in the balanced fine-tuning process is limited.",{"type":19,"tag":33,"props":375,"children":376},{},[377],{"type":29,"value":378},"To generate discriminative information, we propose a CAM-based sampling method.",{"type":19,"tag":33,"props":380,"children":381},{},[382],{"type":29,"value":383},"As shown in the following figure, we first apply re-sampling to obtain a balanced set of sampled images. For each sampled image, a CAM is generated using the model trained in the first phase, the image's label, and the weights of the corresponding fully connected layer. The foreground is then separated from the background based on the average value of the CAM: the foreground contains the pixels whose CAM values are greater than the average, and the background contains the other pixels.",{"type":19,"tag":33,"props":385,"children":386},{},[387],{"type":29,"value":388},"Finally, we apply transformations to the foreground while keeping the background unchanged. 
Transformations (implemented by MindSpore) include horizontal flipping, translation, rotation, and scaling. We randomly select a transformation for each image.",{"type":19,"tag":33,"props":390,"children":391},{},[392],{"type":19,"tag":120,"props":393,"children":395},{"alt":7,"src":394},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/4b0ac88c4e36418e858456b684cc771e.png",[],{"type":19,"tag":33,"props":397,"children":398},{},[399],{"type":19,"tag":42,"props":400,"children":401},{},[402],{"type":29,"value":403},"6. Experiment Results",{"type":19,"tag":33,"props":405,"children":406},{},[407,412],{"type":19,"tag":42,"props":408,"children":409},{},[410],{"type":29,"value":411},"Re-weighting method",{"type":29,"value":413},": The following table lists the error rates of the five methods. The results show that using the re-weighting strategy alone is inappropriate, especially as the number of classes increases.",{"type":19,"tag":33,"props":415,"children":416},{},[417],{"type":19,"tag":120,"props":418,"children":420},{"alt":7,"src":419},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/e8b42b038822414ca0e5aae1fcd9cd66.png",[],{"type":19,"tag":33,"props":422,"children":423},{},[424,429],{"type":19,"tag":42,"props":425,"children":426},{},[427],{"type":29,"value":428},"Re-sampling method",{"type":29,"value":430},": The following figure shows the error rates of different re-sampling methods. Applying re-sampling alone achieves only slight improvements.",{"type":19,"tag":33,"props":432,"children":433},{},[434],{"type":19,"tag":120,"props":435,"children":437},{"alt":7,"src":436},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/8b645e4e1251449799d443e0c9dec815.png",[],{"type":19,"tag":33,"props":439,"children":440},{},[441,445,447,452],{"type":19,"tag":42,"props":442,"children":443},{},[444],{"type":29,"value":298},{"type":29,"value":446},": Results are listed in the following table. 
",{"type":19,"tag":42,"props":448,"children":449},{},[450],{"type":29,"value":451},"ft.",{"type":29,"value":453}," indicates that two-phase balanced fine-tuning is used after mixup training. We can find that input mixup catches up with MM, in which mixup at different layers has limited impact on results, and input mixup is more helpful for subsequent fine-tuning.",{"type":19,"tag":33,"props":455,"children":456},{},[457],{"type":19,"tag":120,"props":458,"children":460},{"alt":7,"src":459},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/10b1f1054121410690539e3a8b6d7952.png",[],{"type":19,"tag":33,"props":462,"children":463},{},[464,469],{"type":19,"tag":42,"props":465,"children":466},{},[467],{"type":29,"value":468},"Two-phase training",{"type":29,"value":470},": The following table lists the top-1 error rates of different re-sampling methods in DRS. The best result is obtained by CAM-based balanced sampling.",{"type":19,"tag":33,"props":472,"children":473},{},[474],{"type":19,"tag":120,"props":475,"children":477},{"alt":7,"src":476},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/fb8ef53127d7429395807d4233f76a37.png",[],{"type":19,"tag":33,"props":479,"children":480},{},[481],{"type":29,"value":482},"The following table lists the top-1 error rates of different weighting methods in DRW. CS_CE achieves the best results in the DRW training.",{"type":19,"tag":33,"props":484,"children":485},{},[486],{"type":19,"tag":120,"props":487,"children":489},{"alt":7,"src":488},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/3cdf545e54294c2d9f7f33eda333c625.png",[],{"type":19,"tag":33,"props":491,"children":492},{},[493],{"type":29,"value":494},"The following table lists the top-1 error rates of the combination between mixup training and other best tricks. 
It is obvious that input mixup achieves greater improvement than MM.",{"type":19,"tag":33,"props":496,"children":497},{},[498],{"type":19,"tag":120,"props":499,"children":501},{"alt":7,"src":500},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/05/554864961d0b42d2b74c724d91d8ea4e.png",[],{"type":19,"tag":33,"props":503,"children":504},{},[505],{"type":29,"value":506},"The following table lists the top-1 error rates after the optimal training strategies are combined. As more training tricks are added, the performance improves steadily, demonstrating the effectiveness of our method on small and large real-world datasets.",{"type":19,"tag":33,"props":508,"children":509},{},[510],{"type":19,"tag":42,"props":511,"children":512},{},[513],{"type":29,"value":514},"7. MindSpore Code Implementation",{"type":19,"tag":516,"props":517,"children":519},"pre",{"code":518},"def construct(self, logit, label, **kwargs):\n    \"\"\"Compute the (optionally class-weighted) softmax cross-entropy loss.\n\n    Args:\n        logit: prediction matrix (before softmax) with shape (batch_size, num_classes)\n        label: ground truth labels with shape (batch_size)\n    \"\"\"\n    # Numerically stable softmax: subtract the row-wise max before exponentiating.\n    logit_max = self.max(logit, -1)\n    exp = self.exp(self.sub(logit, logit_max))\n    exp_sum = self.sum(exp, -1)\n    softmax_result = self.div(exp, exp_sum)\n    label = self.onehot(label, ops.shape(logit)[1], self.on_value, self.off_value)\n    softmax_result_log = self.log(softmax_result)\n    # Cross-entropy: negative log-probability of the ground-truth class.\n    loss = self.sum_cross_entropy(self.mul(softmax_result_log, label), -1)\n    loss = self.mul2(ops.scalar_to_array(-1.0), loss)\n    # Cost-sensitive re-weighting: scale each sample's loss by its class weight.\n    if self.weight_list is not None:\n        weight = self.mul3(self.squeeze(self.weight_list), label)\n        weight = self.sum2(weight, -1)\n        loss = self.mul3(loss, weight)\n    loss = self.mean(loss, -1)\n    return loss\n",[520],{"type":19,"tag":521,"props":522,"children":523},"code",{"__ignoreMap":7},[524],{"type":29,"value":518},{"type":19,"tag":33,"props":526,"children":527},{},[528],{"type":29,"value":529},"The 
operator granularity is refined to improve performance when multiple cards (devices) are used.",{"type":19,"tag":33,"props":531,"children":532},{},[533],{"type":19,"tag":42,"props":534,"children":535},{},[536],{"type":29,"value":537},"8. Conclusion",{"type":19,"tag":33,"props":539,"children":540},{},[541],{"type":29,"value":542},"By systematically applying simple and effective long-tailed recognition methods, this paper provides helpful training guidance for long-tailed visual recognition.",{"type":19,"tag":33,"props":544,"children":545},{},[546],{"type":29,"value":547},"In addition, we find that the existing simple sampling methods lack discriminative information. Therefore, we propose a CAM-based data augmentation method and combine it with the re-sampling method.",{"type":19,"tag":33,"props":549,"children":550},{},[551],{"type":29,"value":552},"After a large number of experiments, we obtain the optimal combination of training tricks and achieve the best results on the long-tailed benchmarks. We also release the source code as a practical toolbox, which may provide a useful reference for future research on long-tailed visual recognition.",{"type":19,"tag":33,"props":554,"children":555},{},[556],{"type":29,"value":10},{"title":7,"searchDepth":558,"depth":558,"links":559},4,[],"markdown","content:technology-blogs:en:1765.md","content","technology-blogs/en/1765.md","technology-blogs/en/1765","md",1776506104272]