Class Margin Equilibrium for Few-shot Object Detection

2022/04/08

1. Background

In the past few years, we have witnessed great progress in visual object detection. This progress is attributed to the availability of large-scale datasets with precise annotations and to convolutional neural networks (CNNs) capable of absorbing the annotation information. However, annotating a large number of objects is expensive and laborious. It is also inconsistent with human cognitive learning, which can build a precise model from only a few supervised examples.

Few-shot detection, which simulates the way humans learn, has attracted increasing attention. Given base classes with sufficient training data and novel classes with only a few annotated examples, few-shot detection trains a model to simultaneously detect objects from both base and novel classes. To this end, the majority of works divide the training procedure into two stages: base class training (representation learning) and novel class reconstruction (meta training). In representation learning, the sufficient training data of the base classes is used to train a network and construct a representative feature space. In meta training, the network is fine-tuned so that novel class objects can be represented within that feature space.

Despite the substantial progress, the implicit contradiction between representation and classification has unfortunately been ignored. To separate the classes, the distributions of any two base classes need to be far away from each other (max-margin), which, however, increases the intra-class diversity of novel classes. To precisely represent novel classes, the distributions of base classes should be close to each other (min-margin), which makes classification difficult. How to simultaneously optimize novel class representation and classification in the same feature space remains an open question.

2. Team Introduction

The team is led by Dr. Qixiang Ye, recipient of the Chinese Association for Artificial Intelligence – Huawei MindSpore Academic Award Fund and full professor with the University of Chinese Academy of Sciences. He was a visiting assistant professor with the Institute for Advanced Computer Studies (UMIACS), University of Maryland, College Park until 2013 and a visiting scholar at Duke University EECS in 2016. His research interests include visual object detection and machine learning, especially feature representation learning, weakly supervised learning, and self-supervised learning for visual object sensing. He has published more than 180 papers in refereed conferences and journals including IEEE T-ITS, TIP, TNN, T-PAMI, CVPR, ICCV, ECCV, AAAI, and NeurIPS. Dr. Ye received the Sony Outstanding Paper Award and the LuJiaXi Young Researcher Award. He served as an SPC of IJCAI 2020 and 2021 and is on the editorial boards of IEEE Transactions on Intelligent Transportation Systems and IEEE Transactions on Circuits and Systems for Video Technology.

3. Introduction to the Paper

In this paper, the team proposes a class margin equilibrium (CME) approach that aims to optimize the feature space partition for few-shot object detection through adversarial class margin regularization. For the object detection task, CME first introduces a fully connected layer to decouple localization features, which could otherwise mislead class margins in the feature space. CME then pursues a margin equilibrium to balance representation learning and feature reconstruction.

Specifically, during base training, CME constructs a feature space where the margins between novel classes are maximized by introducing a class margin loss. During network fine-tuning, CME introduces a feature disturbance module that truncates gradient maps. Over multiple training iterations, class margins are regularized in an adversarial min-max fashion towards margin equilibrium, which facilitates both feature reconstruction and object classification in the same feature space.
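To make the idea of a class margin loss more concrete, here is a minimal NumPy sketch. It assumes a ratio-style formulation (intra-class spread divided by the squared distance to the nearest other class prototype); the exact loss in the paper may differ. The function name `class_margin_loss` and the `eps` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def class_margin_loss(features, labels, eps=1e-8):
    """Illustrative margin loss (an assumption, not the paper's exact formula).

    For each class: intra-class spread / squared distance to the nearest
    other class prototype. Smaller values mean tighter, better separated classes.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size < 2:
        raise ValueError("need at least two classes to define a margin")

    # Class prototypes: the mean feature vector of each class.
    prototypes = np.stack([features[labels == c].mean(axis=0) for c in classes])

    loss = 0.0
    for i, c in enumerate(classes):
        # Intra-class term: mean squared distance of samples to their own prototype.
        diffs = features[labels == c] - prototypes[i]
        intra = np.mean(np.sum(diffs ** 2, axis=1))
        # Inter-class term: squared distance to the nearest other prototype.
        others = np.delete(prototypes, i, axis=0)
        inter = np.min(np.sum((others - prototypes[i]) ** 2, axis=1))
        loss += intra / (inter + eps)
    return float(loss)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    far = [[0, 0], [0, 1], [10, 0], [10, 1]]   # same spread, wide margin
    near = [[0, 0], [0, 1], [1, 0], [1, 1]]    # same spread, narrow margin
    print(class_margin_loss(far, labels), class_margin_loss(near, labels))
```

Under this assumed form, minimizing the loss pushes class prototypes apart while keeping each class compact, which corresponds to the max-margin side of the equilibrium described above.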

4. Related Links

Paper:

https://openaccess.thecvf.com/content/CVPR2021/html/Li_Beyond_Max-Margin_Class_Margin_Equilibrium_for_Few-Shot_Object_Detection_CVPR_2021_paper.html

Code implementation based on MindSpore:

https://gitee.com/mindspore/contrib/tree/master/papers/CME

5. Technical Highlights of the Algorithm Framework

The contributions of this paper include:

(1) Unveiling the representation-classification contradiction hidden in few-shot object detection, and proposing a feasible way to alleviate the contradiction from the perspective of class margin equilibrium (CME).

(2) A max-margin loss and feature disturbance module to implement class margin equilibrium in an adversarial min-max fashion.

(3) Converting the few-shot detection problem to a few-shot classification problem by filtering out localization features, improving on the state of the art by significant margins with both one-stage and two-stage baseline detectors.
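The feature disturbance named in contribution (2) can be illustrated with a small NumPy sketch. It assumes that "truncating gradient maps" means zeroing out the input positions whose gradient magnitude is largest, so that the most discriminative positions are suppressed. In a real detector the gradient would come from backpropagating the margin loss; here it is simply passed in as an array. The function name `disturb_by_gradient` and the `drop_ratio` parameter are hypothetical.

```python
import numpy as np


def disturb_by_gradient(image, grad, drop_ratio=0.1):
    """Zero out the `drop_ratio` fraction of positions with the largest |grad|.

    Illustrative only: the gradient map is supplied by the caller and the
    truncation rule is an assumption about how the disturbance could work.
    """
    image = np.asarray(image, dtype=np.float64)
    grad = np.asarray(grad, dtype=np.float64)
    if image.shape != grad.shape:
        raise ValueError("image and grad must have the same shape")
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be in [0, 1]")

    magnitude = np.abs(grad)
    k = int(round(drop_ratio * magnitude.size))
    if k == 0:
        return image.copy()

    # Flat indices of the k positions with the largest gradient magnitude.
    top_idx = np.argsort(magnitude, axis=None)[-k:]
    keep = np.ones(magnitude.size, dtype=bool)
    keep[top_idx] = False
    # Suppress the selected positions and keep everything else unchanged.
    return image * keep.reshape(image.shape)


if __name__ == "__main__":
    image = np.ones((2, 2))
    grad = np.array([[0.1, -0.9], [0.2, 0.3]])
    print(disturb_by_gradient(image, grad, drop_ratio=0.25))
```

Training on such disturbed samples would act as the "min" player against the max-margin loss, which is one way to read the adversarial min-max description above.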

6. Test Results

The proposed CME approach for few-shot object detection is evaluated on Pascal VOC 2007, VOC 2012 and MS COCO, following the settings in Meta YOLO and MPSR.

In the preceding figure, CME is compared with one-stage few-shot detectors based on the YOLO detector, including LSTD, Meta YOLO, and MetaDet. CME shows clear advantages over the compared detectors. Specifically, for novel set 1, CME improves by 0.7% (17.8% vs. 17.1%) in the 1-shot setting, 7.0% (26.1% vs. 19.1%) in the 2-shot setting, 2.6% (31.5% vs. 28.9%) in the 3-shot setting, and 9.8% (44.8% vs. 35.0%) in the 5-shot setting. The average improvement is 3.8%, which is a significant margin for few-shot object detection tasks. For novel set 3, the average improvement is 0.7%.

The proposed approach is also compared with two-stage detectors based on the Faster R-CNN framework, including MetaDet, Meta R-CNN, TFA, Viewpoint Estimation, and MPSR. In most settings, CME outperforms the compared detectors. For novel set 1, CME outperforms them by 5% (47.5% vs. 42.5%) in the 2-shot setting and by 6% (58.2% vs. 52.2%) in the 5-shot setting. The average improvement reaches 1.2%. The average improvement is 1.5% for novel set 2 and 0.3% for novel set 3.

7. MindSpore Code Implementation

8. Conclusion

The team proposed a class margin equilibrium (CME) approach to optimize both feature space partition and novel class representation for few-shot object detection. During base training, CME preserves adequate margin space for novel classes through a simple yet effective class margin loss. During fine-tuning, CME pursues margin equilibrium by disturbing the instance features of novel classes in an adversarial min-max fashion. Extensive experiments validated the effectiveness of CME in alleviating the contradiction between feature representation and classification in few-shot settings. As a plug-and-play module, CME improved both one-stage and two-stage few-shot detectors, outperforming state-of-the-art methods by a clear margin. As a general method for feature representation learning and class margin optimization, CME provides fresh insight into few-shot learning problems.