[{"data":1,"prerenderedAt":309},["ShallowReactive",2],{"content-query-aKUdD6LgV4":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":303,"_id":304,"_source":305,"_file":306,"_stem":307,"_extension":308},"/technology-blogs/en/2744","en",false,"","MindSpore Case Study | Running YOLOv5 on Raspberry Pi for Real-Time Object Detection (3)","In the previous blog, we used Python 3.7 for inference, but a graph build error was reported. As a result, we switched to MindSpore APIs for MindIR inference.","2023-07-23","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/09/15/64d13152cd874986a8c9f37df5228db9.png","technology-blogs","Practices",{"type":15,"children":16,"toc":300},"root",[17,25,30,40,45,50,55,60,65,73,107,115,123,136,141,177,185,190,198,203,225,233,252,260,272,283,295],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"mindspore-case-study-running-yolov5-on-raspberry-pi-for-real-time-object-detection-3",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":24,"value":9},{"type":18,"tag":31,"props":32,"children":34},"pre",{"code":33},"import numpy as np\nimport cv2 as cv\nimport time\nimport mindspore as ms\nimport mindspore.nn as nn\nfrom mindspore import Tensor\nfrom mindspore import context\ncontext.set_context(mode=context.GRAPH_MODE)\ngraph = ms.load(\"yolov5s.mindir\")\nnet = nn.GraphCell(graph)\nin_data = cv.imread(\"C:\\\\ai\\\\infer\\\\data\\\\images\\\\dog.jpg\")\nimg = cv.resize(in_data, (640,640), interpolation=cv.INTER_LINEAR)\nimage_np_expanded = img.astype('float32') / 255.0\ninput_tensor = Tensor(image_np_expanded)\nprint(input_tensor.shape)\nimg_dim = in_data.shape[:2]\noutput = 
net(input_tensor)\nprint(output[0].shape)\nprint(output[1].shape)\nprint(output[2].shape)\n",[35],{"type":18,"tag":36,"props":37,"children":38},"code",{"__ignoreMap":7},[39],{"type":24,"value":33},{"type":18,"tag":26,"props":41,"children":42},{},[43],{"type":24,"value":44},"The shapes of the inference outputs are as follows:",{"type":18,"tag":26,"props":46,"children":47},{},[48],{"type":24,"value":49},"(1, 20, 20, 3, 85)",{"type":18,"tag":26,"props":51,"children":52},{},[53],{"type":24,"value":54},"(1, 40, 40, 3, 85)",{"type":18,"tag":26,"props":56,"children":57},{},[58],{"type":24,"value":59},"(1, 80, 80, 3, 85)",{"type":18,"tag":26,"props":61,"children":62},{},[63],{"type":24,"value":64},"The corresponding outputs are shown in the following figure. There are slight differences, which are caused by the different shapes of the input images.",{"type":18,"tag":26,"props":66,"children":67},{},[68],{"type":18,"tag":69,"props":70,"children":72},"img",{"alt":7,"src":71},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/09/13/48a95933b6a643369567db534ed25360.png",[],{"type":18,"tag":26,"props":74,"children":75},{},[76,78,84,86,91,93,98,100,105],{"type":24,"value":77},"In the three feature maps in MindSpore, the largest shape is ",{"type":18,"tag":79,"props":80,"children":81},"strong",{},[82],{"type":24,"value":83},"80 x 80",{"type":24,"value":85},", indicating small object detection. 
For a 640 x 640 image, each cell in this feature map corresponds to an input region of ",{"type":18,"tag":79,"props":87,"children":88},{},[89],{"type":24,"value":90},"640/80 = 8 x 8.",{"type":24,"value":92}," ",{"type":18,"tag":79,"props":94,"children":95},{},[96],{"type":24,"value":97},"40 x 40",{"type":24,"value":99}," indicates medium object detection and ",{"type":18,"tag":79,"props":101,"children":102},{},[103],{"type":24,"value":104},"20 x 20",{"type":24,"value":106}," indicates large object detection.",{"type":18,"tag":26,"props":108,"children":109},{},[110],{"type":18,"tag":79,"props":111,"children":112},{},[113],{"type":24,"value":114},"Introduction to Output Object Border, Confidence, and Object Category",{"type":18,"tag":26,"props":116,"children":117},{},[118],{"type":18,"tag":79,"props":119,"children":120},{},[121],{"type":24,"value":122},"1. Output Object Border",{"type":18,"tag":26,"props":124,"children":125},{},[126,128,134],{"type":24,"value":127},"Each layer of the feature map passes through a 1 x 1 convolution and generates (5 + number of categories) x 3 channels. Each layer has three anchors, and each anchor corresponds to (5 + number of categories) channels. If the number of categories is 2, there are 7 channels per anchor, corresponding to ",{"type":18,"tag":129,"props":130,"children":131},"em",{},[132],{"type":24,"value":133},"xywh",{"type":24,"value":135}," (four channels), confidence (one channel), and categories (two channels for two categories).",{"type":18,"tag":26,"props":137,"children":138},{},[139],{"type":24,"value":140},"For the preceding output, the number of categories is 80, so the last dimension is 4 + 1 + 80 = 85.",{"type":18,"tag":26,"props":142,"children":143},{},[144,148,150,155,157,162,164,169,170,175],{"type":18,"tag":129,"props":145,"children":146},{},[147],{"type":24,"value":133},{"type":24,"value":149}," indicates object borders. 
However, ",{"type":18,"tag":129,"props":151,"children":152},{},[153],{"type":24,"value":154},"x",{"type":24,"value":156}," and ",{"type":18,"tag":129,"props":158,"children":159},{},[160],{"type":24,"value":161},"y",{"type":24,"value":163}," are not the center-point coordinates of the object border directly, but offsets relative to the upper-left corner of the grid cell that contains the border. The relative center point is then mapped back to the scale of the original image to obtain the final coordinates. Similarly, ",{"type":18,"tag":129,"props":165,"children":166},{},[167],{"type":24,"value":168},"w",{"type":24,"value":156},{"type":18,"tag":129,"props":171,"children":172},{},[173],{"type":24,"value":174},"h",{"type":24,"value":176}," are not the width and height of the object border predicted directly. Instead, they are predicted relative to the anchor: the final width and height equal the predicted values multiplied by the anchor's width and height.",{"type":18,"tag":26,"props":178,"children":179},{},[180],{"type":18,"tag":79,"props":181,"children":182},{},[183],{"type":24,"value":184},"2. Confidence",{"type":18,"tag":26,"props":186,"children":187},{},[188],{"type":24,"value":189},"It indicates the confidence of the predicted object border. In the inference script, the final score is the confidence multiplied by the maximum category score.",{"type":18,"tag":26,"props":191,"children":192},{},[193],{"type":18,"tag":79,"props":194,"children":195},{},[196],{"type":24,"value":197},"3. Category",{"type":18,"tag":26,"props":199,"children":200},{},[201],{"type":24,"value":202},"The number of additional channels is the same as the number of categories, and each channel indicates the probability of the corresponding category. During loss calculation, the channel corresponding to the ground-truth label is set to 1, and the other channels are set to 0. 
Then, the BCE loss can be calculated separately.",{"type":18,"tag":26,"props":204,"children":205},{},[206,208,213,215,223],{"type":24,"value":207},"The next step is to process the output data. We can directly use APIs presented in ",{"type":18,"tag":79,"props":209,"children":210},{},[211],{"type":24,"value":212},"postprocess.py",{"type":24,"value":214},". For details, visit ",{"type":18,"tag":216,"props":217,"children":221},"a",{"href":218,"rel":219},"https://gitee.com/mindspore/models/blob/master/official/cv/YOLOv5/infer/sdk/api-server/postprocess.py",[220],"nofollow",[222],{"type":24,"value":218},{"type":24,"value":224},".",{"type":18,"tag":31,"props":226,"children":228},{"code":227},"img_dim = in_data.shape[:2]\nfrom model_utils.config import config\nfrom src.logger import get_logger\nfrom postprocess import DetectionEngine\nconfig.logger = get_logger(config.output_dir, 0)\n# init detection engine\ndetection = DetectionEngine(config)\ndetection.detect(output, 1, img_dim, 139)\nboxes = detection.do_nms_for_results()\nprint(boxes)\n\n\n# The class below is excerpted from postprocess.py; it requires pycocotools.\nfrom pycocotools.coco import COCO\n\n\nclass DetectionEngine:\n    \"\"\"Detection engine.\"\"\"\n\n    def __init__(self, args_detection):\n        self.ignore_threshold = args_detection.ignore_threshold\n        self.args = args_detection\n        self.labels = ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat',\n                       'traffic light', 'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat',\n                       'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack',\n                       'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',\n                       'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',\n                       'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',\n                       'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 
'donut', 'cake', 'chair',\n                       'couch', 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',\n                       'keyboard', 'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book',\n                       'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush']\n        self.num_classes = len(self.labels)\n        self.results = {}\n        self.file_path = ''\n        self.ann_file = args_detection.val_ann_file\n        self._coco = COCO(self.ann_file)\n        self._img_ids = list(sorted(self._coco.imgs.keys()))\n        self.det_boxes = []\n        self.nms_thresh = args_detection.eval_nms_thresh\n        self.multi_label = args_detection.multi_label\n        self.multi_label_thresh = args_detection.multi_label_thresh\n        self.coco_catIds = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27,\n                            28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53,\n                            54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80,\n                            81, 82, 84, 85, 86, 87, 88, 89, 90]\n\n# Some parameters under DetectionEngine need to be configured. Therefore, we can directly use the parameters in model_utils.config.\n",[229],{"type":18,"tag":36,"props":230,"children":231},{"__ignoreMap":7},[232],{"type":24,"value":227},{"type":18,"tag":26,"props":234,"children":235},{},[236,238,243,245,250],{"type":24,"value":237},"After modifying parameters under ",{"type":18,"tag":79,"props":239,"children":240},{},[241],{"type":24,"value":242},"DetectionEngine",{"type":24,"value":244},", we need to use the ",{"type":18,"tag":79,"props":246,"children":247},{},[248],{"type":24,"value":249},"detect",{"type":24,"value":251}," method for batch data processing. 
Since only one image is used as an example, the code can be modified as follows.",{"type":18,"tag":31,"props":253,"children":255},{"code":254},"# ori_w, ori_h = img_shape[batch_id]\nori_w, ori_h = img_shape\n# img_id = int(image_id[batch_id])\nimg_id = int(image_id)\n",[256],{"type":18,"tag":36,"props":257,"children":258},{"__ignoreMap":7},[259],{"type":24,"value":254},{"type":18,"tag":26,"props":261,"children":262},{},[263,267,268],{"type":18,"tag":69,"props":264,"children":266},{"alt":7,"src":265},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/09/13/6e39b0275f884d40b0f6d1a4f8a185a6.png",[],{"type":24,"value":92},{"type":18,"tag":69,"props":269,"children":271},{"alt":7,"src":270},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2023/09/13/713aa661c6204d2e8dae908e8c9bafe3.png",[],{"type":18,"tag":26,"props":273,"children":274},{},[275,277,281],{"type":24,"value":276},"However, after debugging the code, the value returned by ",{"type":18,"tag":79,"props":278,"children":279},{},[280],{"type":24,"value":249},{"type":24,"value":282}," is empty. This may be caused by the model weights; after all, the weight files may not be consistent.",{"type":18,"tag":26,"props":284,"children":285},{},[286,288,294],{"type":24,"value":287},"To address this problem, we can use the MindSpore YOLO suite (MindYOLO), released with MindSpore 2.0. MindYOLO is a MindSpore-based algorithm suite for YOLO models. It incorporates various YOLO algorithm modules and provides common module interfaces (such as data processing, model building, and optimizers), simplifying model building and training. Currently, six basic models, covering YOLOv3/v4/v5/v7/v8/X, are provided for quick reproduction and migration. 
For details about the code, visit ",{"type":18,"tag":216,"props":289,"children":292},{"href":290,"rel":291},"https://github.com/mindspore-lab/mindyolo",[220],[293],{"type":24,"value":290},{"type":24,"value":224},{"type":18,"tag":26,"props":296,"children":297},{},[298],{"type":24,"value":299},"To be continued...",{"title":7,"searchDepth":301,"depth":301,"links":302},4,[],"markdown","content:technology-blogs:en:2744.md","content","technology-blogs/en/2744.md","technology-blogs/en/2744","md",1776506107184]