Orange Pi Developer Board Offline Inference Practice Based on the MindSpore Case - Image and Text Recognition
Introduction
Text recognition refers to recognizing the text in an image and converting the text region into character information. Typically, a convolutional neural network (CNN) extracts rich feature information from the image, and recognition is then performed on the extracted features. In this case, ResNet serves as the feature extraction network, and the connectionist temporal classification (CTC) method is used for recognition. The case first converts the CKPT file of the CNNCTC model into an AIR file, then converts the AIR file into an OM file, and finally performs offline inference.
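To illustrate the CTC decoding idea mentioned above, the following minimal sketch (not part of the original case code; the function name and the blank symbol '-' are chosen for illustration) collapses consecutive repeated predictions and then removes the blank symbol:

```python
# Minimal sketch of greedy CTC decoding (illustrative only):
# collapse consecutive repeats, then drop the blank symbol '-'.
def ctc_collapse(seq, blank='-'):
    out = []
    for i, ch in enumerate(seq):
        # Keep a character only if it is not blank and differs from the previous one.
        if ch != blank and (i == 0 or seq[i - 1] != ch):
            out.append(ch)
    return ''.join(out)

print(ctc_collapse('--pp-aa-rr-k-ii-nn-gg'))  # parking
```

This is the same rule the CTCLabelConverter class in the inference code applies to the per-time-step predictions of the network.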
Preparations
· The sample directory of the base image already contains the converted OM model and the test images. If you want to run the model directly, skip this section. If you want to convert the model yourself, follow the detailed procedure below.
· You are advised to convert the model on a Linux server or VM.
· To further optimize the model inference performance, convert the model into an OM model using the following command:
o atc --model=cnnctc.air --output="cnnctc" --framework=1 --soc_version=Ascend310B4 --output_type=FP32 --precision_mode=allow_fp32_to_fp16 --log=info
· The conversion parameters are described as follows:
· --model: path of the input model.
· --framework: framework type of the original network model. The value 1 indicates AIR, and the value 5 indicates ONNX.
· --output: path of the output model.
· --log: log level.
· --soc_version: Ascend AI Processor model.
!sh env.sh # Run the code in the cell.
Model Inference Implementation
After cnnctc.om is obtained, run the offline inference code and load the inference image predict.png.
1. Importing the third-party libraries
import os
import time
import argparse
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
from acllite_model import AclLiteModel as Model
from acllite_resource import AclLiteResource as AclResource
2. Importing and processing the model
# Obtain the OM file of the model.
from download import download
model_url = "https://mindspore-courses.obs.cn-north-4.myhuaweicloud.com/orange-pi-mindspore/02-CNNCTC/cnnctc.zip"
download(model_url, "./", kind="zip", replace=True)
# Paths of the OM model and the test image.
MODEL_PATH = './cnnctc.om'
IMAGE_PATH = './predict.png'
# Initialize ACL resources.
acl_resource = AclResource()
acl_resource.init()
# Import the local OM model.
print('load model....')
model = Model(MODEL_PATH)
print('load model finished....')
# Encode and decode text labels.
class CTCLabelConverter():
    def __init__(self, character):
        # Map each character to an index; index 0 is reserved for the CTC blank.
        dict_character = list(character)
        self.dict = {}
        for i, char in enumerate(dict_character):
            self.dict[char] = i + 1
        self.character = ['[blank]'] + dict_character
        self.dict['[blank]'] = 0

    # Convert text to index encoding.
    def encode(self, text):
        length = [len(s) for s in text]
        text = ''.join(text)
        text = [self.dict[char] for char in text]
        return np.array(text), np.array(length)

    # Convert index encoding back to text.
    def decode(self, text_index, length):
        texts = []
        index = 0
        for l in length:
            t = text_index[index:index + l]
            char_list = []
            for i in range(l):
                # Skip blanks and collapse consecutive repeated characters.
                if t[i] != self.dict['[blank]'] and (
                        not (i > 0 and t[i - 1] == t[i])):
                    char_list.append(self.character[t[i]])
            text = ''.join(char_list)
            texts.append(text)
            index += l
        return texts
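To see the converter in action, the round trip below repeats the class logic in a standalone snippet (a copy is included so it runs without the ACL runtime; the index sequence and the word 'ok' are made-up examples) and decodes a sequence that contains blanks and repeats:

```python
import numpy as np

# Standalone copy of the CTCLabelConverter above, so this snippet runs by itself.
class CTCLabelConverter():
    def __init__(self, character):
        dict_character = list(character)
        self.dict = {char: i + 1 for i, char in enumerate(dict_character)}
        self.character = ['[blank]'] + dict_character
        self.dict['[blank]'] = 0

    def encode(self, text):
        length = [len(s) for s in text]
        flat = [self.dict[char] for char in ''.join(text)]
        return np.array(flat), np.array(length)

    def decode(self, text_index, length):
        texts, index = [], 0
        for l in length:
            t = text_index[index:index + l]
            chars = [self.character[t[i]] for i in range(l)
                     if t[i] != 0 and not (i > 0 and t[i - 1] == t[i])]
            texts.append(''.join(chars))
            index += l
        return texts

converter = CTCLabelConverter('0123456789abcdefghijklmnopqrstuvwxyz')

# encode: each character maps to its index; 0 is reserved for the blank.
codes, lengths = converter.encode(['ok'])
print(codes)  # indices of 'o' and 'k'

# decode: blanks (0) are dropped and consecutive repeats are collapsed.
raw = np.array([0, 25, 25, 0, 21, 0])
print(converter.decode(raw, np.array([len(raw)])))  # ['ok']
```

The blank index 0 lets CTC represent repeated letters: without a blank between them, two identical consecutive predictions collapse into one character.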
3. Performing the inference
# Load and preprocess the target image.
img_PIL = Image.open(IMAGE_PATH).convert('RGB')
img = img_PIL.resize((100, 32), resample=3)  # resample=3 is bicubic interpolation
img = np.array(img, dtype=np.float32)
img = np.expand_dims(img, axis=0)            # add a batch dimension: NHWC
img = np.transpose(img, [0, 3, 1, 2])        # NHWC -> NCHW
# Measure the inference time.
start = time.time()
model_predict = model.execute([img])[0]
end = time.time()
print(f'infer use time:{(end - start) * 1000}ms')
# Initialize the text encoding function.
character = '0123456789abcdefghijklmnopqrstuvwxyz'
converter = CTCLabelConverter(character)
# Post-process the predictions: greedy argmax followed by CTC decoding.
preds_size = np.array([model_predict.shape[1]])
preds_index = np.argmax(model_predict, 2)
preds_index = np.reshape(preds_index, [-1])
preds_str = converter.decode(preds_index, preds_size)
print('Predict: ', preds_str)
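The preprocessing and post-processing shapes above can be checked offline with synthetic data (no OM model required). This sketch assumes the model output is shaped [batch, time_steps, num_classes], which is what the post-processing code implies; the sizes 6 and 37 (blank + 10 digits + 26 letters) are illustrative:

```python
import numpy as np

# Preprocessing: an HWC image becomes NCHW exactly as in the inference code.
img = np.zeros((32, 100, 3), dtype=np.float32)  # resized RGB image (H, W, C)
img = np.expand_dims(img, axis=0)               # (1, 32, 100, 3)
img = np.transpose(img, [0, 3, 1, 2])           # (1, 3, 32, 100)
print(img.shape)

# Post-processing on synthetic logits standing in for model_predict:
# 1 image, 6 time steps, 37 classes, shaped [batch, time_steps, num_classes].
rng = np.random.default_rng(0)
model_predict = rng.standard_normal((1, 6, 37)).astype(np.float32)

preds_size = np.array([model_predict.shape[1]])  # one length per image: [6]
preds_index = np.argmax(model_predict, 2)        # best class per time step
preds_index = np.reshape(preds_index, [-1])      # flatten to a 1-D index sequence
print(preds_size, preds_index.shape)
```

Each time step contributes one class index, so the decoded string can never be longer than the number of time steps; the CTC decode step then shortens it by removing blanks and repeats.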
Summary
The preceding shows the running result of the offline inference of the CNNCTC text recognition sample. The final verification result shows that the word "PARKING" in the sample image is successfully recognized. Note:
· If the inference fails, set the environment variables as the root user (by running or referring to the env.sh file in the folder).
· Clear all caches, and then perform the inference again.