# Evaluating the Robustness of the OCR Model CNN-CTC [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/docs/mindarmour/docs/source_en/evaluation_of_CNNCTC.md) ## Overview This tutorial uses natural perturbation serving to evaluate the robustness of the OCR model, CNN-CTC. Multiple natural perturbation sample datasets are generated based on serving, and then the robustness of the CNN-CTC model is evaluated based on the model performance on the natural perturbation sample datasets. > You can obtain the complete executable sample code at . ## Environment Requirements - Hardware - Prepare the Ascend/GPU processor to set up the hardware environment. - Dependencies - [MindSpore](https://www.mindspore.cn/install/en) - MindSpore Serving 1.6.0 or later - MindArmour ## Script Description ### Code Structure ```bash |-- natural_robustness |-- serving # Serving for generating natural perturbation samples. |-- ocr_evaluate |-- cnn_ctc # Directory for model training, inference, pre-processing, and post-processing. |-- data # Experiment analysis data. |-- default_config.yaml # Parameter configuration. |-- generate_adv_samples.py # Script for generating natural perturbation samples. |-- eval_and_save.py # Script for CNN-CTC to perform inference on the perturbation sample and save the inference result. |-- analyse.py # Script for analyzing the robustness of the CNN-CTC model. ``` ### Script Parameters You can configure training parameters, inference parameters, and robustness evaluation parameters in `default_config.yaml`. We focus on the parameters used in the evaluation and the parameters that need to be configured by users. For details about other parameters, see the [CNN-CTC Tutorial](https://gitee.com/mindspore/models/tree/master/research/cv/cnnctc). - `--TEST_DATASET_PATH`: indicates the path of the test dataset. - `--CHECKPOINT_PATH`: indicates the checkpoint path. - `--ADV_TEST_DATASET_PATH`: indicates the path of the perturbation sample dataset. - `--IS_ADV`: specifies whether to use the perturbation sample for the test. ### Model and Data The model to be evaluated is the CNN-CTC model implemented based on MindSpore. This model is mainly used for scene text recognition tasks. It uses a CNN model to extract features and uses the connectionist temporal classification (CTC) to predict the output sequence. [Paper](https://arxiv.org/abs/1904.01906): J. Baek, G. Kim, J. Lee, S. Park, D. Han, S. Yun, S. J. Oh, and H. Lee, "What is wrong with scene text recognition model comparisons? dataset and model analysis," ArXiv, vol. abs/1904.01906, 2019. For details about data processing and model training, see the [CNN-CTC Tutorial](https://gitee.com/mindspore/models/tree/master/research/cv/cnnctc). You need to obtain the preprocessed dataset and checkpoint model file based on this tutorial for the following evaluation task. The preprocessed dataset is in .lmdb format and stored in key-value pairs. - label- %09d: actual image label - image- %09d: original image data - num-samples: number of samples in the LMDB dataset In the preceding information, `%09d` indicates a string of 9 digits. Example: label-000000001. ### Generating an Evaluation Dataset based on Natural Perturbation Serving 1. Start the serving server for generating natural perturbation samples. For details, see [Generating Natural Perturbation Samples Based on the Serving Server](https://gitee.com/mindspore/mindarmour/blob/master/examples/natural_robustness/serving/README.md#). ```bash cd serving/server/ python serving_server.py ``` 2. Generate an evaluation dataset based on the serving server. 1. In `default_config.yaml`, configure the original test sample data path `TEST_DATASET_PATH` and the path to store the generated perturbation sample dataset `ADV_TEST_DATASET_PATH`. Example: ```yaml TEST_DATASET_PATH: "/opt/dataset/CNNCTC_data/MJ-ST-IIIT/IIIT5k_3000" ADV_TEST_DATASET_PATH: "/home/mindarmour/examples/natural_robustness/ocr_evaluate/data" ``` 2. The core code is described as follows: 1. Configure the perturbation method. For details about the available perturbation methods and parameter configurations, see [transform/image](https://gitee.com/mindspore/mindarmour/tree/master/mindarmour/natural_robustness/transform/image). The following is a configuration example. ```python PerturbConfig = [ {"method": "Contrast", "params": {"alpha": 1.5, "beta": 0}}, {"method": "GaussianBlur", "params": {"ksize": 5}}, {"method": "SaltAndPepperNoise", "params": {"factor": 0.05}}, {"method": "Translate", "params": {"x_bias": 0.1, "y_bias": -0.1}}, {"method": "Scale", "params": {"factor_x": 0.8, "factor_y": 0.8}}, {"method": "Shear", "params": {"factor": 1.5, "direction": "horizontal"}}, {"method": "Rotate", "params": {"angle": 30}}, {"method": "MotionBlur", "params": {"degree": 5, "angle": 45}}, {"method": "GradientBlur", "params": {"point": [50, 100], "kernel_num": 3, "center": True}}, {"method": "GradientLuminance", "params": {"color_start": [255, 255, 255], "color_end": [0, 0, 0], "start_point": [100, 150], "scope": 0.3, "bright_rate": 0.3, "pattern": "light", "mode": "circle"}}, {"method": "GradientLuminance", "params": {"color_start": [255, 255, 255], "color_end": [0, 0, 0], "start_point": [150, 200], "scope": 0.3, "pattern": "light", "mode": "horizontal"}}, {"method": "GradientLuminance", "params": {"color_start": [255, 255, 255], "color_end": [0, 0, 0], "start_point": [150, 200], "scope": 0.3, "pattern": "light", "mode": "vertical"}}, {"method": "Curve", "params": {"curves": 0.5, "depth": 3, "mode": "vertical"}}, {"method": "Perspective", "params": {"ori_pos": [[0, 0], [0, 800], [800, 0], [800, 800]], "dst_pos": [[10, 0], [0, 800], [790, 0], [800, 800]]}}, ] ``` 2. Prepare the data to be perturbed. ```python instances = [] methods_number = 1 outputs_number = 2 perturb_config = json.dumps(perturb_config) env = lmdb.open(lmdb_paths, max_readers=32, readonly=True, lock=False, readahead=False, meminit=False) if not env: print('cannot create lmdb from %s' % (lmdb_paths)) sys.exit(0) with env.begin(write=False) as txn: n_samples = int(txn.get('num-samples'.encode())) # Filtering filtered_labels = [] filtered_index_list = [] for index in range(n_samples): index += 1 # lmdb starts with 1 label_key = 'label-%09d'.encode() % index label = txn.get(label_key).decode('utf-8') if len(label) > max_len: continue illegal_sample = False for char_item in label.lower(): if char_item not in config.CHARACTER: illegal_sample = True break if illegal_sample: continue filtered_labels.append(label) filtered_index_list.append(index) img_key = 'image-%09d'.encode() % index imgbuf = txn.get(img_key) instances.append({"img": imgbuf, 'perturb_config': perturb_config, "methods_number": methods_number, "outputs_number": outputs_number}) print(f'num of samples in IIIT daaset: {len(filtered_index_list)}') ``` 3. Request the natural perturbation serving server and save the data returned by the serving server. ```python ip = '0.0.0.0:8888' client = Client(ip, "perturbation", "natural_perturbation") start_time = time.time() result = client.infer(instances) end_time = time.time() print('generated natural perturbs images cost: ', end_time - start_time) env_save = lmdb.open(lmdb_save_path, map_size=1099511627776) txn = env.begin(write=False) with env_save.begin(write=True) as txn_save: new_index = 1 for i, index in enumerate(filtered_index_list): try: file_names = result[i]['file_names'].split(';') except: print('index: ', index) print(result[i]) length = result[i]['file_length'].tolist() before = 0 label = filtered_labels[i] label = label.encode() img_key = 'image-%09d'.encode() % index ori_img = txn.get(img_key) names_dict = result[i]['names_dict'] names_dict = json.loads(names_dict) for name, leng in zip(file_names, length): label_key = 'label-%09d'.encode() % new_index txn_save.put(label_key, label) img_key = 'image-%09d'.encode() % new_index adv_img = result[i]['results'] adv_img = adv_img[before:before + leng] adv_img_key = 'adv_image-%09d'.encode() % new_index txn_save.put(img_key, ori_img) txn_save.put(adv_img_key, adv_img) adv_info_key = 'adv_info-%09d'.encode() % new_index adv_info = json.dumps(names_dict[name]).encode() txn_save.put(adv_info_key, adv_info) before = before + leng new_index += 1 xn_save.put("num-samples".encode(),str(new_index - 1).encode()) env.close() ``` 3. Run the script for generating natural perturbation samples. ```bash python generate_adv_samples.py ``` 4. The generated natural perturbation data is in .lmdb format and contains the following data items in key-value pair: - label- %09d: actual image label - image- %09d: original image data - adv_image- %09d: perturbed image data - adv_info- %09d: perturbation information, including perturbation methods and parameters - num-samples: number of samples in the LMDB dataset ### CNN-CTC Model Inference on the Generated Perturbation Dataset 1. In `default_config.yaml`, set the test dataset path `TEST_DATASET_PATH` to be the same as the path of the perturbation sample dataset `ADV_TEST_DATASET_PATH`. Example: ```yaml TEST_DATASET_PATH: "/home/mindarmour/examples/natural_robustness/ocr_evaluate/data" ADV_TEST_DATASET_PATH: "/home/mindarmour/examples/natural_robustness/ocr_evaluate/data" ``` 2. The core scripts are described as follows: 1. Load the model and the dataset. ```python ds = test_dataset_creator(is_adv=config.IS_ADV) net = CNNCTC(config.NUM_CLASS, config.HIDDEN_SIZE, config.FINAL_FEATURE_WIDTH) ckpt_path = config.CHECKPOINT_PATH param_dict = load_checkpoint(ckpt_path) load_param_into_net(net, param_dict) print('parameters loaded! from: ', ckpt_path) ``` 2. Perform inference and save the inference results of the model on the original sample and perturbation sample. ```python env_save = lmdb.open(lmdb_save_path, map_size=1099511627776) with env_save.begin(write=True) as txn_save: for data in ds.create_tuple_iterator(): img, _, text, _, length = data img_tensor = Tensor(img, mstype.float32) model_predict = net(img_tensor) model_predict = np.squeeze(model_predict.asnumpy()) preds_size = np.array([model_predict.shape[1]] * config.TEST_BATCH_SIZE) preds_index = np.argmax(model_predict, 2) preds_index = np.reshape(preds_index, [-1]) preds_str = converter.decode(preds_index, preds_size) label_str = converter.reverse_encode(text.asnumpy(), length.asnumpy()) print("Prediction samples: \n", preds_str[:5]) print("Ground truth: \n", label_str[:5]) for pred, label in zip(preds_str, label_str): if pred == label: correct_count += 1 count += 1 if config.IS_ADV: pred_key = 'adv_pred-%09d'.encode() % count else: pred_key = 'pred-%09d'.encode() % count txn_save.put(pred_key, pred.encode()) accuracy = correct_count / count ``` 3. Run the eval_and_save.py script. ```bash python eval_and_save.py ``` The CNN-CTC model performs inference on the generated natural perturbation dataset and saves the inference result of the model on each sample to the path `ADV_TEST_DATASET_PATH`. The following data items in key-value pair are added to the dataset: - pred- %09d: prediction result of the model on the original image data - adv_pred- %09d: prediction result of the model on the perturbed image data Prediction result of the model on real samples: ```bash Prediction samples: ['private', 'private', 'parking', 'parking', 'salutes'] Ground truth: ['private', 'private', 'parking', 'parking', 'salutes'] Prediction samples: ['venus', 'venus', 'its', 'its', 'the'] Ground truth: ['venus', 'venus', 'its', 'its', 'the'] Prediction samples: ['summer', 'summer', 'joeys', 'joeys', 'think'] Ground truth: ['summer', 'summer', 'joes', 'joes', 'think'] ... ``` Prediction result of the model on natural perturbation samples: ```bash Prediction samples: ['private', 'private', 'parking', 'parking', 'salutes'] Ground truth: ['private', 'private', 'parking', 'parking', 'salutes'] Prediction samples: ['dams', 'vares', 'its', 'its', 'the'] Ground truth: ['venus', 'venus', 'its', 'its', 'the'] Prediction samples: ['sune', 'summer', '', 'joeys', 'think'] Ground truth: ['summer', 'summer', 'joes', 'joes', 'think'] ... ``` Accuracies of the model on the original test dataset and natural perturbation dataset: ```bash num of samples in IIIT dataset: 5952 Accuracy of benign sample: 0.8546195652173914 Accuracy of perturbed sample: 0.6126019021739131 ``` ### Robustness Analysis Run the analyse.py script to perform statistical analysis on the performance of CNN-CTC model on the perturbation dataset. ```bash python analyse.py ``` Analysis result: ```bash Number of samples in analyse dataset: 5952 Accuracy of original dataset: 0.46127717391304346 Accuracy of adversarial dataset: 0.6126019021739131 Number of samples correctly predicted in original dataset but wrong in adversarial dataset: 832 Number of samples both wrong predicted in original and adversarial dataset: 1449 ------------------------------------------------------------------------------ Method Shear Number of perturb samples: 442 Number of wrong predicted: 351 Number of correctly predicted in origin dataset but wrong in adversarial: 153 Number of both wrong predicted in origin and adversarial dataset: 198 ------------------------------------------------------------------------------ Method Contrast Number of perturb samples: 387 Number of wrong predicted: 57 Number of correctly predicted in origin dataset but wrong in adversarial: 8 Number of both wrong predicted in origin and adversarial dataset: 49 ------------------------------------------------------------------------------ Method GaussianBlur Number of perturb samples: 436 Number of wrong predicted: 181 Number of correctly predicted in origin dataset but wrong in adversarial: 71 Number of both wrong predicted in origin and adversarial dataset: 110 ------------------------------------------------------------------------------ Method MotionBlur Number of perturb samples: 458 Number of wrong predicted: 215 Number of correctly predicted in origin dataset but wrong in adversarial: 92 Number of both wrong predicted in origin and adversarial dataset: 123 ------------------------------------------------------------------------------ Method GradientLuminance Number of perturb samples: 1243 Number of wrong predicted: 154 Number of correctly predicted in origin dataset but wrong in adversarial: 4 Number of both wrong predicted in origin and adversarial dataset: 150 ------------------------------------------------------------------------------ Method Rotate Number of perturb samples: 405 Number of wrong predicted: 298 Number of correctly predicted in origin dataset but wrong in adversarial: 136 Number of both wrong predicted in origin and adversarial dataset: 162 ------------------------------------------------------------------------------ Method SaltAndPepperNoise Number of perturb samples: 413 Number of wrong predicted: 116 Number of correctly predicted in origin dataset but wrong in adversarial: 29 Number of both wrong predicted in origin and adversarial dataset: 87 ------------------------------------------------------------------------------ Method Translate Number of perturb samples: 419 Number of wrong predicted: 159 Number of correctly predicted in origin dataset but wrong in adversarial: 57 Number of both wrong predicted in origin and adversarial dataset: 102 ------------------------------------------------------------------------------ Method GradientBlur Number of perturb samples: 440 Number of wrong predicted: 92 Number of correctly predicted in origin dataset but wrong in adversarial: 26 Number of both wrong predicted in origin and adversarial dataset: 66 ------------------------------------------------------------------------------ Method Perspective Number of perturb samples: 401 Number of wrong predicted: 181 Number of correctly predicted in origin dataset but wrong in adversarial: 75 Number of both wrong predicted in origin and adversarial dataset: 106 ------------------------------------------------------------------------------ Method Curve Number of perturb samples: 410 Number of wrong predicted: 361 Number of correctly predicted in origin dataset but wrong in adversarial: 162 Number of both wrong predicted in origin and adversarial dataset: 199 ------------------------------------------------------------------------------ Method Scale Number of perturb samples: 434 Number of wrong predicted: 116 Number of correctly predicted in origin dataset but wrong in adversarial: 19 Number of both wrong predicted in origin and adversarial dataset: 97 ------------------------------------------------------------------------------ ``` The analysis result includes: 1. Number of evaluated samples: 5888 2. Accuracy of the CNN-CTC model on the original dataset: 85.4% 3. Accuracy of the CNN-CTC model on the perturbation dataset: 57.2% 4. Number of samples with correct prediction on the original image but incorrect prediction on the perturbed image: 1736 5. Number of samples with incorrect prediction on both the original image and the perturbed image: 782 6. For each perturbation method, the following are included: the number of samples, the number of perturbation samples with incorrect prediction, the number of samples with correct prediction on the original image but incorrect prediction on the perturbed image, and the number of samples with incorrect prediction on both the original image and the perturbed image. If the prediction error rate of the CNN-CTC model is high after a perturbation method is used, the CNN-CTC model is not robust to this method. You are advised to improve the robustness of the CNN-CTC model. For example, if most images after perturbation (such as Rotate, Curve, MotionBlur, and Shear) are incorrectly predicted, further analysis is recommended. The following folders are generated in `ADV_TEST_DATASET_PATH`: ```bash adv_wrong_pred # Dataset whose image classification is incorrect after perturbation. ori_corret_adv_wrong_pred # Dataset with correct classification on the original image but incorrect classification on the perturbed image in the model ori_wrong_adv_wrong_pred # Dataset with incorrect classification on both the original image and the perturbed image in the model ``` Each folder is classified according to the perturbation method. ![1646730529400](images/catalog.png) Each image is named in the format of true value-predicted value.png, as shown in the following figure. ![1646812837049](images/name_format.png) The stored images can be further analyzed to determine whether the model quality problem, image quality problem, or perturbation method affects the image semantics and causes prediction errors. ![1646812837049](images/result_demo.png)