Using the BERT Network to Implement Intelligent Poem Writing

Poetry is an indispensable part of the five-millennium-old Chinese culture. When appreciating poetry, you can perceive a pure and vast world with your full sensibility and relieve the stress and anxiety brought by a fast-paced life. As the saying goes, practice makes perfect. Today, let's see how MindSpore, a framework built on science and engineering, trains a model to show its artistic side!

Case Overview

Use MindSpore to train an intelligent poem writing model and deploy the prediction service. The following flowchart shows the process:

Figure 1: Case flowchart

The following skips the BERT pre-training process and directly describes how to fine-tune a pre-trained MindSpore BERT-base model.

In addition, the following shows how to deploy the model as a prediction service through MindSpore Serving. The client code can send a request to the prediction service and obtain the prediction result.

Model Description

NLP networks are required to deal with poems. BERT, a milestone model in the NLP domain, has greatly promoted the development of the NLP community. BERT was proposed by Google and uses the Encoder structure of the Transformer. It stacks multiple Encoder layers and relies on the attention mechanism, achieving state-of-the-art (SOTA) results on multiple General Language Understanding Evaluation (GLUE) tasks.

Unlike an RNN, the attention mechanism allows highly parallel computation. In this way, the computing power of the Ascend 910 AI Processor can be fully utilized to achieve optimal performance.
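
To build intuition for why attention parallelizes so well, here is a minimal NumPy sketch of scaled dot-product attention (illustrative only; the actual implementation lives in src/bert_model.py):

    import numpy as np

    def scaled_dot_product_attention(q, k, v, mask=None):
        """Minimal sketch: q, k, v have shape (seq_len, d_model)."""
        d_k = q.shape[-1]
        scores = q @ k.T / np.sqrt(d_k)                 # all token-to-token interactions in one matmul
        if mask is not None:
            scores = np.where(mask == 0, -1e9, scores)  # hide positions the token may not attend to
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ v

    # 4 tokens with hidden size 8; every position is computed at once, with no sequential recurrence.
    x = np.random.randn(4, 8)
    print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)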

Model Training

There are two steps: pre-training and fine-tuning. Pre-training is first performed on a large amount of unlabeled data, with the expectation that the model learns the general semantic patterns of human language in this process. Then, in the fine-tuning phase, the model is trained on labeled data from a specific domain to complete a specific task.

Pre-training

Pre-training is autoencoding-style training performed on unlabeled data, so the design of the training tasks is especially important. BERT pre-training includes two tasks: masked language model (MLM) and next sentence prediction (NSP).

  • The MLM task randomly replaces some input tokens with the [MASK] label and then predicts the original tokens from the surrounding context through the attention mechanism.

  • In the NSP task, the input of the BERT model is two sentences, A and B. When the data is built, the positions of A and B are swapped with a 50% probability, and the task is to predict whether A and B were originally adjacent.

Because the MLM objective does not appear in real downstream tasks, the NSP task, whose form is closer to actual downstream task types, is added on top of MLM.
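
For intuition, the following is a simplified sketch of MLM-style masking (the real pre-training data pipeline applies a more elaborate 80/10/10 replacement scheme and is not part of this repository): about 15% of tokens are hidden and become prediction targets.

    import random

    MASK_TOKEN = "[MASK]"

    def apply_mlm_mask(tokens, mask_prob=0.15, seed=None):
        """Return (masked_tokens, labels); labels keep the original token only at masked positions."""
        rng = random.Random(seed)
        masked, labels = [], []
        for tok in tokens:
            if rng.random() < mask_prob:
                masked.append(MASK_TOKEN)
                labels.append(tok)    # the model must recover this token
            else:
                masked.append(tok)
                labels.append(None)   # position not scored by the MLM loss
        return masked, labels

    masked, labels = apply_mlm_mask(list("白日依山尽黄河入海流"), seed=0)
    print(masked)
    print(labels)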

As described above, the pre-training process requires no task labels. The MLM task is essentially a denoising autoencoder, so BERT can be pre-trained on massive amounts of unlabeled data. Through the tasks set in the pre-training stage, BERT learns basic semantic patterns from unlabeled data and then completes training for a specific task in the fine-tuning process.

The following figure shows the BERT model structure. For a Chinese model, two sentences are input, each token corresponds to a Chinese character, and [CLS] and [SEP] are inserted as special tokens.

Figure 2: BERT model structure [1]

Fine-tuning

Fine-tuning adds a task-specific adaptation layer on top of the pre-trained BERT model and then performs a small amount of training on labeled data.

Fine-tuning modes fall into two types: end-to-end fine-tuning and the feature-based approach. The difference between them lies in whether the parameters of the pre-trained BERT model are updated during fine-tuning. In most cases, end-to-end fine-tuning is used.
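
Schematically, end-to-end fine-tuning wraps the pre-trained backbone and a small task head into one trainable cell. The following is a hypothetical MindSpore sketch, not the actual network defined in src/utils.py; the backbone interface (three outputs, with the sequence output first) is an assumption:

    import mindspore.nn as nn
    from mindspore.ops import operations as P

    class PoetryHead(nn.Cell):
        """Hypothetical task head: project each token's hidden state to vocabulary logits."""
        def __init__(self, bert_backbone, hidden_size, vocab_size):
            super().__init__()
            self.bert = bert_backbone               # pre-trained encoder; its parameters are also updated
            self.dense = nn.Dense(hidden_size, vocab_size)
            self.reshape = P.Reshape()
            self.hidden_size = hidden_size

        def construct(self, input_ids, token_type_ids, input_mask):
            sequence_output, _, _ = self.bert(input_ids, token_type_ids, input_mask)
            flat = self.reshape(sequence_output, (-1, self.hidden_size))  # (batch*seq_len, hidden)
            return self.dense(flat)                                       # (batch*seq_len, vocab_size)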

Modifying a Model

BERT uses the Encoder structure, and its attention_mask is an all-ones vector, that is, each token can see the tokens both before and after it. This helps each token learn information about the entire sentence and strengthens its semantic understanding capability; as a result, however, BERT is not a generative model.

In a text generation task, only the preceding tokens may be visible when the next token is generated. Therefore, attention_mask must be changed to a lower triangular matrix so that the current token can see only itself and the tokens before it.
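
For example, a lower triangular attention_mask for a sequence of length 5 looks as follows (a NumPy sketch for illustration; the actual mask is built inside the modified network):

    import numpy as np

    seq_len = 5
    # Row i marks the positions token i may attend to: itself and everything before it.
    causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=np.float32))
    print(causal_mask)
    # [[1. 0. 0. 0. 0.]
    #  [1. 1. 0. 0. 0.]
    #  [1. 1. 1. 0. 0.]
    #  [1. 1. 1. 1. 0.]
    #  [1. 1. 1. 1. 1.]]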

The data used for fine-tuning is more than 40,000 unlabeled poems. The output at each token position is trained to be close to the next token, which serves as its label, and cross entropy is used as the loss function.
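
Conceptually, the target at position i is simply the token at position i + 1. The following simplified NumPy sketch (not the loss code used in the repository) shows the shifted next-token cross entropy:

    import numpy as np

    def next_token_cross_entropy(logits, token_ids):
        """logits: (seq_len, vocab_size); token_ids: (seq_len,). Position i predicts token i + 1."""
        logits, targets = logits[:-1], token_ids[1:]              # shift by one position
        logits = logits - logits.max(axis=-1, keepdims=True)      # numerically stable log-softmax
        log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
        return -log_probs[np.arange(len(targets)), targets].mean()

    vocab_size, seq_len = 100, 6
    loss = next_token_cross_entropy(np.random.randn(seq_len, vocab_size),
                                    np.random.randint(0, vocab_size, size=seq_len))
    print(loss)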

Figure 3: Training process

Sample Code

Download the sample code and run it to see the poem writing effect. The code structure is as follows:

└─bert_poetry
  ├── src
    ├── bert_for_pre_training.py           # Encapsulating BERT-base forward and backward network class
    ├── bert_model.py                      # Defining the BERT forward network structure
    ├── finetune_config.py                 # Fine-tuning configuration file
    ├── fused_layer_norm.py                # Defining fused_layer_norm
    ├── __init__.py                        # __init__
    ├── utils.py                           # Defining the fine-tuning forward network structure
    ├── poetry_utils.py                    # Tokenizer
    └── poetry_dataset.py                  # Parsing poetry.txt and generating the required dataset
  ├── vocab.txt                            # Vocabulary
  ├── generator.py                         # Function used for generating poems during inference
  ├── poetry.py                            # Training, inference, and export functions
  ├── serving
    ├── ms_serving                         # Enabling MindSpore Serving
    ├── bert_flask.py                      # Receiving requests on a server.
    ├── poetry_client.py                   # Client code
    ├── ms_service_pb2_grpc.py             # Defining grpc-related functions for bert_flask.py
    └── ms_service_pb2.py                  # Defining protocol buffer-related functions for bert_flask.py

Implementation Procedure

Basic Information

Perform training and inference on the Ascend 910 AI Processor using MindSpore 0.7.0-beta.

Data Preparation

A dataset containing 43,030 poems: poetry.txt.

Pre-trained checkpoints of a BERT-base model: Download from MindSpore.

Training

Modify the pre_training_ckpt path in src/finetune_config.py to load the pre-trained checkpoint, set batch_size (bs), and set dataset_path to the path where the poems are stored. BertConfig is set to the base model by default.

'dataset_path': '/your/path/to/poetry.txt',
'batch_size': bs,
'pre_training_ckpt': '/your/path/to/pre_training_ckpt',
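
(poetry.py already performs checkpoint loading; the following is only an illustrative sketch of how a pre-trained checkpoint is typically restored with MindSpore's serialization API, where net stands for the fine-tuning network built from BertConfig.)

    from mindspore.train.serialization import load_checkpoint, load_param_into_net

    # net = ...  # the fine-tuning network built from BertConfig
    param_dict = load_checkpoint('/your/path/to/pre_training_ckpt')  # read the .ckpt file
    load_param_into_net(net, param_dict)                             # copy pre-trained weights into the network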

Run the training command.

python poetry.py

Inference Validation

Modify the test_eval function in poetry.py to randomly generate a poem, continue a poem from a given start, or generate an acrostic poem.

The generate_random_poetry function is used both to randomly generate a poem and to continue a poem. If the input parameter s is empty, a poem is generated at random; if s is not empty, the poem is continued based on the input value.

    output = generate_random_poetry(poetrymodel, s='')         # Randomly generate a poem
    output = generate_random_poetry(poetrymodel, s='天下为公')  # Continue a poem

The generate_hidden function is used to generate an acrostic poem. The input parameter head specifies the first character of each line of the poem.

    output = generate_hidden(poetrymodel, head="人工智能")  # Acrostic poem

Run the inference command.

python poetry.py --train=False  --ckpt_path=/your/ckpt/path

By default, the script generates a randomly generated poem, a poem continued from the input value, and an acrostic poem. The output poems are as follows:

A randomly generated poem:

大堤柳暗,
春深树根。
东望一望,
断回还家。
山色渐风雨,
东风多雨禾。
无情与去,
万里所思。

A poem continued from the input value:

天下为公少,
唯君北向西。
远山无路见,
长水见人偏。
一路巴猿啸,
千峰楚客啼。
幽深有诗策,
无以话年华。

An acrostic poem:

人君离别难堪望,
工部张机自少年。
智士不知身没处,
能令圣德属何年。

Service Deployment

Use MindSpore Serving to deploy the trained model as an inference service. Server-side deployment consists of three steps: exporting the model, starting Serving, and starting the preprocessing and post-processing service. A client sends an inference request to the server, and the server runs model inference and returns the generated poem to the client for display.

  • Model export

    Before using Serving to deploy a service, export the MindIR model using the export_net function provided in poetry.py.

    python poetry.py --export=True --ckpt_path=/your/ckpt/path
    

    The poetry.pb file is generated in the current path.
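
    (The export_net function in poetry.py performs this step. Below is only a rough sketch of a typical MindSpore export call, with a hypothetical input shape; the exact file_format string may differ across MindSpore versions, and net stands for the trained poetry network.)

    import numpy as np
    from mindspore import Tensor
    from mindspore.train.serialization import export

    # net = ...  # the trained poetry network with checkpoint weights loaded
    fake_input = Tensor(np.ones((1, 128), dtype=np.int32))   # hypothetical (batch, seq_len) input shape
    export(net, fake_input, file_name='poetry', file_format='MINDIR')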

  • Serving startup

    Start Serving on the server and load the exported MindIR file poetry.pb.

    cd serving
    ./ms_serving --model_path=/path/to/your/MINDIR_file --model_name=your_mindir.pb
    
  • Startup for preprocessing and post-processing services

    Implement the preprocessing and post-processing services using the Flask framework. Run the bert_flask.py file on the server to start the Flask service.

    python bert_flask.py
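
    The snippet below is only a hypothetical outline of such a service (the actual request handling, tokenization, and gRPC call to Serving are implemented in bert_flask.py; the JSON payload format and the run_inference helper are assumptions for illustration):

    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def run_inference(payload):
        # Placeholder: the real version converts the text to token ids with vocab.txt,
        # calls MindSpore Serving over gRPC, and decodes the predicted ids back into a poem.
        return '...'

    @app.route('/', methods=['POST'])
    def generate():
        payload = request.get_json()          # e.g. {"mode": 0, "text": ""} -- hypothetical format
        poem = run_inference(payload)
        return jsonify({'poetry': poem})

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=8080)    # same port as in the client URL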
    

    After the preceding steps are performed, the server-side deployment is complete.

  • Client

    Use any computer as the client. In poetry_client.py, set the URL of the request to the IP address of the server where the inference service is started, and ensure that the port number is the same as that in bert_flask.py on the server. For example:

    url = 'http://10.*.*.*:8080/'
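
    As a rough illustration of what the client does (a hypothetical sketch; the exact request format is defined by poetry_client.py and bert_flask.py, and the JSON fields shown here are assumptions), it posts the chosen mode and input text to this URL and prints the returned poem:

    import requests

    url = 'http://10.*.*.*:8080/'                 # replace with the server's IP address and port
    payload = {'mode': 2, 'text': '人工智能'}      # hypothetical format: request an acrostic poem
    response = requests.post(url, json=payload)
    print(response.json().get('poetry'))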
    

    Run the poetry_client.py file.

    python poetry_client.py
    

    Enter an instruction on the client (the prompt asks you to choose a mode: 0 for random generation, 1 for continuation, and 2 for an acrostic poem) to perform inference on the remote server and obtain a poem.

    选择模式:0-随机生成,1:续写,2:藏头诗
    0
    
    一朵黄花叶,
    千竿绿树枝。
    含香待夏晚,
    澹浩长风时。
    
    选择模式:0-随机生成,1:续写,2:藏头诗
    1
    输入首句诗
    明月
    
    明月照三峡,
    长空一片云。
    秋风与雨过,
    唯有客舟分。
    寒影出何处,
    远林含不闻。
    不知前后事,
    何道逐风君。
    
    选择模式:0-随机生成,1:续写,2:藏头诗
    2
    输入藏头诗
    人工智能
    
    人生事太远,
    工部与神期。
    智者岂无识,
    能文争有疑。
    

    Read the poems and appreciate their tonal patterns, rhymes, and meaning. An AI poet is born.

You can also use other datasets to complete simple generation tasks, such as writing Chinese New Year couplets or building a simple chatbot.

References

[1] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

[2] https://github.com/AaronJny/DeepLearningExamples/

[3] https://github.com/bojone/bert4keras