How to Convert a TensorFlow Model to a MindSpore Model


This chapter uses the LeNet5 network as an example and, with accompanying code, explains the model weight conversion process in detail.

To convert a trained TensorFlow model into a checkpoint that MindSpore can load, the basic workflow is as follows:

  1. Print the name and shape of every parameter in the TensorFlow checkpoint, and print the name and shape of every parameter in the MindSpore Cell that will load the converted weights;

  2. Compare the parameter names and shapes and build a parameter mapping;

  3. Following the mapping, convert the TensorFlow parameters into MindSpore Parameter objects, assemble them into a parameter list, and save it as a checkpoint;

  4. Unit test: load the converted parameters in MindSpore, feed a fixed input, and compare the MindSpore and TensorFlow outputs.

The example in this chapter is based on TensorFlow 1.15.

Printing Parameter Information

import tensorflow as tf

# Read the model parameters from a TensorFlow checkpoint and print each
# parameter name together with the shape of the corresponding tensor.
def tensorflow_param(ckpt_path):
    """Get TensorFlow parameter names and shapes."""
    tf_params = {}
    reader = tf.train.load_checkpoint(ckpt_path)
    for name in reader.get_variable_to_shape_map():
        try:
            print(name, reader.get_tensor(name).shape)
            tf_params[name] = reader.get_tensor(name)
        except Exception:
            # Skip entries whose tensors cannot be read.
            pass
    return tf_params

# Given a MindSpore Cell, print the name and shape of every parameter it
# contains and return them as a dictionary.
def mindspore_params(network):
    """Get MindSpore parameter names and shapes."""
    ms_params = {}
    for param in network.get_parameters():
        name = param.name
        value = param.data.asnumpy()
        print(name, value.shape)
        ms_params[name] = value
    return ms_params

Run the following code:

from ms_lenet import LeNet5
tf_ckpt_path = './checkpoint_dir'
tensorflow_param(tf_ckpt_path)
print("*"*20)
network = LeNet5()
mindspore_params(network)

The output is as follows:

fc3/dense/kernel (84, 1)
fc3/dense/bias (1,)
conv1/weight (5, 5, 1, 6)
fc1/dense/bias (120,)
fc1/dense/kernel (400, 120)
fc2/dense/bias (84,)
conv2/weight (5, 5, 6, 16)
fc2/dense/kernel (120, 84)
******************************
conv1.weight (6, 1, 5, 5)
conv2.weight (16, 6, 5, 5)
fc1.weight (120, 400)
fc1.bias (120,)
fc2.weight (84, 120)
fc2.bias (84,)
fc3.weight (1, 84)
fc3.bias (1,)

Parameter Mapping and Checkpoint Saving

Comparing the parameter names and shapes printed above, we can see that the two sets of names follow recognizable patterns and can be matched with the help of the network structure. The shapes, however, differ for the convolution and fully-connected layers: in MindSpore a convolution weight has shape [out_channel, in_channel, kernel_height, kernel_width], while in TensorFlow it is [kernel_height, kernel_width, in_channel, out_channel]; for a fully-connected layer, the MindSpore weight has shape [out_channel, in_channel], while the TensorFlow weight has shape [in_channel, out_channel]. The convolution and fully-connected weights therefore have to be transposed during conversion, as the sketch below illustrates.
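As a quick sanity check on the axis order used in the conversion function below (a minimal sketch using dummy arrays; the shapes are taken from the LeNet5 printout above), transposing a TensorFlow-style convolution kernel with axes=[3, 2, 0, 1] produces the MindSpore layout, and a 2-D dense kernel only needs axes=[1, 0]:

import numpy as np

# TensorFlow conv kernel layout: [kernel_height, kernel_width, in_channel, out_channel]
tf_conv_kernel = np.zeros((5, 5, 1, 6))
# MindSpore expects [out_channel, in_channel, kernel_height, kernel_width]
ms_conv_weight = np.transpose(tf_conv_kernel, axes=[3, 2, 0, 1])
print(ms_conv_weight.shape)   # (6, 1, 5, 5)

# TensorFlow dense kernel layout: [in_channel, out_channel]
tf_dense_kernel = np.zeros((400, 120))
# MindSpore expects [out_channel, in_channel]
ms_dense_weight = np.transpose(tf_dense_kernel, axes=[1, 0])
print(ms_dense_weight.shape)  # (120, 400)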

import os

import numpy as np
import tensorflow as tf
from mindspore import Tensor, save_checkpoint

def tensorflow2mindspore(tf_ckpt_dir, param_mapping_dict, ms_ckpt_path):
    """Convert a TensorFlow checkpoint into a MindSpore checkpoint."""
    reader = tf.train.load_checkpoint(tf_ckpt_dir)
    new_params_list = []
    for name in param_mapping_dict:
        param_dict = {}
        parameter = reader.get_tensor(name)
        if 'conv' in name and 'weight' in name:
            # Transpose convolution weights from [H, W, Cin, Cout] to [Cout, Cin, H, W]
            parameter = np.transpose(parameter, axes=[3, 2, 0, 1])
        if 'fc' in name and 'kernel' in name:
            # Transpose fully-connected weights from [Cin, Cout] to [Cout, Cin]
            parameter = np.transpose(parameter, axes=[1, 0])
        param_dict['name'] = param_mapping_dict[name]
        param_dict['data'] = Tensor(parameter)
        new_params_list.append(param_dict)
    save_checkpoint(new_params_list, os.path.join(ms_ckpt_path, 'tf2mindspore.ckpt'))

Because the parameter name mapping for this network is fairly complex and hard to derive from the parameter names alone, we go through an explicit parameter mapping dictionary. For simpler name mappings, the conversion method described in "Converting a PyTorch Model File to a MindSpore Model File" can be used as a reference.

params_mapping = {
    "conv1/weight":"conv1.weight",
    "conv2/weight":"conv2.weight",
    "fc1/dense/kernel":"fc1.weight",
    "fc1/dense/bias":"fc1.bias",
    "fc2/dense/kernel":"fc2.weight",
    "fc2/dense/bias":"fc2.bias",
    "fc3/dense/kernel":"fc3.weight",
    "fc3/dense/bias":"fc3.bias",
}
ms_ckpt_path = './model'
tf_ckpt_dir = './model'
tensorflow2mindspore(tf_ckpt_dir, params_mapping, ms_ckpt_path)

After the script finishes, the converted model file that MindSpore can use is available at the specified path.
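Before running the full comparison, you can confirm that the converted file loads cleanly by loading it back into the MindSpore network (a minimal sketch; the checkpoint path is assumed to follow the save location used above):

from mindspore import load_checkpoint, load_param_into_net
from ms_lenet import LeNet5

network = LeNet5()
# Load the converted checkpoint and copy its values into the network's parameters.
param_dict = load_checkpoint('./model/tf2mindspore.ckpt')
load_param_into_net(network, param_dict)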

Unit Test

Once we have the converted parameter file, we need to run a unit test on the whole model to make sure the two frameworks produce consistent results:

from ms_lenet import mindspore_running
from tf_lenet import tf_running

tf_model_path = './model'
tf_outputs = tf_running(tf_model_path)
ms_outputs = mindspore_running('./tf2mindspore.ckpt')
diff = mean_relative_error(tf_outputs, ms_outputs)
print("************tensorflow outputs**************")
print(tf_outputs)
print("************mindspore outputs**************")
print(ms_outputs)
print("Diff: ", diff)

The output is:

************tensorflow outputs**************
[[56.040612]
 [56.040612]
 [56.040612]
 [56.040612]
 [56.040612]
 [56.040612]
 [56.04064 ]
 [56.04064 ]]
************mindspore outputs**************
[[56.04065]
 [56.04065]
 [56.04065]
 [56.04065]
 [56.04065]
 [56.04065]
 [56.04065]
 [56.04065]]
Diff:  5.4456143e-07

As the output shows, the two sets of results are nearly identical, which is in line with expectations.
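If you want this comparison to run as an automated regression test rather than a manual inspection, the diff can be checked against a tolerance. Continuing from the tf_outputs and ms_outputs variables above (a minimal sketch; the 1e-5 threshold is an assumption chosen for this example):

import numpy as np

# Fail loudly if the converted model drifts from the TensorFlow reference.
assert mean_relative_error(tf_outputs, ms_outputs) < 1e-5, "conversion mismatch"
# Alternatively, compare the raw outputs element-wise within a relative tolerance.
np.testing.assert_allclose(ms_outputs, tf_outputs, rtol=1e-5)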