# Network Migration Debugging Example

[View source on Gitee](https://gitee.com/mindspore/docs/blob/r2.3.0/docs/mindspore/source_zh_cn/migration_guide/sample_code.md)

This chapter takes the classic ResNet50 network as an example and walks through the network migration process in detail with code.

## Model Analysis and Preparation

Assume that the MindSpore environment has already been set up following the [Environment Preparation](https://www.mindspore.cn/docs/zh-CN/r2.3.0/migration_guide/enveriment_preparation.html) chapter, and that ResNet50 has not yet been implemented in the models repository.

First, analyze the algorithm and the network structure.

The Residual Network (ResNet) was proposed by Kaiming He et al. at Microsoft Research. Using residual units, they successfully trained a 152-layer network and won the ILSVRC 2015 competition. Conventional convolutional or fully connected networks suffer from some degree of information loss, as well as vanishing or exploding gradients, which can make deep networks fail to train. ResNet alleviates these problems to a large extent: by passing the input directly through to the output, it keeps the information intact, so the network only needs to learn the residual between input and output, which simplifies the learning target and reduces the difficulty. ResNet greatly speeds up the training of deep networks and substantially improves model accuracy.

[Paper](https://arxiv.org/pdf/1512.03385.pdf): Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. "Deep Residual Learning for Image Recognition"

We found a [PyTorch ResNet50 CIFAR-10 example](https://gitee.com/mindspore/docs/tree/r2.3.0/docs/mindspore/source_zh_cn/migration_guide/code/resnet_convert/resnet_pytorch) that contains the PyTorch ResNet implementation, CIFAR-10 data processing, and the network training and inference flow.

### Checklist

While reading the paper and the reference implementation, we fill in the following checklist:

| Trick | Notes |
|----|----|
| Data augmentation | RandomCrop, RandomHorizontalFlip, Resize, Normalize |
| Learning rate schedule | fixed learning rate 0.001 |
| Optimizer parameters | Adam optimizer, weight_decay=1e-5 |
| Training parameters | batch_size=32, epochs=90 |
| Network structure optimizations | Bottleneck |
| Training process optimizations | none |

### Reproducing the Reference Implementation

Download the PyTorch code and the CIFAR-10 dataset, then train the network:

```text
Train Epoch: 89 [0/1563 (0%)] Loss: 0.010917
Train Epoch: 89 [100/1563 (6%)] Loss: 0.013386
Train Epoch: 89 [200/1563 (13%)] Loss: 0.078772
Train Epoch: 89 [300/1563 (19%)] Loss: 0.031228
Train Epoch: 89 [400/1563 (26%)] Loss: 0.073462
Train Epoch: 89 [500/1563 (32%)] Loss: 0.098645
Train Epoch: 89 [600/1563 (38%)] Loss: 0.112967
Train Epoch: 89 [700/1563 (45%)] Loss: 0.137923
Train Epoch: 89 [800/1563 (51%)] Loss: 0.143274
Train Epoch: 89 [900/1563 (58%)] Loss: 0.088426
Train Epoch: 89 [1000/1563 (64%)] Loss: 0.071185
Train Epoch: 89 [1100/1563 (70%)] Loss: 0.094342
Train Epoch: 89 [1200/1563 (77%)] Loss: 0.126669
Train Epoch: 89 [1300/1563 (83%)] Loss: 0.245604
Train Epoch: 89 [1400/1563 (90%)] Loss: 0.050761
Train Epoch: 89 [1500/1563 (96%)] Loss: 0.080932

Test set: Average loss: -9.7052, Accuracy: 91%

Finished Training
```

The training logs and the saved parameter file can be downloaded from [resnet_pytorch_res](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/models/resnet_pytorch_res.zip).

### Analyzing Missing APIs/Features

- API analysis

| PyTorch API | MindSpore API | Difference |
| ---------------------- | ------------------ | ------|
| `nn.Conv2d` | `nn.Conv2d` | Yes, [comparison](https://www.mindspore.cn/docs/zh-CN/r2.3.0/note/api_mapping/pytorch_diff/Conv2d.html) |
| `nn.BatchNorm2d` | `nn.BatchNorm2d` | Yes, [comparison](https://www.mindspore.cn/docs/zh-CN/r2.3.0/note/api_mapping/pytorch_diff/BatchNorm2d.html) |
| `nn.ReLU` | `nn.ReLU` | No |
| `nn.MaxPool2d` | `nn.MaxPool2d` | Yes, [comparison](https://www.mindspore.cn/docs/zh-CN/r2.3.0/note/api_mapping/pytorch_diff/MaxPool2d.html) |
| `nn.AdaptiveAvgPool2d` | `nn.AdaptiveAvgPool2d` | No |
| `nn.Linear` | `nn.Dense` | Yes, [comparison](https://www.mindspore.cn/docs/zh-CN/r2.3.0/note/api_mapping/pytorch_diff/Dense.html) |
| `torch.flatten` | `nn.Flatten` | No |

API differences can be found with the [MindSpore Dev Toolkit](https://www.mindspore.cn/docs/zh-CN/r2.3.0/migration_guide/migrator_with_tools.html#%E7%BD%91%E7%BB%9C%E8%BF%81%E7%A7%BB%E5%BC%80%E5%8F%91) API scanning tool, or by checking the [PyTorch API mapping](https://www.mindspore.cn/docs/zh-CN/r2.3.0/note/api_mapping/pytorch_api_mapping.html).

- Feature analysis

| PyTorch feature | MindSpore counterpart |
| ------------------------- | ------------------------------------- |
| `nn.init.kaiming_normal_` | `initializer(init='HeNormal')` |
| `nn.init.constant_` | `initializer(init='Constant')` |
| `nn.Sequential` | `nn.SequentialCell` |
| `nn.Module` | `nn.Cell` |
| `torch.distributed` | `set_auto_parallel_context` |
| `torch.optim.SGD` | `nn.SGD` or `nn.Momentum` |
(Because the interface designs of MindSpore and PyTorch are not fully aligned, only the key features are compared here.)

The API and feature analysis shows that, compared with PyTorch, no APIs or features we need are missing in MindSpore.

## MindSpore Model Implementation

### Dataset

Take the CIFAR-10 dataset as an example. Its directory layout is:

```text
└─dataset_path
    ├─cifar-10-batches-bin      # train dataset
    │   ├─ data_batch_1.bin
    │   ├─ data_batch_2.bin
    │   ├─ data_batch_3.bin
    │   ├─ data_batch_4.bin
    │   └─ data_batch_5.bin
    └─cifar-10-verify-bin       # evaluate dataset
        └─ test_batch.bin
```

The dataset processing code in PyTorch and MindSpore is compared below.
PyTorch dataset processing:
```python
import torch
import torchvision.transforms as trans
import torchvision

train_transform = trans.Compose([
    trans.RandomCrop(32, padding=4),
    trans.RandomHorizontalFlip(0.5),
    trans.Resize(224),
    trans.ToTensor(),
    trans.Normalize([0.4914, 0.4822, 0.4465],
                    [0.2023, 0.1994, 0.2010]),
])
test_transform = trans.Compose([
    trans.Resize(224),
    trans.RandomHorizontalFlip(0.5),
    trans.ToTensor(),
    trans.Normalize([0.4914, 0.4822, 0.4465],
                    [0.2023, 0.1994, 0.2010]),
])
# set download=True in datasets.CIFAR10 to download the dataset automatically if needed
train_set = torchvision.datasets.CIFAR10(root='./data',
                                         train=True,
                                         transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set,
                                           batch_size=32,
                                           shuffle=True)
test_set = torchvision.datasets.CIFAR10(root='./data',
                                        train=False,
                                        transform=test_transform)
test_loader = torch.utils.data.DataLoader(test_set,
                                          batch_size=1,
                                          shuffle=False)
```
MindSpore dataset processing:
```python
import mindspore as ms
import mindspore.dataset as ds
from mindspore.dataset import vision
from mindspore.dataset.transforms import TypeCast

def create_cifar_dataset(dataset_path, do_train, batch_size=32,
                         image_size=(224, 224),
                         rank_size=1, rank_id=0):
    dataset = ds.Cifar10Dataset(dataset_path,
                                shuffle=do_train,
                                num_shards=rank_size,
                                shard_id=rank_id)
    # define map operations
    trans = []
    if do_train:
        trans += [
            vision.RandomCrop((32, 32), (4, 4, 4, 4)),
            vision.RandomHorizontalFlip(prob=0.5)
        ]
    trans += [
        vision.Resize(image_size),
        vision.Rescale(1.0 / 255.0, 0.0),
        vision.Normalize([0.4914, 0.4822, 0.4465],
                         [0.2023, 0.1994, 0.2010]),
        vision.HWC2CHW()
    ]
    type_cast_op = TypeCast(ms.int32)
    data_set = dataset.map(operations=type_cast_op,
                           input_columns="label")
    data_set = data_set.map(operations=trans,
                            input_columns="image")
    # apply batch operations
    data_set = data_set.batch(batch_size,
                              drop_remainder=do_train)
    return data_set
```
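As a quick sanity check, the function above can be instantiated and iterated before it is wired into training. A minimal sketch (the dataset path here is a placeholder assumption):

```python
# Sanity-check sketch; "./data/cifar-10-batches-bin" is an assumed path.
train_dataset = create_cifar_dataset("./data/cifar-10-batches-bin",
                                     do_train=True, batch_size=32)
print("batches per epoch:", train_dataset.get_dataset_size())
# pull one batch to verify shapes and dtypes after the map/batch pipeline
for image, label in train_dataset.create_tuple_iterator():
    print(image.shape, image.dtype, label.shape, label.dtype)
    break
```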
### Network Implementation

Going through the network block by block, the PyTorch building blocks and their MindSpore counterparts are compared below.

Convolution layer in PyTorch:
```python
nn.Conv2d(
    in_planes,
    out_planes,
    kernel_size=3,
    stride=stride,
    padding=dilation,
    groups=groups,
    bias=False,
    dilation=dilation,
)
```
The same convolution in MindSpore (note `pad_mode`, `group`, and `has_bias`):
```python
nn.Conv2d(
    in_planes,
    out_planes,
    kernel_size=3,
    pad_mode="pad",
    stride=stride,
    padding=dilation,
    group=groups,
    has_bias=False,
    dilation=dilation,
)
```
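To confirm that the two definitions produce the same geometry, a small sketch (channel sizes and input shape are made-up values) runs an all-ones input through the MindSpore layer:

```python
import numpy as np
import mindspore as ms
from mindspore import nn

# made-up sizes: 3 input channels, 64 output channels, stride 1, dilation 1
conv = nn.Conv2d(3, 64, kernel_size=3, pad_mode="pad", stride=1,
                 padding=1, group=1, has_bias=False, dilation=1)
x = ms.Tensor(np.ones((1, 3, 32, 32), np.float32))
print(conv(x).shape)  # (1, 64, 32, 32), same as the PyTorch layer with padding=1
```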
Other one-to-one replacements:

| PyTorch | MindSpore |
| ------- | --------- |
| `nn.Module` | `nn.Cell` |
| `nn.ReLU(inplace=True)` | `nn.ReLU()` |
| network graph built in `forward` | network graph built in `construct` |
```python
# PyTorch MaxPool2d with padding
maxpool = nn.MaxPool2d(kernel_size=3,
                       stride=2,
                       padding=1)
```
```python
# MindSpore MaxPool2d with padding
maxpool = nn.SequentialCell([
    nn.Pad(paddings=((0, 0), (0, 0), (1, 1), (1, 1)),
           mode="CONSTANT"),
    nn.MaxPool2d(kernel_size=3, stride=2)])
```
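A quick shape check of the Pad + MaxPool2d combination (the input shape is a made-up example; zero padding matches PyTorch's pooling behavior here because in ResNet the pooled activations come after a ReLU and are non-negative):

```python
import numpy as np
import mindspore as ms
from mindspore import nn

maxpool = nn.SequentialCell([
    nn.Pad(paddings=((0, 0), (0, 0), (1, 1), (1, 1)),
           mode="CONSTANT"),
    nn.MaxPool2d(kernel_size=3, stride=2)])
x = ms.Tensor(np.ones((1, 64, 112, 112), np.float32))
print(maxpool(x).shape)  # (1, 64, 56, 56), matching nn.MaxPool2d(3, 2, padding=1)
```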
Adaptive average pooling in PyTorch:

```python
avgpool = nn.AdaptiveAvgPool2d((1, 1))
```

In MindSpore, `ops.ReduceMean` is functionally identical when the target output shape is (1, 1), and runs faster:

```python
mean = ops.ReduceMean(keep_dims=True)
```
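Unlike `nn.AdaptiveAvgPool2d`, `ops.ReduceMean` takes the reduction axes at call time; for NCHW feature maps these are axes 2 and 3. A minimal usage sketch with a made-up input shape:

```python
import numpy as np
import mindspore as ms
from mindspore import ops

mean = ops.ReduceMean(keep_dims=True)
x = ms.Tensor(np.ones((1, 2048, 7, 7), np.float32))
print(mean(x, (2, 3)).shape)  # (1, 2048, 1, 1), same as AdaptiveAvgPool2d((1, 1))
```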
The fully connected layer in PyTorch:

```python
fc = nn.Linear(512 * block.expansion, num_classes)
```

In MindSpore:

```python
fc = nn.Dense(512 * block.expansion, num_classes)
```

Containers: PyTorch's `nn.Sequential` corresponds to MindSpore's `nn.SequentialCell`. Weight initialization is compared below.
```python
# PyTorch weight initialization
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(
            m.weight,
            mode="fan_out",
            nonlinearity="relu")
    elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

# Zero-initialize the last BN in each residual branch,
# so that the residual branch starts with zeros,
# and each residual block behaves like an identity.
# This improves the model by 0.2~0.3%.
# Reference: https://arxiv.org/abs/1706.02677
if zero_init_residual:
    for m in self.modules():
        is_bottleneck = isinstance(m, Bottleneck)
        is_basicblock = isinstance(m, BasicBlock)
        if is_bottleneck and m.bn3.weight is not None:
            nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
        elif is_basicblock and m.bn2.weight is not None:
            nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]
```
```python
# MindSpore weight initialization
import math
from mindspore.common import initializer

for _, cell in self.cells_and_names():
    if isinstance(cell, nn.Conv2d):
        cell.weight.set_data(initializer.initializer(
            initializer.HeNormal(negative_slope=0, mode='fan_out',
                                 nonlinearity='relu'),
            cell.weight.shape, cell.weight.dtype))
    elif isinstance(cell, (nn.BatchNorm2d, nn.GroupNorm)):
        cell.gamma.set_data(
            initializer.initializer("ones", cell.gamma.shape,
                                    cell.gamma.dtype))
        cell.beta.set_data(
            initializer.initializer("zeros", cell.beta.shape,
                                    cell.beta.dtype))
    elif isinstance(cell, nn.Dense):
        cell.weight.set_data(initializer.initializer(
            initializer.HeUniform(negative_slope=math.sqrt(5)),
            cell.weight.shape, cell.weight.dtype))
        cell.bias.set_data(
            initializer.initializer("zeros", cell.bias.shape,
                                    cell.bias.dtype))

if zero_init_residual:
    for _, cell in self.cells_and_names():
        is_bottleneck = isinstance(cell, Bottleneck)
        is_basicblock = isinstance(cell, BasicBlock)
        if is_bottleneck and cell.bn3.gamma is not None:
            cell.bn3.gamma.set_data(
                initializer.initializer("zeros", cell.bn3.gamma.shape,
                                        cell.bn3.gamma.dtype))
        elif is_basicblock and cell.bn2.gamma is not None:
            cell.bn2.gamma.set_data(
                initializer.initializer("zeros", cell.bn2.gamma.shape,
                                        cell.bn2.gamma.dtype))
```
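To verify that the zero-initialized residual branches took effect, the gamma values can be inspected after the network is built. A sketch, assuming `resnet50` here is the migrated MindSpore constructor with the initialization above applied and `zero_init_residual` enabled (both assumptions for illustration):

```python
import numpy as np

# assumes the MindSpore resnet50 above was built with zero-initialized
# residual branches
net = resnet50(num_classes=10)
for name, cell in net.cells_and_names():
    if name.endswith("bn3"):
        print(name, "gamma all zero:",
              bool(np.all(cell.gamma.data.asnumpy() == 0)))
        break
```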
### Loss Function

PyTorch:

```python
net_loss = torch.nn.CrossEntropyLoss()
```

MindSpore:

```python
loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
```
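The two loss definitions should agree numerically on the same inputs. A small sketch with made-up logits and a label:

```python
import numpy as np
import mindspore as ms
from mindspore import nn

loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
logits = ms.Tensor(np.array([[2.0, 1.0, 0.1]], np.float32))
label = ms.Tensor(np.array([0]), ms.int32)
# same value torch.nn.CrossEntropyLoss returns for these inputs
print(loss(logits, label))
```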
### Optimizer

PyTorch:

```python
net_opt = torch.optim.Adam(net.parameters(),
                           0.001,
                           weight_decay=1e-5)
```

MindSpore:

```python
optimizer = ms.nn.Adam(resnet.trainable_params(),
                       0.001,
                       weight_decay=1e-5)
```
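A sketch of how the optimizer plugs into a functional training step with MindSpore 2.x's `value_and_grad` (the network and loss come from the sections above; this is one possible wiring, not the only one):

```python
import mindspore as ms
from mindspore import nn

net = resnet50(num_classes=10)  # the MindSpore network defined above
loss_fn = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')
optimizer = nn.Adam(net.trainable_params(), 0.001, weight_decay=1e-5)

def forward_fn(data, label):
    return loss_fn(net(data), label)

# returns the loss and the gradients w.r.t. the optimizer's parameters
grad_fn = ms.value_and_grad(forward_fn, None, optimizer.parameters)

def train_step(data, label):
    loss, grads = grad_fn(data, label)
    optimizer(grads)  # apply the update
    return loss
```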
### Comparing Parameters

To reuse the trained PyTorch weights in MindSpore, first print the parameter names and shapes on both sides and compare them.

PyTorch:
```python
import torch

# print all parameter names and shapes from a PyTorch
# checkpoint file and return them as a dict
def pytorch_params(pth_file):
    par_dict = torch.load(pth_file, map_location='cpu')
    pt_params = {}
    for name in par_dict:
        parameter = par_dict[name]
        print(name, parameter.numpy().shape)
        pt_params[name] = parameter.numpy()
    return pt_params

pth_path = "resnet.pth"
pt_param = pytorch_params(pth_path)
print("=" * 20)
```
The result:
```text
conv1.weight (64, 3, 7, 7)
bn1.weight (64,)
bn1.bias (64,)
bn1.running_mean (64,)
bn1.running_var (64,)
bn1.num_batches_tracked ()
layer1.0.conv1.weight (64, 64, 1, 1)
```
MindSpore:
```python
from resnet_ms.src.resnet import resnet50 as ms_resnet50

# print all parameter names and shapes in a MindSpore Cell
# and return them as a dict
def mindspore_params(network):
    ms_params = {}
    for param in network.get_parameters():
        name = param.name
        value = param.data.asnumpy()
        print(name, value.shape)
        ms_params[name] = value
    return ms_params

ms_param = mindspore_params(ms_resnet50(num_classes=10))
print("=" * 20)
```
The result:
```text
conv1.weight (64, 3, 7, 7)
bn1.moving_mean (64,)
bn1.moving_variance (64,)
bn1.gamma (64,)
bn1.beta (64,)
layer1.0.conv1.weight (64, 64, 1, 1)
```
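Comparing the two listings, the names map in a regular way: BatchNorm's `weight`/`bias`/`running_mean`/`running_var` become `gamma`/`beta`/`moving_mean`/`moving_variance`, convolution and `fc` weights keep their names, and `num_batches_tracked` has no counterpart. A hedged conversion sketch based on that mapping (the `bn`/`downsample.1` name test is an assumption about this particular network's naming):

```python
import torch
import mindspore as ms

def pytorch2mindspore(pth_file="resnet.pth"):
    """Rename PyTorch ResNet parameters and save a MindSpore checkpoint."""
    par_dict = torch.load(pth_file, map_location='cpu')
    params_list = []
    for name, parameter in par_dict.items():
        if "num_batches_tracked" in name:
            continue  # no MindSpore counterpart
        ms_name = (name.replace("running_mean", "moving_mean")
                       .replace("running_var", "moving_variance"))
        # BatchNorm layers: weight/bias become gamma/beta (assumed name test)
        if "bn" in name or "downsample.1" in name:
            ms_name = ms_name.replace("weight", "gamma").replace("bias", "beta")
        params_list.append({"name": ms_name,
                            "data": ms.Tensor(parameter.numpy())})
    ms.save_checkpoint(params_list, "resnet_from_pytorch.ckpt")
```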
### Inference

PyTorch inference:
```python
import torch
import torchvision.transforms as trans
import torchvision
import torch.nn.functional as F
from resnet import resnet50

def test_epoch(model, device, data_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in data_loader:
            output = model(data.to(device))
            # sum up batch loss
            test_loss += F.nll_loss(output, target.to(device),
                                    reduction='sum').item()
            # get the index of the max log-probability
            pred = output.max(1)
            pred = pred[1]
            correct += pred.eq(target.to(device)).sum().item()
    test_loss /= len(data_loader.dataset)
    print('\nLoss: {:.4f}, Accuracy: {:.0f}%\n'.format(
        test_loss, 100. * correct / len(data_loader.dataset)))

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")
test_transform = trans.Compose([
    trans.Resize(224),
    trans.RandomHorizontalFlip(0.5),
    trans.ToTensor(),
    trans.Normalize([0.4914, 0.4822, 0.4465],
                    [0.2023, 0.1994, 0.2010]),
])
test_set = torchvision.datasets.CIFAR10(
    root='./data', train=False, transform=test_transform)
test_loader = torch.utils.data.DataLoader(
    test_set, batch_size=1, shuffle=False)
# 2. define forward network
if use_cuda:
    net = resnet50(num_classes=10).cuda()
else:
    net = resnet50(num_classes=10)
net.load_state_dict(torch.load("./resnet.pth", map_location='cpu'))
test_epoch(net, device, test_loader)
```
MindSpore inference:
```python
import numpy as np
import mindspore as ms
from mindspore import nn
from src.dataset import create_dataset
from src.model_utils.moxing_adapter import moxing_wrapper
from src.model_utils.config import config
from src.utils import init_env
from src.resnet import resnet50

def test_epoch(model, data_loader, loss_func):
    model.set_train(False)
    test_loss = 0
    correct = 0
    for data, target in data_loader:
        output = model(data)
        test_loss += float(loss_func(output, target).asnumpy())
        pred = np.argmax(output.asnumpy(), axis=1)
        correct += (pred == target.asnumpy()).sum()
    dataset_size = data_loader.get_dataset_size()
    test_loss /= dataset_size
    print('\nLoss: {:.4f}, Accuracy: {:.0f}%\n'.format(
        test_loss, 100. * correct / dataset_size))

@moxing_wrapper()
def test_net():
    init_env(config)
    eval_dataset = create_dataset(
        config.dataset_name,
        config.data_path,
        False, batch_size=1,
        image_size=(int(config.image_height),
                    int(config.image_width)))
    resnet = resnet50(num_classes=config.class_num)
    ms.load_checkpoint(config.checkpoint_path, resnet)
    loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True,
                                            reduction='mean')
    test_epoch(resnet, eval_dataset, loss)

if __name__ == '__main__':
    test_net()
```
Run the MindSpore inference:

```shell
python test.py --data_path data/cifar10/ --checkpoint_path resnet.ckpt
```

PyTorch inference accuracy:

```text
Loss: -9.7075, Accuracy: 91%
```

MindSpore inference accuracy:

```text
run standalone!
Loss: 0.3240, Accuracy: 91%
```