# 比较与torch.utils.data.SubsetRandomSampler的差异

[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/br_base/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/br_base/docs/mindspore/source_zh_cn/note/api_mapping/pytorch_diff/SubsetRandomSampler.md)

## torch.utils.data.SubsetRandomSampler

```python
class torch.utils.data.SubsetRandomSampler(indices, generator=None)
```

更多内容详见[torch.utils.data.SubsetRandomSampler](https://pytorch.org/docs/1.8.1/data.html#torch.utils.data.SubsetRandomSampler)。

## mindspore.dataset.SubsetRandomSampler

```python
class mindspore.dataset.SubsetRandomSampler(indices, num_samples=None)
```

更多内容详见[mindspore.dataset.SubsetRandomSampler](https://mindspore.cn/docs/zh-CN/br_base/api_python/dataset/mindspore.dataset.SubsetRandomSampler.html)。

## 差异对比

PyTorch：给定样本的索引序列，从序列中随机获取索引对数据集进行采样，支持指定采样逻辑。

MindSpore：给定样本的索引序列，从序列中随机获取索引对数据集进行采样，不支持指定采样逻辑。

| 分类 | 子类 |PyTorch | MindSpore | 差异 |
| --- | ---   | ---   | ---        |---  |
|参数 | 参数1 | indices | indices    | - |
|     | 参数2 | generator  | -      | 指定额外的采样逻辑，MindSpore为全局随机采样 |
|     | 参数3 | -   | num_samples  | 指定采样器返回的样本数量 |

## 代码示例

```python
import torch
from torch.utils.data import SubsetRandomSampler

torch.manual_seed(0)

class MyMapDataset(torch.utils.data.Dataset):
    def __init__(self):
        super(MyMapDataset).__init__()
        self.data = [i for i in range(4)]
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

ds = MyMapDataset()
sampler = SubsetRandomSampler(indices=[0, 2])
dataloader = torch.utils.data.DataLoader(ds, sampler=sampler)

for data in dataloader:
    print(data)
# Out:
# tensor([2])
# tensor([0])
```

```python
import mindspore as ms
from mindspore.dataset import SubsetRandomSampler

ms.dataset.config.set_seed(1)

class MyMapDataset():
    def __init__(self):
        super(MyMapDataset).__init__()
        self.data = [i for i in range(4)]
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

ds = MyMapDataset()
sampler = SubsetRandomSampler(indices=[0, 2])
dataloader = ms.dataset.GeneratorDataset(ds, column_names=["data"], sampler=sampler)

for data in dataloader:
    print(data)
# Out:
# [Tensor(shape=[], dtype=Int64, value= 2)]
# [Tensor(shape=[], dtype=Int64, value= 0)]
```