mindspore_gl.dataset.Reddit

class mindspore_gl.dataset.Reddit(root)

Reddit Dataset, a source dataset for reading and parsing the Reddit dataset.

About Reddit dataset:

The node label is the community, or “subreddit”, that a post belongs to. The authors sampled 50 large communities and built a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. We use the first 20 days for training and the remaining days for testing (with 30% used for validation).

Statistics:

Nodes: 232,965
Edges: 114,615,892
Number of classes: 41
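
As a quick sanity check, the average degree quoted in the description can be reproduced from the node and edge counts listed here. This is a minimal sketch that assumes each stored edge counts once toward the degree total:

```python
# Figures copied from the statistics listed above.
num_nodes = 232_965
num_edges = 114_615_892

# Assumption: average degree = stored edges / nodes.
avg_degree = num_edges / num_nodes
print(f"average degree: {avg_degree:.2f}")  # rounds to 492
```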

The dataset can be downloaded here: Reddit.

Organize the dataset files into the following directory structure before reading them:

.
├── reddit_data.npz
└── reddit_graph.npz
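
Before constructing the dataset, it can help to confirm that the directory matches this layout. The helper below is a hypothetical sketch (`missing_dataset_files` is not part of mindspore_gl); it only checks for the two file names shown above, and the demo runs against a temporary directory rather than real data.

```python
import tempfile
from pathlib import Path

# File names taken from the directory layout above.
EXPECTED_FILES = ("reddit_data.npz", "reddit_graph.npz")


def missing_dataset_files(root, expected=EXPECTED_FILES):
    """Return the expected file names that are not present under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_file()]


# Demo on a throwaway directory that contains only one of the two files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "reddit_data.npz").touch()
    demo_missing = missing_dataset_files(tmp)

print(demo_missing)
```

An empty list from `missing_dataset_files("path/to/reddit")` would mean both files are in place.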

Parameters

root (str) – path to the root directory that contains reddit_with_mask.npz

Raises

TypeError – if root is not a str.
RuntimeError – if root does not contain data files.

Examples

>>> from mindspore_gl.dataset import Reddit
>>> root = "path/to/reddit"
>>> dataset = Reddit(root)

property adj_coo

Return the adjacency matrix in COO representation.

Returns

numpy.ndarray, array of COO matrix.

Examples

>>> # dataset is an instance of Reddit
>>> adj_coo = dataset.adj_coo

property adj_csr

Return the adjacency matrix in CSR representation.

Returns

numpy.ndarray, array of CSR matrix.

Examples

>>> # dataset is an instance of Reddit
>>> adj_csr = dataset.adj_csr
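
To illustrate how the COO and CSR representations relate, here is a small NumPy sketch on a toy graph. The data is hypothetical, and the `(2, num_edges)` COO layout (sources in row 0, destinations in row 1) is an assumption for illustration; the exact arrays returned by `adj_coo` and `adj_csr` may be laid out differently.

```python
import numpy as np

# Toy directed graph with 4 nodes and 5 edges (hypothetical data, not Reddit).
# Assumed COO layout: row 0 holds source nodes, row 1 holds destination nodes.
adj_coo = np.array([[0, 0, 1, 2, 3],
                    [1, 2, 2, 3, 0]])
num_nodes = 4

# Sort edges by source node so that each node's neighbours are contiguous.
order = np.argsort(adj_coo[0], kind="stable")
src, dst = adj_coo[0, order], adj_coo[1, order]

# CSR row pointer: indptr[i]:indptr[i + 1] delimits the neighbours of node i.
indptr = np.concatenate(([0], np.cumsum(np.bincount(src, minlength=num_nodes))))
# CSR column indices: destination nodes in source-sorted order.
indices = dst

print(indptr)
print(indices)
```

In this sketch the column index array has one entry per edge, and the row pointer has one entry per node plus a trailing end offset.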

property edge_count

Number of edges, length of CSR col.

Returns

int, the number of edges.

Examples

>>> # dataset is an instance of Reddit
>>> edge_count = dataset.edge_count

property node_count

Number of nodes, length of CSR row.

Returns

int, the number of nodes.

Examples

>>> # dataset is an instance of Reddit
>>> node_count = dataset.node_count

property node_feat

Node features.

Returns

numpy.ndarray, array of node feature.

Examples

>>> # dataset is an instance of Reddit
>>> node_feat = dataset.node_feat

property node_feat_size

Feature size of each node.

Returns

int, the feature size of each node.

Examples

>>> # dataset is an instance of Reddit
>>> node_feat_size = dataset.node_feat_size
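
The following sketch shows how a feature matrix relates to the node count and feature size, assuming a two-dimensional `(num_nodes, feat_size)` layout. The array is a hypothetical stand-in for `dataset.node_feat`, not real Reddit data.

```python
import numpy as np

# Hypothetical stand-in for dataset.node_feat: 5 nodes with 3 features each.
node_feat = np.arange(15, dtype=np.float32).reshape(5, 3)

# Under the assumed (num_nodes, feat_size) layout, the shape gives both values.
node_count, node_feat_size = node_feat.shape
print(node_count, node_feat_size)
```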

property node_label

Ground truth labels of each node.

Returns

numpy.ndarray, array of node label.

Examples

>>> # dataset is an instance of Reddit
>>> node_label = dataset.node_label

property num_classes

Number of label classes.

Returns

int, the number of classes.

Examples

>>> # dataset is an instance of Reddit
>>> num_classes = dataset.num_classes
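
As an illustration of how labels and the class count fit together, the sketch below inspects the class distribution of a toy label array. It assumes labels are integers in the range `0` to `num_classes - 1`; the array is hypothetical and stands in for `dataset.node_label`.

```python
import numpy as np

# Hypothetical stand-in for dataset.node_label (6 nodes, 3 classes).
node_label = np.array([0, 2, 1, 2, 2, 0])

# Assumption: labels are contiguous integers starting at 0.
num_classes = int(node_label.max()) + 1

# Number of nodes per class, indexed by class id.
class_counts = np.bincount(node_label, minlength=num_classes)
print(class_counts)
```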

property test_mask

Mask of test nodes.

Returns

numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> test_mask = dataset.test_mask

property test_nodes

Indices of test nodes.

Returns

numpy.ndarray, array of test nodes.

Examples

>>> # dataset is an instance of Reddit
>>> test_nodes = dataset.test_nodes
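
A mask and an index array are two views of the same node subset. The sketch below converts between them on toy data, assuming the mask is a boolean array with one entry per node; the values are hypothetical and stand in for `test_mask` and `test_nodes`.

```python
import numpy as np

# Hypothetical example: 6 nodes, of which nodes 1 and 4 are test nodes.
node_count = 6
test_nodes = np.array([1, 4])

# Index array -> boolean mask (assumed layout: one entry per node).
test_mask = np.zeros(node_count, dtype=bool)
test_mask[test_nodes] = True

# Boolean mask -> index array.
recovered_nodes = np.nonzero(test_mask)[0]
print(test_mask)
print(recovered_nodes)
```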

property train_mask

Mask of training nodes.

Returns

numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> train_mask = dataset.train_mask

property train_nodes

Indices of training nodes.

Returns

numpy.ndarray, array of training nodes.

Examples

>>> # dataset is an instance of Reddit
>>> train_nodes = dataset.train_nodes

property val_mask

Mask of validation nodes.

Returns

numpy.ndarray, array of mask.

Examples

>>> # dataset is an instance of Reddit
>>> val_mask = dataset.val_mask

property val_nodes

Indices of validation nodes.

Returns

numpy.ndarray, array of validation nodes.

Examples

>>> # dataset is an instance of Reddit
>>> val_nodes = dataset.val_nodes
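
The split masks are typically used to restrict a metric to one subset of nodes. The sketch below computes per-split accuracy on toy data, assuming boolean masks with one entry per node; the labels, predictions, and masks are hypothetical, and `split_accuracy` is an illustrative helper rather than part of mindspore_gl.

```python
import numpy as np

# Hypothetical labels and model predictions for 6 nodes.
node_label = np.array([0, 1, 1, 0, 2, 2])
pred = np.array([0, 1, 0, 0, 2, 1])

# Hypothetical split masks (assumed boolean, one entry per node).
train_mask = np.array([True, True, False, False, False, False])
val_mask = np.array([False, False, True, True, False, False])
test_mask = np.array([False, False, False, False, True, True])


def split_accuracy(pred, label, mask):
    """Fraction of correct predictions among the nodes selected by mask."""
    return float((pred[mask] == label[mask]).mean())


# In this toy setup every node belongs to exactly one split.
splits_are_disjoint = bool(
    (train_mask.astype(int) + val_mask.astype(int) + test_mask.astype(int) == 1).all()
)

val_acc = split_accuracy(pred, node_label, val_mask)
test_acc = split_accuracy(pred, node_label, test_mask)
print(splits_are_disjoint, val_acc, test_acc)
```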