mindspore_gl.dataset.Reddit

class mindspore_gl.dataset.Reddit(root)

Reddit dataset, a source dataset for reading and parsing the Reddit dataset.

About Reddit dataset:

The node label is the community, or “subreddit”, that a post belongs to. The authors sampled 50 large communities and built a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. We use the first 20 days for training and the remaining days for testing (with 30% used for validation).

Statistics:

  • Nodes: 232,965

  • Edges: 114,615,892

  • Number of classes: 41
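As a quick sanity check, the average degree quoted in the description can be reproduced from the node and edge counts listed above. This is a minimal sketch that assumes each listed edge contributes once to the degree total:

```python
# Statistics quoted above for the Reddit dataset.
num_nodes = 232_965
num_edges = 114_615_892

# Average degree, counting each listed edge once.
avg_degree = num_edges / num_nodes
print(round(avg_degree))  # 492
```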

The dataset can be downloaded here: Reddit.

Organize the dataset files into the following directory structure before reading them.

.
├── reddit_data.npz
└── reddit_graph.npz
Parameters

root (str) – path to the root directory that contains reddit_with_mask.npz

Raises

Examples

>>> from mindspore_gl.dataset import Reddit
>>> root = "path/to/reddit"
>>> dataset = Reddit(root)
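Before constructing the dataset, it can help to confirm that the root directory actually contains the expected files. The helper below is a hypothetical sketch, not part of mindspore_gl; the file names are taken from the directory listing above and may need adjusting for your copy of the data:

```python
from pathlib import Path

# File names from the directory listing above (an assumption; adjust if needed).
EXPECTED_FILES = ("reddit_data.npz", "reddit_graph.npz")


def missing_dataset_files(root, expected=EXPECTED_FILES):
    """Return the expected file names that are not present under root."""
    root_path = Path(root)
    return [name for name in expected if not (root_path / name).is_file()]


# Report anything missing before passing the path to Reddit(root).
missing = missing_dataset_files("path/to/reddit")
if missing:
    print("Missing files:", ", ".join(missing))
```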
property adj_coo

Return the adjacency matrix in COO representation.

Returns

  • numpy.ndarray, array of COO matrix.

Examples

>>> #dataset is an instance object of Dataset
>>> adj_coo = dataset.adj_coo
property adj_csr

Return the adjacency matrix in CSR representation.

Returns

  • numpy.ndarray, array of CSR matrix.

Examples

>>> #dataset is an instance object of Dataset
>>> adj_csr = dataset.adj_csr
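The exact array layout returned by adj_coo and adj_csr is not specified here. As a generic illustration of how the two formats relate, the sketch below converts a toy COO edge list into CSR arrays using plain Python lists. The function name coo_to_csr and the toy graph are hypothetical and are not part of mindspore_gl:

```python
def coo_to_csr(rows, cols, num_nodes):
    """Convert COO edge lists (rows, cols) into CSR (indptr, indices) lists."""
    # Count the edges that start at each row (source) node.
    counts = [0] * num_nodes
    for r in rows:
        counts[r] += 1

    # indptr[i]:indptr[i + 1] delimits the column indices stored for row i.
    indptr = [0] * (num_nodes + 1)
    for i, count in enumerate(counts):
        indptr[i + 1] = indptr[i] + count

    # Write each column index into the next free slot of its row.
    indices = [0] * len(cols)
    next_slot = indptr[:-1]  # slicing creates a new list
    for r, c in zip(rows, cols):
        indices[next_slot[r]] = c
        next_slot[r] += 1
    return indptr, indices


# Toy graph with 3 nodes and 4 directed edges: 0->1, 0->2, 1->2, 2->0.
rows = [0, 0, 1, 2]
cols = [1, 2, 2, 0]
indptr, indices = coo_to_csr(rows, cols, num_nodes=3)
print(indptr)   # [0, 2, 3, 4]
print(indices)  # [1, 2, 2, 0]
```

In this layout the column array holds one entry per edge, so its length equals the number of edges.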
property edge_count

Number of edges, length of CSR col.

Returns

  • int, the number of edges.

Examples

>>> #dataset is an instance object of Dataset
>>> edge_count = dataset.edge_count
property node_count

Number of nodes, length of CSR row.

Returns

  • int, the number of nodes.

Examples

>>> #dataset is an instance object of Dataset
>>> node_count = dataset.node_count
property node_feat

Node features.

Returns

  • numpy.ndarray, array of node feature.

Examples

>>> #dataset is an instance object of Dataset
>>> node_feat = dataset.node_feat
property node_feat_size

Feature size of each node.

Returns

  • int, the feature size of each node.

Examples

>>> #dataset is an instance object of Dataset
>>> node_feat_size = dataset.node_feat_size
property node_label

Ground truth labels of each node.

Returns

  • numpy.ndarray, array of node label.

Examples

>>> #dataset is an instance object of Dataset
>>> node_label = dataset.node_label
property num_classes

Number of label classes.

Returns

  • int, the number of classes.

Examples

>>> #dataset is an instance object of Dataset
>>> num_classes = dataset.num_classes
property test_mask

Mask of test nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> test_mask = dataset.test_mask
property test_nodes

Test node indices.

Returns

  • numpy.ndarray, array of test nodes.

Examples

>>> #dataset is an instance object of Dataset
>>> test_nodes = dataset.test_nodes
property train_mask

Mask of training nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> train_mask = dataset.train_mask
property train_nodes

Training node indices.

Returns

  • numpy.ndarray, array of training nodes.

Examples

>>> #dataset is an instance object of Dataset
>>> train_nodes = dataset.train_nodes
property val_mask

Mask of validation nodes.

Returns

  • numpy.ndarray, array of mask.

Examples

>>> #dataset is an instance object of Dataset
>>> val_mask = dataset.val_mask
property val_nodes

Validation node indices.

Returns

  • numpy.ndarray, array of validation nodes.

Examples

>>> #dataset is an instance object of Dataset
>>> val_nodes = dataset.val_nodes
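The mask and node-index properties describe the same split in two forms. The sketch below shows the usual relationship on toy data, assuming the index arrays list the positions at which the corresponding mask is set; this assumption is not confirmed by the text above. The helper mask_to_indices and the mask values are hypothetical:

```python
def mask_to_indices(mask):
    """Return the positions at which a boolean mask is true."""
    return [i for i, flag in enumerate(mask) if flag]


# Toy masks over 6 nodes (hypothetical values, not taken from the dataset).
train_mask = [True, True, False, False, True, False]
val_mask = [False, False, True, False, False, False]
test_mask = [False, False, False, True, False, True]

train_nodes = mask_to_indices(train_mask)
val_nodes = mask_to_indices(val_mask)
test_nodes = mask_to_indices(test_mask)
print(train_nodes)  # [0, 1, 4]

# In a clean split, each node belongs to at most one subset.
membership = [t + v + s for t, v, s in zip(train_mask, val_mask, test_mask)]
assert all(count <= 1 for count in membership)
```

For NumPy boolean arrays, numpy.flatnonzero(mask) gives the same positions.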