mindspore_gl.dataset.Reddit
- class mindspore_gl.dataset.Reddit(root)
Reddit dataset: a source dataset that reads and parses the Reddit dataset files.
About the Reddit dataset:
The node label is the community, or “subreddit”, that a post belongs to. The authors sampled 50 large communities and built a post-to-post graph, connecting two posts if the same user commented on both. In total, the dataset contains 232,965 posts with an average degree of 492. We use the first 20 days for training and the remaining days for testing (with 30% used for validation).
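As a toy illustration of the split rule described above, the sketch below assigns posts to splits by posting day. It assumes a hypothetical per-post `day` array and reads "30% used for validation" as 30% of the posts outside the training window. The `Reddit` class exposes ready-made masks and is not documented to perform this step itself; `temporal_split` is a hypothetical helper.

```python
import numpy as np

def temporal_split(day, train_days=20, val_frac=0.3, seed=0):
    """Hypothetical helper: split post indices by posting day.

    Posts from the first `train_days` days go to training. The remaining
    posts are shuffled, `val_frac` of them go to validation and the rest to test.
    """
    day = np.asarray(day)
    train = np.nonzero(day < train_days)[0]
    rest = np.random.default_rng(seed).permutation(np.nonzero(day >= train_days)[0])
    n_val = int(round(val_frac * len(rest)))
    val, test = np.sort(rest[:n_val]), np.sort(rest[n_val:])
    return train, val, test

# Toy data: day index (0-based) of 10 posts spread over 30 days.
day = np.array([0, 3, 19, 20, 21, 25, 27, 28, 29, 5])
train, val, test = temporal_split(day)
print(train)  # [0 1 2 9]
```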
Statistics:
Nodes: 232,965
Edges: 114,615,892
Number of classes: 41
The dataset can be downloaded here: Reddit.
Organize the dataset files into the following directory structure before reading them:
.
├── reddit_data.npz
└── reddit_graph.npz
- Parameters
root (str) – path to the root directory that contains reddit_with_mask.npz
- Raises
TypeError – if root is not a str.
RuntimeError – if root does not contain data files.
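The two documented errors can be anticipated before constructing the dataset. The sketch below is a hypothetical pre-check, not the library's own validation code; it checks for the file names in the directory layout shown above, while the files the class actually looks for may differ (the Parameters entry mentions reddit_with_mask.npz).

```python
import os
import tempfile

def check_reddit_root(root, required=("reddit_data.npz", "reddit_graph.npz")):
    """Hypothetical pre-check mirroring the documented TypeError/RuntimeError."""
    if not isinstance(root, str):
        raise TypeError(f"root must be a str, got {type(root).__name__}")
    # Collect every expected file that is not present under root.
    missing = [name for name in required if not os.path.isfile(os.path.join(root, name))]
    if missing:
        raise RuntimeError(f"{root!r} is missing data files: {missing}")
    return root

# Usage: a temporary directory with empty placeholder files passes the check.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("reddit_data.npz", "reddit_graph.npz"):
        open(os.path.join(tmp, name), "wb").close()
    print(check_reddit_root(tmp) == tmp)  # True
```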
Examples
>>> from mindspore_gl.dataset import Reddit
>>> root = "path/to/reddit"
>>> dataset = Reddit(root)
- property adj_coo
Return the adjacency matrix in COO representation.
- Returns
numpy.ndarray, the adjacency matrix in COO format.
Examples
>>> # dataset is an instance of Reddit
>>> adj_coo = dataset.adj_coo
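To show what a COO adjacency encodes, here is a toy sketch with NumPy. It assumes a 2 x num_edges layout (row indices, then column indices) with each undirected edge stored in both directions; the exact array layout returned by adj_coo is not specified on this page, so treat it as an assumption.

```python
import numpy as np

# Toy undirected graph with 4 nodes; each undirected edge is stored in both directions.
undirected = [(0, 1), (0, 2), (2, 3)]
src = np.array([u for u, v in undirected] + [v for u, v in undirected])
dst = np.array([v for u, v in undirected] + [u for u, v in undirected])
coo = np.stack([src, dst])  # shape (2, num_edges): row indices, then column indices

# Densify for inspection: entry (i, j) is 1 when the edge i -> j exists.
num_nodes = 4
dense = np.zeros((num_nodes, num_nodes), dtype=np.int64)
dense[coo[0], coo[1]] = 1
print(coo.shape)  # (2, 6)
```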
- property adj_csr
Return the adjacency matrix of CSR representation.
- Returns
numpy.ndarray, the adjacency matrix in CSR format.
Examples
>>> # dataset is an instance of Reddit
>>> adj_csr = dataset.adj_csr
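The relation between the COO and CSR forms can be sketched as a conversion from (row, col) index arrays to a row-pointer array `indptr` and a column array `indices`. This uses the standard CSR layout; whether adj_csr returns its data in exactly this form is an assumption, and `coo_to_csr` is a hypothetical helper. Under this layout the number of edges equals len(indices), in line with edge_count being described as the length of the CSR column array.

```python
import numpy as np

def coo_to_csr(row, col, num_nodes):
    """Hypothetical helper: convert COO (row, col) arrays to CSR (indptr, indices)."""
    row = np.asarray(row)
    col = np.asarray(col)
    order = np.lexsort((col, row))  # sort edges by row, then by column
    indices = col[order]
    counts = np.bincount(row, minlength=num_nodes)  # edges per source row
    indptr = np.zeros(num_nodes + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(counts)
    return indptr, indices

row = np.array([0, 0, 2, 1, 2, 3])
col = np.array([1, 2, 3, 0, 0, 2])
indptr, indices = coo_to_csr(row, col, num_nodes=4)
# Neighbors of node v are indices[indptr[v]:indptr[v + 1]].
print(indptr.tolist())  # [0, 2, 3, 5, 6]
```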
- property edge_count
Number of edges, length of CSR col.
- Returns
int, the number of edges.
Examples
>>> # dataset is an instance of Reddit
>>> edge_count = dataset.edge_count
- property node_count
Number of nodes, length of CSR row.
- Returns
int, the number of nodes.
Examples
>>> # dataset is an instance of Reddit
>>> node_count = dataset.node_count
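A quick arithmetic check relating edge_count and node_count: dividing the Edges figure by the Nodes figure from the Statistics list gives the average degree of 492 quoted in the dataset description, which suggests each post-to-post link is counted once per direction. The toy `indptr` array below assumes a standard CSR row-pointer layout, which this page does not spell out for adj_csr.

```python
import numpy as np

# Statistics listed above for the Reddit dataset.
node_count = 232_965
edge_count = 114_615_892

# Average degree as edges per node.
avg_degree = edge_count / node_count
print(round(avg_degree))  # 492

# Per-node degrees from a toy CSR row-pointer array (indptr):
# node v owns the slice indices[indptr[v]:indptr[v + 1]], so its degree is the slice length.
indptr = np.array([0, 2, 3, 5, 6])
degrees = np.diff(indptr)
print(degrees.tolist())  # [2, 1, 2, 1]
```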
- property node_feat
Node features.
- Returns
numpy.ndarray, array of node features.
Examples
>>> # dataset is an instance of Reddit
>>> node_feat = dataset.node_feat
- property node_feat_size
Feature size of each node.
- Returns
int, the feature dimension of each node.
Examples
>>> # dataset is an instance of Reddit
>>> node_feat_size = dataset.node_feat_size
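A minimal sketch of how node_feat and node_feat_size plausibly relate, assuming node_feat is a 2-D array with one row per node and node_feat_size is its column count (an assumption; the page states only the return types). The feature values are made up.

```python
import numpy as np

# Toy stand-in for dataset.node_feat: 5 nodes with 3 features each (values are made up).
node_feat = np.arange(15, dtype=np.float32).reshape(5, 3)

# Assumed relationship: node_feat_size is the width of the feature matrix.
node_feat_size = node_feat.shape[1]

# Gather the feature rows of a batch of nodes by integer indexing.
batch = np.array([4, 0, 2])
batch_feat = node_feat[batch]
print(batch_feat.shape)  # (3, 3)
```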
- property node_label
Ground truth labels of each node.
- Returns
numpy.ndarray, array of node labels.
Examples
>>> # dataset is an instance of Reddit
>>> node_label = dataset.node_label
- property num_classes
Number of label classes.
- Returns
int, the number of classes.
Examples
>>> # dataset is an instance of Reddit
>>> num_classes = dataset.num_classes
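If node_label holds integer class IDs in the range [0, num_classes), which is an assumption since the page does not rule out one-hot labels, the two properties combine as in this sketch: a range check, one-hot encoding, and per-class counts. The label values are made up; 41 is the class count from the Statistics list.

```python
import numpy as np

# Toy stand-in for dataset.node_label: integer class IDs (values are made up).
num_classes = 41
node_label = np.array([0, 5, 40, 5, 12])

# Every label must be a valid class ID.
assert node_label.min() >= 0 and node_label.max() < num_classes

# One-hot encoding: row i has a single 1 in column node_label[i].
one_hot = np.eye(num_classes, dtype=np.float32)[node_label]
# Number of nodes in each class.
class_counts = np.bincount(node_label, minlength=num_classes)
print(one_hot.shape)  # (5, 41)
```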
- property test_mask
Mask of test nodes.
- Returns
numpy.ndarray, the test node mask.
Examples
>>> # dataset is an instance of Reddit
>>> test_mask = dataset.test_mask
- property test_nodes
Indices of the test nodes.
- Returns
numpy.ndarray, array of test nodes.
Examples
>>> # dataset is an instance of Reddit
>>> test_nodes = dataset.test_nodes
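The mask and index forms of a split carry the same information. Assuming test_mask is a per-node boolean (or 0/1) array and test_nodes lists the positions where it is set (an assumption about how the two properties relate), each can be rebuilt from the other:

```python
import numpy as np

# Toy stand-in for dataset.test_mask: True marks a test node.
test_mask = np.array([False, True, False, True, True, False])

# Mask -> indices: positions of the True entries.
test_nodes = np.nonzero(test_mask)[0]
print(test_nodes)  # [1 3 4]

# Indices -> mask: set the listed positions to True.
rebuilt = np.zeros(test_mask.shape[0], dtype=bool)
rebuilt[test_nodes] = True
```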
- property train_mask
Mask of training nodes.
- Returns
numpy.ndarray, the training node mask.
Examples
>>> # dataset is an instance of Reddit
>>> train_mask = dataset.train_mask
- property train_nodes
Indices of the training nodes.
- Returns
numpy.ndarray, array of training nodes.
Examples
>>> # dataset is an instance of Reddit
>>> train_nodes = dataset.train_nodes
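A common use of a split mask is restricting an evaluation metric to one split. The sketch below computes accuracy over the masked nodes only; `masked_accuracy` is a hypothetical helper, and `logits` stands in for the output of some node-classification model, which is not part of this class.

```python
import numpy as np

def masked_accuracy(logits, labels, mask):
    """Hypothetical helper: accuracy over the nodes selected by a boolean mask."""
    mask = np.asarray(mask, dtype=bool)
    pred = logits.argmax(axis=1)  # predicted class per node
    return float((pred[mask] == labels[mask]).mean())

# Toy scores for 4 nodes and 2 classes (values are made up).
logits = np.array([[2.0, 0.1], [0.3, 1.5], [1.0, 0.2], [0.1, 0.9]])
labels = np.array([0, 1, 1, 1])
train_mask = np.array([True, True, True, False])
acc = masked_accuracy(logits, labels, train_mask)
print(round(acc, 4))  # 0.6667
```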
- property val_mask
Mask of validation nodes.
- Returns
numpy.ndarray, the validation node mask.
Examples
>>> # dataset is an instance of Reddit
>>> val_mask = dataset.val_mask
- property val_nodes
Indices of the validation nodes.
- Returns
numpy.ndarray, array of validation nodes.
Examples
>>> # dataset is an instance of Reddit
>>> val_nodes = dataset.val_nodes
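Because the training, validation, and test splits are meant to be separate, a quick sanity check is that no node is selected by more than one mask. The sketch assumes the three masks are same-length boolean (or 0/1) arrays; `check_split` is a hypothetical helper and the masks are toy data.

```python
import numpy as np

def check_split(train_mask, val_mask, test_mask):
    """Hypothetical helper: True when no node belongs to more than one split."""
    masks = np.stack([np.asarray(m, dtype=bool) for m in (train_mask, val_mask, test_mask)])
    # Summing booleans per node counts how many splits claim that node.
    return bool((masks.sum(axis=0) <= 1).all())

train_mask = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
val_mask = np.array([0, 0, 1, 0, 0, 0], dtype=bool)
test_mask = np.array([0, 0, 0, 1, 1, 0], dtype=bool)
print(check_split(train_mask, val_mask, test_mask))  # True
```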