mindspore.dataset.Graph

class mindspore.dataset.Graph(edges, node_feat=None, edge_feat=None, graph_feat=None, node_type=None, edge_type=None, num_parallel_workers=None, working_mode='local', hostname='127.0.0.1', port=50051, num_client=1, auto_shutdown=True)

A graph object that stores graph structure and feature data and provides capabilities such as graph sampling.

This class supports initializing a graph from NumPy array data that represents the nodes, the edges and their features. If the working mode is ‘local’, there is no need to specify arguments such as working_mode, hostname, port, num_client and auto_shutdown.

Parameters
  • edges (Union[list, numpy.ndarray]) – Edges of the graph in COO format, with shape [2, num_edges].

  • node_feat (dict, optional) – Features of the nodes. Each key is a feature type given as a string, such as ‘weight’. Each value is a numpy.array with shape [num_nodes, num_node_features].

  • edge_feat (dict, optional) – Features of the edges. Each key is a feature type given as a string, such as ‘weight’. Each value is a numpy.array with shape [num_edges, num_edge_features].

  • graph_feat (dict, optional) – Additional features that cannot be assigned to node_feat or edge_feat. Each key is a feature type given as a string. Each value is a numpy.array whose shape is not restricted.

  • node_type (Union[list, numpy.ndarray], optional) – Types of the nodes. Each element is a string that gives the type of the corresponding node. If not provided, the default type of each node is “0”.

  • edge_type (Union[list, numpy.ndarray], optional) – Types of the edges. Each element is a string that gives the type of the corresponding edge. If not provided, the default type of each edge is “0”.

  • num_parallel_workers (int, optional) – Number of workers to process the dataset in parallel. Default: None.

  • working_mode (str, optional) –

    The working mode. Supported values are ‘local’, ‘client’ and ‘server’. Default: ‘local’.

    • ‘local’, used in non-distributed training scenarios.

    • ‘client’, used in distributed training scenarios. The client does not load data, but obtains data from the server.

    • ‘server’, used in distributed training scenarios. The server loads the data and makes it available to the clients.

  • hostname (str, optional) – Hostname of the graph data server. This parameter is only valid when working_mode is set to ‘client’ or ‘server’. Default: ‘127.0.0.1’.

  • port (int, optional) – Port of the graph data server. The range is 1024-65535. This parameter is only valid when working_mode is set to ‘client’ or ‘server’. Default: 50051.

  • num_client (int, optional) – Maximum number of clients expected to connect to the server. The server will allocate resources according to this parameter. This parameter is only valid when working_mode is set to ‘server’. Default: 1.

  • auto_shutdown (bool, optional) – Whether the server exits automatically once the number of connected clients has reached num_client and no client is connected any more. This parameter is only valid when working_mode is set to ‘server’. Default: True.

Raises
  • TypeError – If edges is not a list or NumPy array.

  • TypeError – If node_feat is provided but is not a dict, or a key in the dict is not a string, or a value in the dict is not a NumPy array.

  • TypeError – If edge_feat is provided but is not a dict, or a key in the dict is not a string, or a value in the dict is not a NumPy array.

  • TypeError – If graph_feat is provided but is not a dict, or a key in the dict is not a string, or a value in the dict is not a NumPy array.

  • TypeError – If node_type is provided but is not a list or NumPy array.

  • TypeError – If edge_type is provided but is not a list or NumPy array.

  • ValueError – If num_parallel_workers exceeds the maximum number of threads.

  • ValueError – If working_mode is not ‘local’, ‘client’ or ‘server’.

  • TypeError – If hostname is illegal.

  • ValueError – If port is not in range [1024, 65535].

  • ValueError – If num_client is not in range [1, 255].
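
The value ranges listed above for the distributed-mode arguments can be summarized as a small pre-check. The sketch below is not MindSpore's implementation: `check_server_args` is a hypothetical helper that only mirrors the documented ValueError conditions, and skipping the network checks in ‘local’ mode is an assumption of this sketch.

```python
VALID_MODES = ("local", "client", "server")


def check_server_args(working_mode="local", port=50051, num_client=1):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    if working_mode not in VALID_MODES:
        raise ValueError(f"working_mode must be one of {VALID_MODES}, got {working_mode!r}")
    if working_mode == "local":
        # Assumption: network arguments are irrelevant in local mode.
        return
    if not 1024 <= port <= 65535:
        raise ValueError(f"port must be in [1024, 65535], got {port}")
    # num_client is documented as valid only for the server.
    if working_mode == "server" and not 1 <= num_client <= 255:
        raise ValueError(f"num_client must be in [1, 255], got {num_client}")


check_server_args("server", port=50051, num_client=4)  # passes silently
```

Running such a check before constructing the Graph can surface a bad port or client count early, before any server process is started.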

Examples

>>> import numpy as np
>>> from mindspore.dataset import Graph
>>>
>>> # 1) Only provide edges for creating graph, as this is the only required input parameter
>>> edges = np.array([[1, 2], [0, 1]], dtype=np.int32)
>>> graph = Graph(edges)
>>> graph_info = graph.graph_info()
>>>
>>> # 2) Setting node_feat and edge_feat for corresponding node and edge
>>> #    first dimension of feature shape should be corresponding node num or edge num.
>>> edges = np.array([[1, 2], [0, 1]], dtype=np.int32)
>>> node_feat = {"node_feature_1": np.array([[0], [1], [2]], dtype=np.int32)}
>>> edge_feat = {"edge_feature_1": np.array([[1, 2], [3, 4]], dtype=np.int32)}
>>> graph = Graph(edges, node_feat, edge_feat)
>>>
>>> # 3) Setting graph feature for graph, there is no shape limit for graph feature
>>> edges = np.array([[1, 2], [0, 1]], dtype=np.int32)
>>> graph_feature = {"graph_feature_1": np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)}
>>> graph = Graph(edges, graph_feat=graph_feature)
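
The shape rules used in the examples above (edges with shape [2, num_edges], one feature row per node or per edge) can be checked before the data is handed to Graph. The following is a minimal sketch on plain Python lists. `check_graph_inputs` is a hypothetical helper, not part of MindSpore, and raising ValueError on a row-count mismatch is a choice of this sketch rather than documented behavior.

```python
def check_graph_inputs(edges, num_nodes, node_feat=None, edge_feat=None):
    """Hypothetical shape check for Graph inputs given as nested lists."""
    if len(edges) != 2 or len(edges[0]) != len(edges[1]):
        raise ValueError("edges must have shape [2, num_edges]")
    num_edges = len(edges[0])
    for name, feat, expected in (("node_feat", node_feat, num_nodes),
                                 ("edge_feat", edge_feat, num_edges)):
        for key, value in (feat or {}).items():
            if not isinstance(key, str):
                raise TypeError(f"{name} keys must be strings, got {type(key).__name__}")
            # The first dimension must match the node count or the edge count.
            if len(value) != expected:
                raise ValueError(f"{name}[{key!r}] has {len(value)} rows, expected {expected}")
    return num_edges


# Same data as example 2 above, written as plain lists.
edges = [[1, 2], [0, 1]]
node_feat = {"node_feature_1": [[0], [1], [2]]}
edge_feat = {"edge_feature_1": [[1, 2], [3, 4]]}
num_edges = check_graph_inputs(edges, num_nodes=3, node_feat=node_feat, edge_feat=edge_feat)
print(num_edges)  # 2
```
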
get_all_edges(edge_type)

Get all edges in the graph.

Parameters

edge_type (str) – Specify the type of edge. If the graph was initialized without edge_type, the default edge type is “0”.

Returns

numpy.ndarray, array of edges.

Examples

>>> edges = graph.get_all_edges(edge_type="0")
Raises

TypeError – If edge_type is not string.

get_all_neighbors(node_list, neighbor_type, output_format=OutputFormat.NORMAL)

Get the neighbors of type neighbor_type for the nodes in node_list. The following example illustrates the output formats. In the adjacency matrix, 1 means that two nodes are connected and 0 means that they are not.

Adjacency Matrix

          0    1    2    3
     0    0    1    0    0
     1    0    0    1    0
     2    1    0    0    1
     3    1    0    0    0

Normal Format

     src      0    1    2    3
     dst_0    1    2    0    1
     dst_1   -1   -1    3   -1

COO Format

     src    0    1    2    2    3
     dst    1    2    0    3    1

CSR Format

     offsetTable    0    1    2    4
     dstTable       1    2    0    3    1
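
To make the relation between the three formats concrete, the sketch below rebuilds the Normal, COO and CSR tables from the edge list shown in the COO table (0→1, 1→2, 2→0, 2→3, 3→1). It uses plain Python lists and hypothetical helper names. It illustrates the layouts in the tables, not MindSpore's implementation, and the exact array layout returned by get_all_neighbors may differ.

```python
def to_normal(edges, num_nodes, pad=-1):
    """One row per source node: [src, dst_0, dst_1, ...], padded with `pad`."""
    neighbors = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        neighbors[src].append(dst)
    width = max(len(n) for n in neighbors)
    return [[src] + n + [pad] * (width - len(n)) for src, n in enumerate(neighbors)]


def to_coo(edges):
    """Two parallel lists: source IDs and destination IDs."""
    return [s for s, _ in edges], [d for _, d in edges]


def to_csr(edges, num_nodes):
    """Start offset of each node's neighbor run, plus the flat neighbor list."""
    ordered = sorted(edges)  # group edges by source node
    counts = [0] * num_nodes
    for src, _ in ordered:
        counts[src] += 1
    offsets, total = [], 0
    for count in counts:
        offsets.append(total)
        total += count
    return offsets, [d for _, d in ordered]


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 1)]
normal = to_normal(edges, num_nodes=4)
coo_src, coo_dst = to_coo(edges)
offset_table, dst_table = to_csr(edges, num_nodes=4)
print(offset_table, dst_table)  # [0, 1, 2, 4] [1, 2, 0, 3, 1]
```

As in the CSR table, the offset table here has one entry per node (the start of that node's run in dstTable). The neighbors of node i are dstTable[offsetTable[i]:offsetTable[i+1]], with the end of dstTable closing the last run.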

Parameters
  • node_list (Union[list, numpy.ndarray]) – The given list of nodes.

  • neighbor_type (str) – Specify the type of neighbor node.

  • output_format (OutputFormat, optional) – Output storage format. Default: OutputFormat.NORMAL. It can be any of [OutputFormat.NORMAL, OutputFormat.COO, OutputFormat.CSR].

Returns

For NORMAL or COO format, a numpy.ndarray that represents the array of neighbors is returned. For CSR format, two numpy.ndarray objects are returned: the first is the offset table and the second is the array of neighbors.

Examples

>>> from mindspore.dataset.engine import OutputFormat
>>> nodes = graph.get_all_nodes(node_type="0")
>>> neighbors = graph.get_all_neighbors(node_list=nodes, neighbor_type="0")
>>> neighbors_coo = graph.get_all_neighbors(node_list=nodes, neighbor_type="0",
...                                         output_format=OutputFormat.COO)
>>> offset_table, neighbors_csr = graph.get_all_neighbors(node_list=nodes, neighbor_type="0",
...                                                       output_format=OutputFormat.CSR)
Raises
  • TypeError – If node_list is not list or ndarray.

  • TypeError – If neighbor_type is not string.

get_all_nodes(node_type)

Get all nodes in the graph.

Parameters

node_type (str) – Specify the type of node.

Returns

numpy.ndarray, array of nodes.

Examples

>>> nodes = graph.get_all_nodes(node_type="0")
Raises

TypeError – If node_type is not string.

get_edge_feature(edge_list, feature_types)

Get the features of the given feature_types for the edges in edge_list.

Parameters
  • edge_list (Union[list, numpy.ndarray]) – The given list of edges.

  • feature_types (Union[list, numpy.ndarray]) – The given list of feature types, each element should be string.

Returns

numpy.ndarray, array of features.

Examples

>>> edges = graph.get_all_edges(edge_type="0")
>>> features = graph.get_edge_feature(edge_list=edges, feature_types=["edge_feature_1"])
Raises
  • TypeError – If edge_list is not list or ndarray.

  • TypeError – If feature_types is not list or ndarray.

get_edges_from_nodes(node_list)

Get edges from the nodes.

Parameters

node_list (Union[list[tuple], numpy.ndarray]) – The given list of node ID pairs.

Returns

numpy.ndarray, array of edge IDs.

Examples

>>> edges = graph.get_edges_from_nodes(node_list=[(101, 201), (103, 207)])
Raises

TypeError – If node_list is not list or ndarray.
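
The mapping between node pairs and edge IDs can be pictured as a lookup table. The sketch below assumes that row 0 of the COO array holds source nodes, row 1 holds destination nodes, and that edge i is the i-th column. These are assumptions made for illustration and are not confirmed by this page. The helper names are hypothetical.

```python
edges = [[1, 2], [0, 1]]  # COO: row 0 = sources, row 1 = destinations (assumed)

# Assumption: edge i is the i-th column of `edges`.
pair_to_edge = {(s, d): i for i, (s, d) in enumerate(zip(*edges))}


def edges_from_nodes(node_pairs):
    """Hypothetical lookup: (src, dst) pairs -> edge IDs."""
    return [pair_to_edge[pair] for pair in node_pairs]


def nodes_from_edges(edge_ids):
    """Hypothetical inverse: edge IDs -> (src, dst) pairs."""
    return [(edges[0][e], edges[1][e]) for e in edge_ids]


edge_ids = edges_from_nodes([(2, 1), (1, 0)])
print(edge_ids)                    # [1, 0]
print(nodes_from_edges(edge_ids))  # [(2, 1), (1, 0)]
```
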

get_graph_feature(feature_types)

Get the features of the given feature_types that are stored at graph level.

Parameters

feature_types (Union[list, numpy.ndarray]) – The given list of feature types, each element should be string.

Returns

numpy.ndarray, array of features.

Examples

>>> features = graph.get_graph_feature(feature_types=['graph_feature_1'])
Raises

TypeError – If feature_types is not list or ndarray.

get_neg_sampled_neighbors(node_list, neg_neighbor_num, neg_neighbor_type)

Get negative sampled neighbors of type neg_neighbor_type for the nodes in node_list.

Parameters
  • node_list (Union[list, numpy.ndarray]) – The given list of nodes.

  • neg_neighbor_num (int) – Number of neighbors sampled.

  • neg_neighbor_type (str) – Specify the type of negative neighbor.

Returns

numpy.ndarray, array of neighbors.

Examples

>>> nodes = graph.get_all_nodes(node_type="0")
>>> neg_neighbors = graph.get_neg_sampled_neighbors(node_list=nodes, neg_neighbor_num=3,
...                                                 neg_neighbor_type="0")
Raises
  • TypeError – If node_list is not list or ndarray.

  • TypeError – If neg_neighbor_num is not integer.

  • TypeError – If neg_neighbor_type is not string.
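
As an illustration of what negative sampling means, the sketch below draws, for each input node, nodes that are not connected to it. It rests on several assumptions that this page does not confirm: a negative neighbor is any candidate node not adjacent to the input node, sampling is done with replacement, and -1 is used when no candidate exists. `neg_sampled_neighbors` is a hypothetical pure-Python helper, not the MindSpore method.

```python
import random


def neg_sampled_neighbors(node_list, neg_neighbor_num, adjacency, candidates, seed=0):
    """Hypothetical sketch: draw nodes that are NOT neighbors of each input node."""
    rng = random.Random(seed)
    rows = []
    for node in node_list:
        linked = set(adjacency.get(node, ()))
        # Candidates exclude the node itself and its real neighbors.
        pool = [c for c in candidates if c != node and c not in linked]
        if pool:
            rows.append([rng.choice(pool) for _ in range(neg_neighbor_num)])
        else:
            rows.append([-1] * neg_neighbor_num)  # assumed padding value
    return rows


adjacency = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
rows = neg_sampled_neighbors([0, 2], neg_neighbor_num=3,
                             adjacency=adjacency, candidates=[0, 1, 2, 3])
# Node 2 is linked to 0 and 3, so node 1 is its only negative candidate.
print(rows[1])  # [1, 1, 1]
```
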

get_node_feature(node_list, feature_types)

Get the features of the given feature_types for the nodes in node_list.

Parameters
  • node_list (Union[list, numpy.ndarray]) – The given list of nodes.

  • feature_types (Union[list, numpy.ndarray]) – The given list of feature types, each element should be string.

Returns

numpy.ndarray, array of features.

Examples

>>> nodes = graph.get_all_nodes(node_type="0")
>>> features = graph.get_node_feature(node_list=nodes, feature_types=["node_feature_1"])
Raises
  • TypeError – If node_list is not list or ndarray.

  • TypeError – If feature_types is not list or ndarray.
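
Conceptually, a feature lookup selects rows of the stored feature arrays by ID. The sketch below shows this on plain lists, assuming that node IDs index the rows of each feature array and that one result is produced per requested feature type. Both are assumptions of this illustration, and `gather_features` is a hypothetical helper.

```python
def gather_features(ids, feature_types, store):
    """Hypothetical gather: for each feature type, pick the rows for `ids` in order."""
    return [[store[name][i] for i in ids] for name in feature_types]


node_feat = {"node_feature_1": [[0], [1], [2]]}
features = gather_features([2, 0], ["node_feature_1"], node_feat)
print(features)  # [[[2], [0]]]
```

The rows come back in the order of the requested IDs, not in storage order.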

get_nodes_from_edges(edge_list)

Get nodes from the edges.

Parameters

edge_list (Union[list, numpy.ndarray]) – The given list of edges.

Returns

numpy.ndarray, array of nodes.

Examples

>>> edges = graph.get_all_edges(edge_type="0")
>>> nodes = graph.get_nodes_from_edges(edges)
Raises

TypeError – If edge_list is not list or ndarray.

get_sampled_neighbors(node_list, neighbor_nums, neighbor_types, strategy=SamplingStrategy.RANDOM)

Get sampled neighbor information.

The API supports multi-hop neighbor sampling, that is, the sampling result of one hop is used as the input of the next hop. A maximum of 6 hops is allowed.

The sampling result is tiled into a list in the format of [input node, 1-hop sampling result, 2-hop sampling result …].

Parameters
  • node_list (Union[list, numpy.ndarray]) – The given list of nodes.

  • neighbor_nums (Union[list, numpy.ndarray]) – Number of neighbors sampled per hop.

  • neighbor_types (Union[list, numpy.ndarray]) – Neighbor type sampled per hop, type of each element in neighbor_types should be str.

  • strategy (SamplingStrategy, optional) –

    Sampling strategy. Default: SamplingStrategy.RANDOM. It can be any of [SamplingStrategy.RANDOM, SamplingStrategy.EDGE_WEIGHT].

    • SamplingStrategy.RANDOM, random sampling with replacement.

    • SamplingStrategy.EDGE_WEIGHT, sampling with edge weight as probability.

Returns

numpy.ndarray, array of neighbors.

Examples

>>> nodes = graph.get_all_nodes(node_type="0")
>>> neighbors = graph.get_sampled_neighbors(node_list=nodes, neighbor_nums=[2, 2],
...                                         neighbor_types=["0", "0"])
Raises
  • TypeError – If node_list is not list or ndarray.

  • TypeError – If neighbor_nums is not list or ndarray.

  • TypeError – If neighbor_types is not list or ndarray.
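
The tiled multi-hop layout described above, [input node, 1-hop result, 2-hop result …], can be sketched in pure Python. This is an illustration under stated assumptions, not MindSpore's implementation: it handles a single neighbor type, uses only random sampling with replacement, pads with -1 when a node has no neighbors, and raises ValueError for more than 6 hops. `sampled_neighbors` and its `adjacency` argument are hypothetical.

```python
import random

MAX_HOPS = 6  # documented hop limit


def sampled_neighbors(node_list, neighbor_nums, adjacency, default_node=-1, seed=0):
    """Hypothetical multi-hop sampler returning one tiled row per input node."""
    if len(neighbor_nums) > MAX_HOPS:
        raise ValueError(f"at most {MAX_HOPS} hops are allowed")
    rng = random.Random(seed)
    rows = []
    for node in node_list:
        row, frontier = [node], [node]
        for num in neighbor_nums:
            next_frontier = []
            for src in frontier:
                pool = adjacency.get(src, [])
                for _ in range(num):
                    # Sample with replacement; pad when there is nothing to sample.
                    next_frontier.append(rng.choice(pool) if pool else default_node)
            row.extend(next_frontier)
            frontier = next_frontier  # this hop feeds the next hop
        rows.append(row)
    return rows


adjacency = {0: [1], 1: [2], 2: [0, 3], 3: [1]}
rows = sampled_neighbors([0, 2], neighbor_nums=[2, 2], adjacency=adjacency)
print(len(rows[0]))  # 7 = 1 input node + 2 one-hop + 2 * 2 two-hop samples
```

With neighbor_nums=[n1, n2, …], each row in this sketch has 1 + n1 + n1·n2 + … entries, which is why the row width grows quickly with the number of hops.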

graph_info()

Get the meta information of the graph, including the number of nodes, the type of nodes, the feature information of nodes, the number of edges, the type of edges, and the feature information of edges.

Returns

dict, meta information of the graph. The keys are node_type, edge_type, node_num, edge_num, node_feature_type, edge_feature_type and graph_feature_type.
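
To show how such a dictionary relates to the constructor inputs, the sketch below assembles one from raw inputs. Only the key names come from this page. The layout of the values (lists of type names, per-type counts) is an assumption of this illustration, and `build_graph_info` is a hypothetical helper rather than what graph_info() runs internally.

```python
from collections import Counter


def build_graph_info(num_nodes, num_edges, node_type=None, edge_type=None,
                     node_feat=None, edge_feat=None, graph_feat=None):
    """Hypothetical summary using the documented keys; value layout is assumed."""
    # Nodes and edges default to type "0" when no types are given.
    node_type = node_type or ["0"] * num_nodes
    edge_type = edge_type or ["0"] * num_edges
    return {
        "node_type": sorted(set(node_type)),
        "edge_type": sorted(set(edge_type)),
        "node_num": dict(Counter(node_type)),
        "edge_num": dict(Counter(edge_type)),
        "node_feature_type": sorted(node_feat or {}),
        "edge_feature_type": sorted(edge_feat or {}),
        "graph_feature_type": sorted(graph_feat or {}),
    }


info = build_graph_info(3, 2, node_feat={"node_feature_1": [[0], [1], [2]]})
print(info["node_num"])  # {'0': 3}
```
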

random_walk(target_nodes, meta_path, step_home_param=1.0, step_away_param=1.0, default_node=-1)

Random walk in nodes.

Parameters
  • target_nodes (list[int]) – Start nodes of the random walk.

  • meta_path (list[int]) – Node type for each walk step.

  • step_home_param (float, optional) – Return hyperparameter of the node2vec algorithm. Default: 1.0.

  • step_away_param (float, optional) – In-out hyperparameter of the node2vec algorithm. Default: 1.0.

  • default_node (int, optional) – Node to use when no more neighbors are found. Default: -1, which means that no default node is given.

Returns

numpy.ndarray, array of nodes.

Examples

>>> nodes = graph.get_all_nodes(node_type="1")
>>> walks = graph.random_walk(target_nodes=nodes, meta_path=[2, 1, 2])
Raises
  • TypeError – If target_nodes is not list or ndarray.

  • TypeError – If meta_path is not list or ndarray.
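
The roles of step_home_param and step_away_param follow the node2vec transition rule: a step back to the previous node is weighted by 1/step_home_param, a step to a node that is also a neighbor of the previous node by 1, and any other step by 1/step_away_param. The sketch below illustrates that rule on an undirected adjacency dict. It is a simplified illustration, not MindSpore's implementation: it ignores meta_path node types, and `step_weights` and `node2vec_walk` are hypothetical helpers.

```python
import random


def step_weights(prev, candidates, adjacency, step_home_param=1.0, step_away_param=1.0):
    """Unnormalized node2vec weights for each candidate, given the previous node."""
    weights = []
    for nxt in candidates:
        if nxt == prev:                       # step back to the previous node
            weights.append(1.0 / step_home_param)
        elif nxt in adjacency.get(prev, ()):  # stay close to the previous node
            weights.append(1.0)
        else:                                 # move further away
            weights.append(1.0 / step_away_param)
    return weights


def node2vec_walk(start, num_steps, adjacency, step_home_param=1.0,
                  step_away_param=1.0, default_node=-1, seed=0):
    """Walk `num_steps` steps from `start`, padding with `default_node` at dead ends."""
    rng = random.Random(seed)
    walk, prev = [start], None
    for _ in range(num_steps):
        cur = walk[-1]
        candidates = adjacency.get(cur, [])
        if not candidates:
            walk.append(default_node)
            continue
        if prev is None:
            weights = [1.0] * len(candidates)  # first step is uniform
        else:
            weights = step_weights(prev, candidates, adjacency,
                                   step_home_param, step_away_param)
        prev = cur
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


adjacency = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
# Standing on node 1 after arriving from node 0:
print(step_weights(0, adjacency[1], adjacency, 2.0, 4.0))  # [0.5, 1.0, 0.25]
walk = node2vec_walk(0, num_steps=4, adjacency=adjacency,
                     step_home_param=2.0, step_away_param=4.0)
```

A large step_home_param makes immediate backtracking less likely, while a large step_away_param keeps the walk close to where it came from.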