mindinsight.debugger

MindSpore Debugger is a debugging tool for training in Graph Mode. It can be applied to visualize and analyze the intermediate computation results of the computational graph. In Graph Mode training, the computation results of intermediate nodes in the computational graph cannot be acquired conveniently, which makes it difficult for users to debug.

By applying MindSpore Debugger, users can:

  • Visualize the computational graph on the UI and analyze the output of the graph nodes.

  • Set watchpoints to monitor training exceptions (for example, tensor overflow) and trace error causes.

  • Visualize and analyze the change of parameters, such as weights.

  • Visualize the mapping relationship between nodes and code.

The Debugger API is a Python interface provided for the offline debugger. You need to save dump data before using it. For the method of saving dump data, refer to Using Dump in the Graph Mode.

class mindinsight.debugger.ConditionBase[source]

Base class for watch conditions.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Note

  • If multiple checking parameters are specified for one condition instance, a WatchpointHit is reported for the parameters that the tensor actually triggered.

Supported Platforms:

Ascend GPU

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import (TensorTooLargeCondition,
...                                   Watchpoint)
>>>
>>> def test_condition_base():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = my_run.select_tensors(query_string="Conv2D-op13")
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0, max_gt=0.0))
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(hit.get_hit_detail())
...     # the print result is as follows
...     # The setting for watchpoint is abs_mean_gt = 0.0, max_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996, max_gt = 0.449951171875.
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0, max_gt=1.0))
...     # check_watchpoints starts a new process, so it needs to be called through the main entry
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(hit.get_hit_detail())
...     # the print result is as follows
...     # The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996.
...
>>> if __name__ == "__main__":
...     test_condition_base()
...
property condition_id

Get the id of the watch condition.

Returns

int, the id of the watch condition.

property name

Get the name of the watch condition.

Returns

str, the name of the watch condition.

property param_dict

Get the parameters of the watch condition.

Returns

dict, the parameter dict of the watch condition.

class mindinsight.debugger.DebuggerTensor(node, slot, iteration)[source]

The tensor with specific rank, iteration and debugging info.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • node (Node) – The node that outputs this tensor.

  • slot (int) – The slot of the tensor on the node.

  • iteration (int) – The iteration of the tensor.

Note

  • Users should not instantiate this class manually.

  • The instances of this class are immutable.

  • A DebuggerTensor is always the output tensor of a node.

property iteration

Get iteration of the tensor.

Returns

int, the iteration of the tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].iteration)
0
property node

Get the node that outputs this tensor.

Returns

Node, the node that outputs this tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].node)
rank: 0
graph_name: kernel_graph_0
node_name: conv1.weight
property rank

The rank is the logical id of the device on which the tensor is generated.

Returns

int, the rank for this tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].rank)
0
property slot

The output of the node may have several tensors. The slot refers to the index of the tensor among the node's outputs.

Returns

int, the slot of the tensor on the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].slot)
0
value()[source]

Get the value of the tensor.

Returns

Union[numpy.array, None], the value of the tensor. The value could be None if no data file is found for the relevant iteration.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>>
>>> def test_debugger_tensor():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = list(my_run.select_tensors("conv"))
...     # tensors[0].value() may start a new process
...     value = tensors[0].value()
...     return value
...
>>> if __name__ == "__main__":
...     test_debugger_tensor()
...
class mindinsight.debugger.DumpAnalyzer(dump_dir, mem_limit=None)[source]

Analyzer to inspect the dump data.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • dump_dir (str) – The path of the dump folder.

  • mem_limit (int, optional) – The memory limit for checking watchpoints, in MB. Optional values: from 2048 MB to 2147483647 MB. None means no limit is set, only limited by computer memory. Default: None.

Supported Platforms:

Ascend GPU

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
check_watchpoints(watchpoints, error_on_no_value=False)[source]

Check the given watchpoints in a batch.

Note

1. For speed, all watchpoints for the iteration should be given at the same time to avoid reading tensors len(watchpoints) times.

2. The check_watchpoints function starts a new process when it is called, so it needs to be called within if __name__ == '__main__' .

Parameters
  • watchpoints (Iterable[Watchpoint]) – The list of watchpoints.

  • error_on_no_value (bool, optional) – Whether to report an error code in the watchpoint hit when the specified tensor has no value stored in dump_dir. Default: False.

Returns

Iterable[WatchpointHit], the watchpoint hit list, sorted by tensor drop time.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import (TensorTooLargeCondition,
...                                    Watchpoint)
>>>
>>> def test_watchpoints():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = my_run.select_tensors(
...                                         query_string="Conv2D-op13",
...                                         use_regex=True,
...                                         iterations=[0],
...                                         ranks=[0],
...                                         slots=[0]
...                                         )
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0))
...     # check_watchpoints starts a new process, so it needs to be called through the main entry
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(str(hit))
...     # the print result is as follows
...     # Watchpoint TensorTooLarge triggered on tensor:
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv2-Conv2d/Conv2D-op13
...     # slot: 0
...     # iteration: 0
...     # Threshold: {'abs_mean_gt': 0.0}
...     # Hit detail: the setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996.
...
>>> if __name__ == "__main__":
...     test_watchpoints()
...
export_graphs(output_dir=None)[source]

Export the computational graph(s) in xlsx file(s) to the output_dir .

The file(s) will contain the stack info of graph nodes.

Parameters

output_dir (str, optional) – Output directory to save the file. None means to use the current working directory. Default: None.

Returns

str, the path of the generated file.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> res = my_run.export_graphs()
get_input_nodes(node)[source]

Get the input nodes of the given node.

Parameters

node (Node) – The node of which input nodes will be returned.

Returns

Iterable[Node], the input nodes of the specified node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node_list = list(my_run.select_nodes(query_string="Conv2D-op13"))
>>> input_nodes = my_run.get_input_nodes(node_list[0])
get_iterations(ranks=None)[source]

Get available iterations which have data dumped in this run.

Parameters

ranks (Union[int, list[int], None], optional) – The rank(s) to select. Get available iterations which are under the specified ranks. The rank is the logical id of a device used in distributed training, numbered from 0. For example, on an 8-card machine where only cards 4-7 are used for a training job, cards 4-7 correspond to ranks 0-3 respectively. If None, return iterations of all ranks. Default: None.

Returns

Iterable[int], available iterations which have dumped data, sorted in increasing order.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> iterations = my_run.get_iterations()
>>> print(list(iterations))
[0]
get_output_nodes(node)[source]

Get the nodes that use the output tensors of the given node.

Parameters

node (Node) – The specified node.

Returns

Iterable[Node], output nodes of the specified node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node_list = list(my_run.select_nodes(query_string="Conv2D-op13"))
>>> out_nodes = my_run.get_output_nodes(node_list[0])
get_ranks()[source]

Get the available ranks in this run.

Returns

Iterable[int], the list of rank id in current dump directory.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> ranks = my_run.get_ranks()
>>> print(list(ranks))
[0]
list_affected_nodes(tensor)[source]

List the nodes that use given tensor as input.

Affected nodes are defined as the nodes that use the given tensor as input. If a node is affected by the given tensor, the node’s output value is likely to change when the given tensor changes.

Parameters

tensor (DebuggerTensor) – The tensor of which affected nodes will be returned.

Returns

Iterable[Node], the affected nodes of the specified tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensor_list = list(my_run.select_tensors(query_string="Conv2D-op13"))
>>> affected_nodes = my_run.list_affected_nodes(tensor_list[0])
select_nodes(query_string, use_regex=False, select_by='node_name', ranks=None, case_sensitive=True)[source]

Select nodes.

Select the matched nodes in the computational graph according to the query_string. The nodes can be matched by “node_name” or “code_stack”, see the parameters for detail.

Parameters
  • query_string (str) – Query string. For a node to be selected, the match target field must contain or match the query string.

  • use_regex (bool, optional) – Indicates whether query is a regex. Default: False.

  • select_by (str, optional) – The field to search when selecting nodes. Available values are "node_name", "code_stack". "node_name" means to search the name of the nodes in the graph. "code_stack" means the stack info of the node. Default: "node_name".

  • ranks (Union[int, list[int], None], optional) – The rank(s) to select. None means all ranks will be considered. The selected nodes must exist on the specified ranks. Default: None.

  • case_sensitive (bool, optional) – Whether case-sensitive when selecting tensors. Default: True.

Returns

Iterable[Node], the matched nodes.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> nodes = my_run.select_nodes("Conv2D-op13")
select_tensor_statistics(iterations=None, ranks=None)[source]

Select tensor statistics.

Select the matched tensor statistics in the directory according to the specified filter condition, see the parameters for detail.

Parameters
  • iterations (Union[int, list[int], None], optional) – The iteration(s) to select. None means all dumped iterations will be selected. Default: None.

  • ranks (Union[int, list[int], None], optional) – The rank(s) to select. None means all ranks will be selected. Default: None.

Returns

Dict[TensorStatistic], the matched TensorStatistics. The format is as below.

{
"rank_id":
    {
    "iteration_id":
        {
        "tensor_name":
            [TensorStatistic],
        ...
        }
    }
...
}
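The nested structure above can be traversed with plain Python. The sketch below uses hypothetical stand-in data (a namedtuple with max_value/min_value fields; the fields of the real TensorStatistic may differ) to show one way to walk the rank → iteration → tensor-name nesting:

```python
from collections import namedtuple

# Stand-in for the real TensorStatistic objects (hypothetical fields).
Stat = namedtuple("Stat", ["max_value", "min_value"])

# Hypothetical result mirroring the rank -> iteration -> tensor-name nesting.
statistics = {
    0: {                                   # rank_id
        0: {                               # iteration_id
            "Conv2D-op13": [Stat(max_value=0.45, min_value=-0.3)],
        },
    },
}

def find_large_tensors(stats, threshold):
    """Collect (rank, iteration, tensor_name) for tensors whose max exceeds threshold."""
    hits = []
    for rank_id, iterations in stats.items():
        for iteration_id, tensors in iterations.items():
            for tensor_name, stat_list in tensors.items():
                for stat in stat_list:
                    if stat.max_value > threshold:
                        hits.append((rank_id, iteration_id, tensor_name))
    return hits

print(find_large_tensors(statistics, 0.4))  # [(0, 0, 'Conv2D-op13')]
```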

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> statistics = my_run.select_tensor_statistics(ranks=[0])
select_tensors(query_string, use_regex=False, select_by='node_name', iterations=None, ranks=None, slots=None, case_sensitive=True)[source]

Select tensors.

Select the matched tensors in the directory according to the specified filter condition, see the parameters for detail.

Parameters
  • query_string (str) – Query string. For a tensor to be selected, the match target field must contain or match the query string.

  • use_regex (bool, optional) – Indicates whether query is a regex. Default: False.

  • select_by (str, optional) – The field to search when selecting tensors. Available values are "node_name", "code_stack". "node_name" means to search the node name of the tensors in the graph. "code_stack" means the stack info of the node that outputs this tensor. Default: "node_name".

  • iterations (Union[int, list[int], None], optional) – The iteration(s) to select. None means all dumped iterations will be selected. Default: None.

  • ranks (Union[int, list[int], None], optional) – The rank(s) to select. None means all ranks will be selected. Default: None.

  • slots (list[int], optional) – The slot of the selected tensor. None means all slots will be selected. Default: None.

  • case_sensitive (bool, optional) – Whether case-sensitive when selecting tensors. Default: True.

Returns

Iterable[DebuggerTensor], the matched tensors.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = my_run.select_tensors("Conv2D-op13")
summary_statistics(statistics, overflow_value=65500, out_path='./')[source]

Summarize the statistics across the different ranks and iterations.

Parameters
  • statistics (Dict[TensorStatistic]) – The given TensorStatistic. They can be the return value of compute_statistic or select_tensor_statistics.

  • overflow_value (int, optional) – The given overflow threshold. Default: 65500.

  • out_path (str, optional) – The given output directory to save the statistics. Default: "./".

class mindinsight.debugger.Node(node_feature)[source]

Node in the computational graph.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

node_feature (namedtuple) –

The node feature, including the following information:

  • name (str): The node name.

  • rank (int): The rank id.

  • stack (iterable[dict]): The stack information. The format of each item is like:

    {
        'file_path': str,
        'line_no': int,
        'code_line': str
    }
    
  • graph_name (str): The graph name.

  • root_graph_id (int): The root graph id.

Note

  • Users should not instantiate this class manually.

  • The instances of this class is immutable.

get_input_tensors(iterations=None, slots=None)[source]

Get the input tensors of the node.

Parameters
  • iterations (Iterable[int], optional) – The iterations to which the returned tensor should belong. None means all available iterations will be considered. Default: None.

  • slots (Iterable[int], optional) – The slots in which the returned tensors should be. None means all available slots will be considered. Default: None.

Returns

Iterable[DebuggerTensor], the input tensors of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> input_tensors = node.get_input_tensors(iterations=[0], slots=[0])
get_output_tensors(iterations=None, slots=None)[source]

Get the output tensors of this node.

Parameters
  • iterations (Iterable[int], optional) – The iterations to which the returned tensor should belong. None means all available iterations will be considered. Default: None.

  • slots (Iterable[int], optional) – The slots in which the returned tensors should be. None means all available slots will be considered. Default: None.

Returns

Iterable[DebuggerTensor], the output tensors of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> output_tensors = node.get_output_tensors(iterations=[0], slots=[0])
property graph_name: str

Get graph name of current node.

Returns

str, the graph name.

property name

Get the full name of this node.

Returns

str, the full name of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("conv"))[0]
>>> print(node.name)
conv1.weight
property rank: int

Get rank info.

Returns

int, the rank id to which the node belongs.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("conv"))[0]
>>> print(node.rank)
0
property root_graph_id: int

Get the root graph id to which the dumped tensor of the current node belongs.

Returns

int, the root graph id.

property stack

Get stack info of the node.

Returns

iterable[dict], each item format as follows,

{
    'file_path': str,
    'line_no': int,
    'code_line': str
}

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> # print(node.stack)
>>> # the print result is as follows
>>> # [{'file_path': '/path', 'line_no': 266, 'code_line': 'output = self.conv2d(x, self.weight)',
>>> # 'has_substack': False},
>>> # {'file_path': '/path', 'line_no': 55, 'code_line': 'x = self.conv2(x)', 'has_substack': False}]
class mindinsight.debugger.OperatorOverflowCondition[source]

Operator overflow watch condition.

Operator overflow watchpoint checks whether overflow occurs during operator computation. Only the Ascend AI processor is supported.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Examples

>>> from mindinsight.debugger import OperatorOverflowCondition
>>> my_condition = OperatorOverflowCondition()
>>> print(my_condition.name)
OperatorOverflow
property param_dict

Get the parameters of the watch condition.

Returns

dict, the parameter dict of the watch condition.

property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorAllZeroCondition(zero_percentage_ge)[source]

Watch condition for checking whether the tensor value is all zero.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

zero_percentage_ge (float) – The threshold for the percentage of zero values in the tensor. The checking condition is satisfied when the percentage of zero values is greater than or equal to this threshold.

Examples

>>> from mindinsight.debugger import TensorAllZeroCondition
>>> my_condition = TensorAllZeroCondition(zero_percentage_ge=0.0)
>>> print(my_condition.name)
TensorAllZero
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorChangeAboveThresholdCondition(abs_mean_update_ratio_gt, epsilon=1e-09)[source]

Watch condition for tensor changing above threshold.

When the tensor change satisfies the equation \(\frac {abs\_mean(current\_tensor - previous\_tensor)} {abs\_mean(previous\_tensor)} + epsilon > abs\_mean\_update\_ratio\_gt\) , the watchpoint will be hit.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • abs_mean_update_ratio_gt (float) – The threshold value for the mean update ratio. If the mean update ratio is greater than this value, the watchpoint will be triggered.

  • epsilon (float, optional) – Epsilon value. Default: 1e-9.
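The check above can be illustrated in plain Python. This is a minimal sketch of the formula only, not the library's implementation; abs_mean and the list-based tensors are stand-ins, and the sketch assumes the previous tensor is not all zero:

```python
def abs_mean(values):
    """Mean of absolute values."""
    return sum(abs(v) for v in values) / len(values)

def change_above_threshold(previous, current, abs_mean_update_ratio_gt, epsilon=1e-9):
    """True when abs_mean(current - previous) / abs_mean(previous) + epsilon
    exceeds abs_mean_update_ratio_gt."""
    diff = [c - p for c, p in zip(current, previous)]
    ratio = abs_mean(diff) / abs_mean(previous)
    return ratio + epsilon > abs_mean_update_ratio_gt

previous = [1.0, 2.0, 3.0]
current = [1.5, 2.5, 3.5]   # every element moved by 0.5; ratio = 0.5 / 2.0 = 0.25
print(change_above_threshold(previous, current, abs_mean_update_ratio_gt=0.1))  # True
```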

Examples

>>> from mindinsight.debugger import TensorChangeAboveThresholdCondition
>>> my_condition = TensorChangeAboveThresholdCondition(abs_mean_update_ratio_gt=0.0)
>>> print(my_condition.name)
TensorChangeAboveThreshold
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorChangeBelowThresholdCondition(abs_mean_update_ratio_lt, epsilon=1e-09)[source]

Watch condition for tensor changing below threshold.

When the tensor change satisfies the equation \(\frac {abs\_mean(current\_tensor - previous\_tensor)} {abs\_mean(previous\_tensor)} + epsilon < abs\_mean\_update\_ratio\_lt\) , the watchpoint will be hit.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • abs_mean_update_ratio_lt (float) – The threshold value for the mean update ratio. If the mean update ratio is less than this value, the watchpoint will be triggered.

  • epsilon (float, optional) – Epsilon value. Default: 1e-9.

Examples

>>> from mindinsight.debugger import TensorChangeBelowThresholdCondition
>>> my_condition = TensorChangeBelowThresholdCondition(abs_mean_update_ratio_lt=2.0)
>>> print(my_condition.name)
TensorChangeBelowThreshold
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorOverflowCondition[source]

Watch condition for tensor overflow.

Tensor overflow watchpoint checks for inf and nan tensors.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Examples

>>> from mindinsight.debugger import TensorOverflowCondition
>>> my_condition = TensorOverflowCondition()
>>> print(my_condition.name)
TensorOverflow
property param_dict

Get the parameters of the watch condition.

Returns

dict, the parameter dict of the watch condition.

property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorRangeCondition(range_start_inclusive=None, range_end_inclusive=None, range_percentage_lt=None, range_percentage_gt=None, max_min_lt=None, max_min_gt=None)[source]

Watch condition for tensor value range.

Set a threshold to check the tensor value range. There are four options: range_percentage_lt , range_percentage_gt , max_min_lt and max_min_gt . At least one of the four options should be specified. If the threshold is set to one of the first two options, both range_start_inclusive and range_end_inclusive must be set. If multiple checking parameters are specified, a WatchpointHit is reported for the parameters that the tensor actually triggered.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • range_start_inclusive (float, optional) – The start of the specified range. Default: None.

  • range_end_inclusive (float, optional) – The end of the specified range. Default: None.

  • range_percentage_lt (float, optional) – The threshold for the percentage of the tensor in the range [range_start_inclusive, range_end_inclusive] . The checking condition will be satisfied when the percentage of the tensor in the specified range is less than this value. Default: None.

  • range_percentage_gt (float, optional) – The threshold for the percentage of the tensor in the range [range_start_inclusive, range_end_inclusive] . The checking condition will be satisfied when the percentage of the tensor in the specified range is greater than this value. Default: None.

  • max_min_lt (float, optional) – Lower threshold for the difference between the maximum and minimum values of a tensor. Default: None.

  • max_min_gt (float, optional) – Upper threshold for the difference between the maximum and minimum values of a tensor. Default: None.
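As an illustration of the two kinds of thresholds, the following pure-Python sketch computes the percentage of elements falling in [range_start_inclusive, range_end_inclusive] (assumed here to be on a 0-100 scale) and the max-min spread. It is not the library's implementation:

```python
def percentage_in_range(values, range_start_inclusive, range_end_inclusive):
    """Percentage (0-100) of elements inside the inclusive range."""
    in_range = sum(1 for v in values
                   if range_start_inclusive <= v <= range_end_inclusive)
    return 100.0 * in_range / len(values)

def max_min_spread(values):
    """Difference between the maximum and minimum values."""
    return max(values) - min(values)

values = [0.1, 0.2, 0.8, 1.5]
print(percentage_in_range(values, 0.0, 1.0))  # 75.0
print(max_min_spread(values))                 # 1.4
# range_percentage_gt=50 would be satisfied here (75.0 > 50),
# and so would max_min_gt=0.05 (1.4 > 0.05).
```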

Examples

>>> from mindinsight.debugger import TensorRangeCondition
>>> my_condition = TensorRangeCondition(max_min_gt=0.05)
>>> print(my_condition.name)
TensorRange
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorTooLargeCondition(abs_mean_gt=None, max_gt=None, min_gt=None, mean_gt=None)[source]

Watch condition for tensor value too large. At least one parameter should be specified.

If multiple checking parameters are specified, a WatchpointHit is reported for the parameters that the tensor actually triggered.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • abs_mean_gt (float, optional) – The threshold for the mean of the absolute value of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied. Default: None.

  • max_gt (float, optional) – The threshold for the maximum of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied. Default: None.

  • min_gt (float, optional) – The threshold for the minimum of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied. Default: None.

  • mean_gt (float, optional) – The threshold for the mean of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied. Default: None.
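The four thresholds can be illustrated with a small pure-Python sketch (assumed semantics, not the library's implementation): each specified threshold is compared with the corresponding statistic of the tensor, and the parameters that triggered are reported:

```python
def too_large_hits(values, abs_mean_gt=None, max_gt=None, min_gt=None, mean_gt=None):
    """Return the names of the checking parameters that the tensor triggered."""
    stats = {
        "abs_mean_gt": sum(abs(v) for v in values) / len(values),
        "max_gt": max(values),
        "min_gt": min(values),
        "mean_gt": sum(values) / len(values),
    }
    thresholds = {"abs_mean_gt": abs_mean_gt, "max_gt": max_gt,
                  "min_gt": min_gt, "mean_gt": mean_gt}
    return [name for name, limit in thresholds.items()
            if limit is not None and stats[name] > limit]

values = [-0.2, 0.1, 0.45]  # abs mean = 0.25, max = 0.45
print(too_large_hits(values, abs_mean_gt=0.0, max_gt=1.0))  # ['abs_mean_gt']
```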

Examples

>>> from mindinsight.debugger import TensorTooLargeCondition
>>> my_condition = TensorTooLargeCondition(abs_mean_gt=0.0)
>>> print(my_condition.name)
TensorTooLarge
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorTooSmallCondition(abs_mean_lt=None, max_lt=None, min_lt=None, mean_lt=None)[source]

Watch condition for tensor value too small. At least one parameter should be specified.

If multiple checking parameters are specified, a WatchpointHit is reported for the parameters that the tensor actually triggered.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • abs_mean_lt (float, optional) – The threshold for the mean of the absolute value of the tensor. When the actual value is less than this threshold, this checking condition is satisfied. Default: None.

  • max_lt (float, optional) – The threshold for the maximum of the tensor. When the actual value is less than this threshold, this checking condition is satisfied. Default: None.

  • min_lt (float, optional) – The threshold for the minimum of the tensor. When the actual value is less than this threshold, this checking condition is satisfied. Default: None.

  • mean_lt (float, optional) – The threshold for the mean of the tensor. When the actual value is less than this threshold, this checking condition is satisfied. Default: None.

Examples

>>> from mindinsight.debugger import TensorTooSmallCondition
>>> my_condition = TensorTooSmallCondition(abs_mean_lt=0.2)
>>> print(my_condition.name)
TensorTooSmall
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.TensorUnchangedCondition(rtol=1e-05, atol=1e-08)[source]

Watch condition for tensor value unchanged.

Checks the allclose function on the previous and current tensors. The watchpoint is hit only when every element in the tensor satisfies the equation \(|element\_in\_current\_tensor - element\_in\_previous\_tensor| \leq atol + rtol\times |previous\_tensor|\) .
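The element-wise check can be sketched in plain Python (a simplified stand-in for the library's allclose-style comparison, using lists in place of tensors):

```python
def tensor_unchanged(previous, current, rtol=1e-05, atol=1e-08):
    """True when every element pair satisfies |c - p| <= atol + rtol * |p|."""
    return all(abs(c - p) <= atol + rtol * abs(p)
               for p, c in zip(previous, current))

print(tensor_unchanged([1.0, 2.0], [1.0, 2.0]))               # True
print(tensor_unchanged([1.0, 2.0], [1.0, 2.1]))               # False
print(tensor_unchanged([1.0, 2.0], [1.5, 2.5], rtol=1000.0))  # True
```

A very large rtol, as in the last call (mirroring the TensorUnchangedCondition(rtol=1000.0) example below), makes the tolerance band wide enough that even visibly different tensors count as unchanged.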

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • rtol (float, optional) – The relative tolerance parameter. Default: 1e-5.

  • atol (float, optional) – The absolute tolerance parameter. Default: 1e-8.

Examples

>>> from mindinsight.debugger import TensorUnchangedCondition
>>> my_condition = TensorUnchangedCondition(rtol=1000.0)
>>> print(my_condition.name)
TensorUnchanged
property param_names

Return the list of parameter names.

Returns

list[str], the parameter names.

class mindinsight.debugger.Watchpoint(tensors, condition)[source]

Watchpoint applies condition to specified tensors.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters
  • tensors (Iterable[DebuggerTensor]) – The tensors to check.

  • condition (ConditionBase) – The watch condition to apply to the tensors.

Supported Platforms:

Ascend GPU

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import TensorTooLargeCondition, Watchpoint
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensor_list = my_run.select_tensors(
...                                     query_string="Conv",
...                                     use_regex=True,
...                                     iterations=[0],
...                                     ranks=[0],
...                                     slots=[0]
...                                     )
>>> watchpoint = Watchpoint(tensors=tensor_list,
...                         condition=TensorTooLargeCondition(abs_mean_gt=0.0))
>>> tensor = list(watchpoint.tensors)[0]
>>> print(tensor.node.name)
Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
>>> print(watchpoint.condition.name)
TensorTooLarge
property condition

Get the watch condition to apply to tensors.

Returns

ConditionBase, the watch condition to apply to tensors.

property tensors

Get tensors to check.

Returns

Iterable[DebuggerTensor], the tensors to check.

class mindinsight.debugger.WatchpointHit[source]

Watchpoint hit.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Note

  • This class is not meant to be instantiated by the user.

  • The instances of this class are immutable.

Supported Platforms:

Ascend GPU

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import TensorTooLargeCondition, Watchpoint
>>>
>>> def test_watch_point_hit():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensor_list = my_run.select_tensors(
...                                         query_string="Conv",
...                                         use_regex=True,
...                                         iterations=[0],
...                                         ranks=[0],
...                                         slots=[0]
...                                         )
...     watchpoint = Watchpoint(tensors=tensor_list,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0))
...     # check_watchpoints starts a new process, so it needs to be called through the main entry
...     hits = my_run.check_watchpoints(watchpoints=[watchpoint])
...     hit = list(hits)[0]
...     # print(str(hit))
...     # the print result is as follows
...     # Watchpoint TensorTooLarge triggered on tensor:
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
...     # slot: 0
...     # iteration: 0
...     # Threshold: {'abs_mean_gt': 0.0}
...     # Hit detail: The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.007956420533235841.
...     # print(hit.error_code)
...     # the print result is as follows
...     # 0
...     # print(hit.tensor)
...     # the print result is as follows
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
...     # slot: 0
...     # iteration: 0
...     # print(hit.get_hit_detail())
...     # the print result is as follows
...     # The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.007956420533235841.
...
>>> if __name__ == "__main__":
...     test_watch_point_hit()
...
property error_code

Get the error code when checking the watchpoint if there is an error.

Returns

int, the error number.

property error_msg

Get the error message when checking the watchpoint if there is an error.

Returns

list[str], the error message list.

get_hit_detail()[source]

Get the corresponding watch condition, including the actual values. For example, if the corresponding watch condition is TensorTooLargeCondition(max_gt=None) , watching whether the max value of the tensor is greater than 0, get_hit_detail returns a TensorTooLargeCondition object including the max value of the tensor. If error_code is not zero, None will be returned.

Returns

Union[ConditionBase, None], the condition with hit detail. If error_code is not zero, None will be returned.

get_threshold()[source]

Get the condition set by user.

Returns

ConditionBase, the condition with user threshold.

property tensor: mindinsight.debugger.api.debugger_tensor.DebuggerTensor

Get the tensor for this watchpoint hit.

Returns

DebuggerTensor, the triggered tensor.