mindinsight.debugger

Debugger Introduction.

This module provides Python APIs for retrieving debugger information. The APIs help users understand the training process and find bugs in the training script.

class mindinsight.debugger.ConditionBase[source]

Base class for conditions.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Note

If multiple checking parameters are specified for one condition instance, the WatchpointHit reports only the parameters that the tensor actually triggered.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import (TensorTooLargeCondition,
...                                   Watchpoint)
>>>
>>> def test_condition_base():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = my_run.select_tensors(query_string="Conv2D-op13")
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0, max_gt=0.0))
...     # check_watchpoints starts a new process, so it must be called from the main entry point
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(hit.get_hit_detail())
...     # the printed result is as follows:
...     # The setting for watchpoint is abs_mean_gt = 0.0, max_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996, max_gt = 0.449951171875.
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0, max_gt=1.0))
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(hit.get_hit_detail())
...     # the printed result is as follows; max_gt is not reported because the
...     # maximum, 0.449951171875, does not exceed 1.0:
...     # The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996.
...
>>> if __name__ == "__main__":
...     test_condition_base()
...

property condition_id

Get the ID of the condition.

Returns: int, the id of the condition.

property name

Get the name for the condition.

Returns: str, the name of the condition.

property param_dict

Get the parameters of the condition.

Returns: dict, the parameter dict of the condition.

class mindinsight.debugger.DebuggerTensor(node, slot, iteration)[source]

A tensor with a specific rank, iteration, and debugging information.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

node (Node) – The node that outputs this tensor.
slot (int) – The slot of the tensor on the node.
iteration (int) – The iteration of the tensor.

Note

Users should not instantiate this class manually.
The instances of this class are immutable.
A DebuggerTensor is always the output tensor of a node.

property iteration

Get the iteration of the tensor.

Returns: int, the iteration of the tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].iteration)
0

property node

Get the node that outputs this tensor.

Returns: Node, the node that outputs this tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].node)
rank: 0
graph_name: kernel_graph_0
node_name: conv1.weight

property rank

Get the rank of the tensor.

Returns: int, the rank for this tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].rank)
0

property slot

Get the slot of the tensor.

Returns: int, the slot of the tensor on the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = list(my_run.select_tensors("conv"))
>>> print(tensors[0].slot)
0

value()[source]

Get the value of the tensor.

Returns: Union[numpy.ndarray, None], the value of the tensor, or None if the data file for the corresponding iteration cannot be found.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>>
>>> def test_debugger_tensor():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = list(my_run.select_tensors("conv"))
...     # tensors[0].value() may start a new process, so call it from the main entry point
...     value = tensors[0].value()
...     return value
...
>>> if __name__ == "__main__":
...     test_debugger_tensor()
...

class mindinsight.debugger.DumpAnalyzer(dump_dir, mem_limit=None)[source]

Analyzer to inspect the dump data.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

dump_dir (str) – The path of the dump folder.
mem_limit (int, optional) – The memory limit for checking watchpoints, in MB. Default: None, which means no limit. Valid values range from 2048 MB to 2147483647 MB.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")

check_watchpoints(watchpoints, error_on_no_value=False)[source]

Check the given watchpoints in a batch, on the specified nodes and iterations (if available).

Note

1. For speed, all watchpoints for the iteration should be given at the same time, so that the tensors are read once rather than len(watchpoints) times.

2. check_watchpoints starts a new process, so it must be called from the main entry point (for example, inside an if __name__ == "__main__": block).

Parameters

watchpoints (Iterable[Watchpoint]) – The list of watchpoints.
error_on_no_value (bool) – Whether to report an error code in the watchpoint hit when the specified tensor has no value stored in the dump directory. Default: False.

Returns

Iterable[WatchpointHit], the list of watchpoint hits, sorted so that the most important hits appear at the top of the list; this keeps long lists of hits easy to review.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import (TensorTooLargeCondition,
...                                    Watchpoint)
>>>
>>> def test_watchpoints():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensors = my_run.select_tensors(query_string="Conv2D-op13",
...                                     use_regex=True,
...                                     iterations=[0],
...                                     ranks=[0],
...                                     slots=[0])
...     watchpoint = Watchpoint(tensors=tensors,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0))
...     # check_watchpoints starts a new process, so it must be called from the main entry point
...     hit = list(my_run.check_watchpoints(watchpoints=[watchpoint]))[0]
...     # print(str(hit))
...     # the printed result is as follows:
...     # Watchpoint TensorTooLarge triggered on tensor:
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv2-Conv2d/Conv2D-op13
...     # slot: 0
...     # iteration: 0
...     # Threshold: {'abs_mean_gt': 0.0}
...     # Hit detail: the setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.06592023578438996.
...
>>> if __name__ == "__main__":
...     test_watchpoints()
...

export_graphs(output_dir=None)[source]

Export the computational graph(s) in xlsx file(s) to the output_dir.

The file(s) will contain the stack info of graph nodes.

Parameters: output_dir (str, optional) – Output directory to save the file. Default: None, which means to use the current working directory.
Returns: str, The path of the generated file.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> res = my_run.export_graphs()

get_input_nodes(node)[source]

Get the input nodes of the given node.

Parameters: node (Node) – The node of which input nodes will be returned.
Returns: Iterable[Node], the input nodes of the given node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node_list = list(my_run.select_nodes(query_string="Conv2D-op13"))
>>> input_nodes = my_run.get_input_nodes(node_list[0])

get_iterations(ranks=None)[source]

Get available iterations which have data dumped in this run.

Parameters: ranks (Union[int, list[int], None], optional) – The ranks to select. Only iterations dumped under the specified ranks are returned. Default: None, which means iterations of all ranks are returned.
Returns: Iterable[int], the sorted list of dumped iterations.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> iterations = my_run.get_iterations()
>>> print(list(iterations))
[0]

get_output_nodes(node)[source]

Get the nodes that use the output tensors of the given node.

Parameters: node (Node) – The node of which output nodes will be returned.
Returns: Iterable[Node], the output nodes of this node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node_list = list(my_run.select_nodes(query_string="Conv2D-op13"))
>>> out_nodes = my_run.get_output_nodes(node_list[0])

get_ranks()[source]

Get the available ranks in this run.

Returns: Iterable[int], the list of rank IDs in the current dump directory.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> ranks = my_run.get_ranks()
>>> print(list(ranks))
[0]

list_affected_nodes(tensor)[source]

List the nodes that use given tensor as input.

Affected nodes are the nodes that use the given tensor as input. If a node is affected by the given tensor, the node’s output value is likely to change when the given tensor changes.

Parameters: tensor (DebuggerTensor) – The tensor of which affected nodes will be returned.
Returns: Iterable[Node], the affected nodes of the given tensor.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensor_list = list(my_run.select_tensors(query_string="Conv2D-op13"))
>>> affected_nodes = my_run.list_affected_nodes(tensor_list[0])
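The definition of affected nodes (direct consumers of a tensor) can be illustrated with a plain-Python sketch over a toy graph. The graph representation (a dict mapping each node name to the (producer, slot) pairs it reads), the helper affected_nodes, and the node names are all hypothetical; the real method works on Node and DebuggerTensor objects loaded from the dump.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph: each node maps to the (producer_node, slot) tensors it reads.
Graph = Dict[str, List[Tuple[str, int]]]


def affected_nodes(graph: Graph, producer: str, slot: int) -> List[str]:
    """Return the nodes that take output `slot` of `producer` as a direct input."""
    return sorted(node for node, inputs in graph.items() if (producer, slot) in inputs)


toy_graph: Graph = {
    "Conv2D-op13": [("Cast-op7", 0), ("conv2.weight", 0)],
    "ReLU-op14": [("Conv2D-op13", 0)],
    "MaxPool-op15": [("ReLU-op14", 0)],
    "BiasAdd-op16": [("Conv2D-op13", 0)],
}

print(affected_nodes(toy_graph, "Conv2D-op13", 0))  # ['BiasAdd-op16', 'ReLU-op14']
```

MaxPool-op15 is downstream of Conv2D-op13 but is not listed, because it does not read the tensor directly.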

select_nodes(query_string, use_regex=False, select_by='node_name', ranks=None, case_sensitive=True)[source]

Select nodes.

Select the matched nodes in the computational graph according to query_string. Nodes can be matched by “node_name” or “code_stack”; see the parameter descriptions for details.

Parameters

query_string (str) – Query string. For a node to be selected, the match target field must contain or match the query string.
use_regex (bool) – Whether query_string is a regular expression. Default: False.
select_by (str, optional) – The field to search when selecting nodes. Available values are “node_name”, “code_stack”. “node_name” means to search the name of the nodes in the graph. “code_stack” means the stack info of the node. Default: “node_name”.
ranks (Union[int, list[int], None], optional) – The ranks to select. The selected nodes must exist on the specified ranks. Default: None, which means all ranks will be considered.
case_sensitive (bool, optional) – Whether matching is case-sensitive when selecting nodes. Default: True.

Returns

Iterable[Node], the matched nodes.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> nodes = my_run.select_nodes("Conv2D-op13")
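One plausible reading of how query_string, use_regex and case_sensitive combine when selecting by node name is sketched below. It assumes plain substring containment when use_regex is False and re.search when it is True; the helper name_matches and the sample names are hypothetical, and MindInsight's actual matching rules may differ in detail.

```python
import re


def name_matches(name: str, query_string: str, use_regex: bool = False,
                 case_sensitive: bool = True) -> bool:
    """Hypothetical matcher: substring containment, or re.search when use_regex is True."""
    if use_regex:
        flags = 0 if case_sensitive else re.IGNORECASE
        return re.search(query_string, name, flags) is not None
    if case_sensitive:
        return query_string in name
    return query_string.lower() in name.lower()


# Made-up node names in the style of the examples above.
names = [
    "Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7",
    "Default/network-WithLossCell/_backbone-AlexNet/conv2-Conv2d/Conv2D-op13",
    "Default/network-WithLossCell/_backbone-AlexNet/fc3-Dense/MatMul-op30",
]

# Case-insensitive substring match: only the Conv2D-op13 node.
print([n for n in names if name_matches(n, "conv2d-op13", case_sensitive=False)])
# Case-sensitive regex match: the conv1 and conv2 nodes.
print([n for n in names if name_matches(n, r"conv\d", use_regex=True)])
```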

select_tensors(query_string, use_regex=False, select_by='node_name', iterations=None, ranks=None, slots=None, case_sensitive=True)[source]

Select tensors.

Select the matched tensors in the dump directory according to query_string. Tensors can be matched by “node_name” or “code_stack”; see the parameter descriptions for details.

Parameters

query_string (str) – Query string. For a tensor to be selected, the match target field must contain or match the query string.
use_regex (bool) – Whether query_string is a regular expression. Default: False.
select_by (str, optional) – The field to search when selecting tensors. Available values are “node_name”, “code_stack”. “node_name” means to search the node name of the tensors in the graph. “code_stack” means the stack info of the node that outputs this tensor. Default: “node_name”.
iterations (Union[int, list[int], None], optional) – The iterations to select. Default: None, which means all dumped iterations will be selected.
ranks (Union[int, list[int], None], optional) – The ranks to select. Default: None, which means all ranks will be selected.
slots (list[int], optional) – The slots of the tensors to select. Default: None, which means all slots will be selected.
case_sensitive (bool, optional) – Whether matching is case-sensitive when selecting tensors. Default: True.

Returns

Iterable[DebuggerTensor], the matched tensors.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensors = my_run.select_tensors("Conv2D-op13")
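The iterations, ranks and slots filters can be pictured with the sketch below, which treats None as "no filter" and a single int as a one-element list, following the Union[int, list[int], None] types above. TensorRecord, _as_set and filter_records are hypothetical stand-ins, and only simple substring matching on the node name is modelled.

```python
from typing import Iterable, List, NamedTuple, Optional, Union

IntFilter = Union[int, Iterable[int], None]


class TensorRecord(NamedTuple):
    """Hypothetical stand-in for the identifying fields of a dumped tensor."""
    node_name: str
    rank: int
    iteration: int
    slot: int


def _as_set(value: IntFilter) -> Optional[set]:
    """None means 'no filter'; a single int is treated as a one-element list."""
    if value is None:
        return None
    if isinstance(value, int):
        return {value}
    return set(value)


def filter_records(records: Iterable[TensorRecord], query_string: str,
                   iterations: IntFilter = None, ranks: IntFilter = None,
                   slots: IntFilter = None) -> List[TensorRecord]:
    its, rks, sls = _as_set(iterations), _as_set(ranks), _as_set(slots)
    return [
        r for r in records
        if query_string in r.node_name
        and (its is None or r.iteration in its)
        and (rks is None or r.rank in rks)
        and (sls is None or r.slot in sls)
    ]


records = [
    TensorRecord("conv1-Conv2d/Cast-op7", rank=0, iteration=0, slot=0),
    TensorRecord("conv2-Conv2d/Conv2D-op13", rank=0, iteration=0, slot=0),
    TensorRecord("conv2-Conv2d/Conv2D-op13", rank=0, iteration=1, slot=0),
    TensorRecord("conv2-Conv2d/Conv2D-op13", rank=1, iteration=0, slot=0),
]
selected = filter_records(records, "Conv2D-op13", iterations=0, ranks=[0])
print(len(selected))  # 1
```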

class mindinsight.debugger.Node(node_feature)[source]

Node in the computational graph.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

node_feature (namedtuple) – The node feature, which contains the following fields:

name (str): The node name.
rank (int): The rank id.

stack (iterable[dict]): The format of each item is like:

{
    'file_path': str,
    'line_no': int,
    'code_line': str
}

graph_name (str): The graph name.
root_graph_id (int): The root graph id.
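The field list above can be pictured as a namedtuple like the following. The type name NodeFeature, the field order, and the sample values are assumptions made for illustration; this is not MindInsight's internal definition, and in practice Node objects are obtained from DumpAnalyzer.select_nodes rather than built by hand.

```python
from collections import namedtuple

# Hypothetical stand-in for the node feature described above (field order assumed).
NodeFeature = namedtuple("NodeFeature", ["name", "rank", "stack", "graph_name", "root_graph_id"])

feature = NodeFeature(
    name="Default/network-WithLossCell/_backbone-AlexNet/conv2-Conv2d/Conv2D-op13",
    rank=0,
    stack=[
        {"file_path": "/path/to/model.py", "line_no": 55, "code_line": "x = self.conv2(x)"},
    ],
    graph_name="kernel_graph_0",
    root_graph_id=0,
)

# Each stack item gives the file, line number, and source line of one frame.
for frame in feature.stack:
    print(f"{frame['file_path']}:{frame['line_no']}: {frame['code_line']}")
```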

get_input_tensors(iterations=None, slots=None)[source]

Get the input tensors of the node.

Parameters

iterations (Iterable[int]) – The iterations to which the returned tensor should belong. Default: None, which means all available iterations will be considered.
slots (Iterable[int]) – The slots in which the returned tensors should be. Default: None, which means all available slots will be considered.

Returns

Iterable[DebuggerTensor], the input tensors of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> input_tensors = node.get_input_tensors(iterations=[0], slots=[0])

get_output_tensors(iterations=None, slots=None)[source]

Get the output tensors of this node.

Parameters

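iterations (Iterable[int]) – The iterations to which the returned tensors should belong. Default: None, which means all available iterations will be considered.
slots (Iterable[int]) – The slots in which the returned tensors should be. Default: None, which means all available slots will be considered.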

Returns

Iterable[DebuggerTensor], the output tensors of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> output_tensors = node.get_output_tensors(iterations=[0], slots=[0])

property graph_name: str

Get the graph name of the current node.

Returns: str, the graph name.

property name

Get the full name of this node.

Returns: str, the full name of the node.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("conv"))[0]
>>> print(node.name)
conv1.weight

property rank: int

Get the rank of the node.

Returns: int, the rank ID to which the node belongs.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("conv"))[0]
>>> print(node.rank)
0

property root_graph_id: int

Get the ID of the root graph to which the dumped tensors of the current node belong.

Returns: int, the root graph id.

property stack

Get the stack information of the node.

Returns

iterable[dict], each item of which has the following format:

{
    'file_path': str,
    'line_no': int,
    'code_line': str
}

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> node = list(my_run.select_nodes("Conv2D-op13"))[0]
>>> # print(node.stack)
>>> # the print result is as follows
>>> # [{'file_path': '/path', 'line_no': 266, 'code_line': 'output = self.conv2d(x, self.weight)',
>>> # 'has_substack': False},
>>> # {'file_path': '/path', 'line_no': 55, 'code_line': 'x = self.conv2(x)', 'has_substack': False}]

class mindinsight.debugger.OperatorOverflowCondition[source]

Operator overflow watchpoint.

The operator overflow watchpoint checks whether overflow occurs during operator computation. Only Ascend AI processors are supported.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Examples

>>> from mindinsight.debugger import OperatorOverflowCondition
>>> my_condition = OperatorOverflowCondition()
>>> print(my_condition.name)
OperatorOverflow

property param_dict

Get the parameters of the condition.

Returns: dict, the parameter dict of the condition.

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorAllZeroCondition(zero_percentage_ge)[source]

Tensor all zero watchpoint.

This watchpoint is hit after a check when all specified checking conditions are satisfied.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters: zero_percentage_ge (float) – The threshold for the percentage of zero values in the tensor. The checking condition is satisfied when the percentage of zero values is greater than or equal to this value.

Examples

>>> from mindinsight.debugger import TensorAllZeroCondition
>>> my_condition = TensorAllZeroCondition(zero_percentage_ge=0.0)
>>> print(my_condition.name)
TensorAllZero
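The zero-percentage check can be sketched in plain Python as follows. The tensor is modelled as a flat list of floats, the helpers zero_percentage and all_zero_triggered are hypothetical, and the 0 to 100 percentage scale is an assumption; the comparison uses greater-than-or-equal, as the _ge suffix indicates.

```python
from typing import Sequence


def zero_percentage(values: Sequence[float]) -> float:
    """Percentage (assumed 0 to 100 scale) of elements that are exactly zero."""
    if not values:
        return 0.0  # assumption: an empty tensor counts as 0% zeros
    zeros = sum(1 for v in values if v == 0)
    return 100.0 * zeros / len(values)


def all_zero_triggered(values: Sequence[float], zero_percentage_ge: float) -> bool:
    """Sketch of the check: zero percentage >= threshold."""
    return zero_percentage(values) >= zero_percentage_ge


tensor = [0.0, 0.0, 0.0, 1.5]
print(zero_percentage(tensor))  # 75.0
print(all_zero_triggered(tensor, zero_percentage_ge=70.0))  # True
print(all_zero_triggered(tensor, zero_percentage_ge=100.0))  # False
```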

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorChangeAboveThresholdCondition(abs_mean_update_ratio_gt, epsilon=1e-09)[source]

Tensor change above threshold watchpoint.

This watchpoint is hit after a check when all specified checking conditions are satisfied, that is, when abs_mean(current_tensor - previous_tensor) > epsilon + abs_mean_update_ratio_gt * abs_mean(previous_tensor).

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

abs_mean_update_ratio_gt (float) – The threshold for the mean update ratio. If the mean update ratio is greater than this value, the watchpoint is triggered.
epsilon (float, optional) – Epsilon value. Default: 1e-9.

Examples

>>> from mindinsight.debugger import TensorChangeAboveThresholdCondition
>>> my_condition = TensorChangeAboveThresholdCondition(abs_mean_update_ratio_gt=0.0)
>>> print(my_condition.name)
TensorChangeAboveThreshold
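The formula above can be checked by hand with a short plain-Python sketch. The helpers abs_mean and change_above_threshold are hypothetical, tensors are modelled as flat lists of floats, and the code only mirrors the documented formula; it is not MindInsight's implementation.

```python
from typing import Sequence


def abs_mean(values: Sequence[float]) -> float:
    """Mean of the absolute values of a non-empty sequence."""
    return sum(abs(v) for v in values) / len(values)


def change_above_threshold(previous: Sequence[float], current: Sequence[float],
                           abs_mean_update_ratio_gt: float, epsilon: float = 1e-9) -> bool:
    """abs_mean(current - previous) > epsilon + abs_mean_update_ratio_gt * abs_mean(previous)."""
    if len(previous) != len(current):
        raise ValueError("previous and current tensors must have the same size")
    update = abs_mean([c - p for c, p in zip(current, previous)])
    return update > epsilon + abs_mean_update_ratio_gt * abs_mean(previous)


prev = [1.0, -2.0, 3.0, -4.0]  # abs_mean(prev) = 2.5
curr = [1.5, -2.5, 3.5, -4.5]  # every element moved by 0.5, so abs_mean(curr - prev) = 0.5

print(change_above_threshold(prev, curr, abs_mean_update_ratio_gt=0.1))  # True  (0.5 > 0.25)
print(change_above_threshold(prev, curr, abs_mean_update_ratio_gt=0.3))  # False (0.5 > 0.75 fails)
```

With a ratio of 0.1 the allowed update is 0.1 × 2.5 = 0.25 (plus epsilon), so a mean absolute change of 0.5 triggers the condition.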

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorChangeBelowThresholdCondition(abs_mean_update_ratio_lt, epsilon=1e-09)[source]

Tensor change below threshold watchpoint.

This watchpoint is hit after a check when all specified checking conditions are satisfied, that is, when abs_mean(current_tensor - previous_tensor) < epsilon + abs_mean_update_ratio_lt * abs_mean(previous_tensor).

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

abs_mean_update_ratio_lt (float) – The threshold for the mean update ratio. If the mean update ratio is less than this value, the watchpoint is triggered.
epsilon (float, optional) – Epsilon value. Default: 1e-9.

Examples

>>> from mindinsight.debugger import TensorChangeBelowThresholdCondition
>>> my_condition = TensorChangeBelowThresholdCondition(abs_mean_update_ratio_lt=2.0)
>>> print(my_condition.name)
TensorChangeBelowThreshold
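The below-threshold formula can be sketched the same way. The helpers abs_mean and change_below_threshold are hypothetical, tensors are flat lists of floats, and the code mirrors only the documented formula, not MindInsight's implementation.

```python
from typing import Sequence


def abs_mean(values: Sequence[float]) -> float:
    """Mean of the absolute values of a non-empty sequence."""
    return sum(abs(v) for v in values) / len(values)


def change_below_threshold(previous: Sequence[float], current: Sequence[float],
                           abs_mean_update_ratio_lt: float, epsilon: float = 1e-9) -> bool:
    """abs_mean(current - previous) < epsilon + abs_mean_update_ratio_lt * abs_mean(previous)."""
    if len(previous) != len(current):
        raise ValueError("previous and current tensors must have the same size")
    update = abs_mean([c - p for c, p in zip(current, previous)])
    return update < epsilon + abs_mean_update_ratio_lt * abs_mean(previous)


prev = [1.0, -2.0, 3.0, -4.0]  # abs_mean(prev) = 2.5
curr = [1.5, -2.5, 3.5, -4.5]  # abs_mean(curr - prev) = 0.5

print(change_below_threshold(prev, curr, abs_mean_update_ratio_lt=2.0))  # True  (0.5 < 5.0)
print(change_below_threshold(prev, curr, abs_mean_update_ratio_lt=0.1))  # False (0.5 < 0.25 fails)
```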

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorOverflowCondition[source]

Tensor overflow watchpoint.

The tensor overflow watchpoint checks whether a tensor contains inf or nan values.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Examples

>>> from mindinsight.debugger import TensorOverflowCondition
>>> my_condition = TensorOverflowCondition()
>>> print(my_condition.name)
TensorOverflow
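The inf/nan test itself is simple to express; the sketch below uses math.isfinite on a flat list of floats. The helper has_overflow is hypothetical. Unlike OperatorOverflowCondition, which checks overflow during operator computation on Ascend, this only inspects the tensor values.

```python
import math
from typing import Sequence


def has_overflow(values: Sequence[float]) -> bool:
    """True if any element is +inf, -inf, or nan (i.e. not finite)."""
    return any(not math.isfinite(v) for v in values)


print(has_overflow([1.0, 2.0]))  # False
print(has_overflow([1.0, float("inf")]))  # True
print(has_overflow([float("nan")]))  # True
```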

property param_dict

Get the parameters of the condition.

Returns: dict, the parameter dict of the condition.

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorRangeCondition(range_start_inclusive=None, range_end_inclusive=None, range_percentage_lt=None, range_percentage_gt=None, max_min_lt=None, max_min_gt=None)[source]

Tensor range watchpoint.

Set a threshold to check the tensor value range. There are four options: range_percentage_lt, range_percentage_gt, max_min_lt and max_min_gt. At least one of the four options should be specified. If range_percentage_lt or range_percentage_gt is set, both range_start_inclusive and range_end_inclusive must also be set. This watchpoint is hit after a check when all specified checking conditions are satisfied.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

range_percentage_lt (float, optional) – The threshold for the percentage of tensor values within the range. The checking condition is satisfied when the percentage of values in the specified range is less than this value.
range_percentage_gt (float, optional) – The threshold for the percentage of tensor values within the range. The checking condition is satisfied when the percentage of values in the specified range is greater than this value.
max_min_lt (float, optional) – The threshold for the difference between the maximum and minimum of the tensor. The checking condition is satisfied when the difference is less than this value.
max_min_gt (float, optional) – The threshold for the difference between the maximum and minimum of the tensor. The checking condition is satisfied when the difference is greater than this value.
range_start_inclusive (float, optional) – The start of the range (inclusive).
range_end_inclusive (float, optional) – The end of the range (inclusive).

Examples

>>> from mindinsight.debugger import TensorRangeCondition
>>> my_condition = TensorRangeCondition(max_min_gt=0.05)
>>> print(my_condition.name)
TensorRange
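The four range options can be evaluated with the plain-Python sketch below, which returns one boolean per specified option and enforces the two rules stated above (at least one option; both bounds when a percentage option is used). The helper tensor_range_checks is hypothetical, the 0 to 100 percentage scale is an assumption, and the tensor is a non-empty flat list of floats. Per the description above, the watchpoint is hit when all specified checks hold, which here corresponds to all(results.values()).

```python
from typing import Dict, Optional, Sequence


def tensor_range_checks(values: Sequence[float],
                        range_start_inclusive: Optional[float] = None,
                        range_end_inclusive: Optional[float] = None,
                        range_percentage_lt: Optional[float] = None,
                        range_percentage_gt: Optional[float] = None,
                        max_min_lt: Optional[float] = None,
                        max_min_gt: Optional[float] = None) -> Dict[str, bool]:
    """Evaluate each specified option; unspecified options are skipped."""
    options = (range_percentage_lt, range_percentage_gt, max_min_lt, max_min_gt)
    if all(t is None for t in options):
        raise ValueError("at least one of the four options must be specified")
    results: Dict[str, bool] = {}
    if range_percentage_lt is not None or range_percentage_gt is not None:
        if range_start_inclusive is None or range_end_inclusive is None:
            raise ValueError("range percentage options need both range bounds")
        inside = sum(1 for v in values if range_start_inclusive <= v <= range_end_inclusive)
        percentage = 100.0 * inside / len(values)  # assumed 0 to 100 scale
        if range_percentage_lt is not None:
            results["range_percentage_lt"] = percentage < range_percentage_lt
        if range_percentage_gt is not None:
            results["range_percentage_gt"] = percentage > range_percentage_gt
    spread = max(values) - min(values)
    if max_min_lt is not None:
        results["max_min_lt"] = spread < max_min_lt
    if max_min_gt is not None:
        results["max_min_gt"] = spread > max_min_gt
    return results


tensor = [-1.0, 0.0, 0.5, 2.0]
print(tensor_range_checks(tensor, max_min_gt=0.05))
# {'max_min_gt': True}  (max - min = 3.0)
print(tensor_range_checks(tensor, range_start_inclusive=0.0, range_end_inclusive=1.0,
                          range_percentage_gt=40.0))
# {'range_percentage_gt': True}  (2 of 4 values, 50%, lie in [0.0, 1.0])
```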

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorTooLargeCondition(abs_mean_gt=None, max_gt=None, min_gt=None, mean_gt=None)[source]

Tensor too large watchpoint. At least one parameter should be specified.

This watchpoint is hit after a check when all specified checking conditions are satisfied.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

abs_mean_gt (float, optional) – The threshold for the mean of the absolute value of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied.
max_gt (float, optional) – The threshold for the maximum of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied.
min_gt (float, optional) – The threshold for the minimum of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied.
mean_gt (float, optional) – The threshold for the mean of the tensor. When the actual value is greater than this threshold, this checking condition is satisfied.

Examples

>>> from mindinsight.debugger import TensorTooLargeCondition
>>> my_condition = TensorTooLargeCondition(abs_mean_gt=0.0)
>>> print(my_condition.name)
TensorTooLarge
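The four statistics behind this condition can be computed with the sketch below, which returns the actual value for each specified parameter whose threshold is exceeded, keyed by parameter name in the style of the hit details shown in the examples. The helper too_large_hits is hypothetical and the tensor is a non-empty flat list of floats; the sketch only reports which checks hold and does not decide whether MindInsight records a hit.

```python
from typing import Dict, Optional, Sequence


def too_large_hits(values: Sequence[float], abs_mean_gt: Optional[float] = None,
                   max_gt: Optional[float] = None, min_gt: Optional[float] = None,
                   mean_gt: Optional[float] = None) -> Dict[str, float]:
    """Return {parameter: actual value} for each specified check the tensor satisfies."""
    thresholds = {"abs_mean_gt": abs_mean_gt, "max_gt": max_gt,
                  "min_gt": min_gt, "mean_gt": mean_gt}
    if all(t is None for t in thresholds.values()):
        raise ValueError("at least one parameter should be specified")
    actual = {
        "abs_mean_gt": sum(abs(v) for v in values) / len(values),
        "max_gt": max(values),
        "min_gt": min(values),
        "mean_gt": sum(values) / len(values),
    }
    return {name: actual[name] for name, limit in thresholds.items()
            if limit is not None and actual[name] > limit}


tensor = [-0.5, 0.25, 0.25, 0.5]
print(too_large_hits(tensor, abs_mean_gt=0.0, max_gt=1.0))  # {'abs_mean_gt': 0.375}
```

As in the ConditionBase example, max_gt is absent from the result because the maximum (0.5) does not exceed 1.0.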

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorTooSmallCondition(abs_mean_lt=None, max_lt=None, min_lt=None, mean_lt=None)[source]

Tensor too small watchpoint. At least one parameter should be specified.

This watchpoint is hit after a check when all specified checking conditions are satisfied.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

abs_mean_lt (float, optional) – The threshold for the mean of the absolute value of the tensor. When the actual value is less than this threshold, this checking condition is satisfied.
max_lt (float, optional) – The threshold for the maximum of the tensor. When the actual value is less than this threshold, this checking condition is satisfied.
min_lt (float, optional) – The threshold for the minimum of the tensor. When the actual value is less than this threshold, this checking condition is satisfied.
mean_lt (float, optional) – The threshold for the mean of the tensor. When the actual value is less than this threshold, this checking condition is satisfied.

Examples

>>> from mindinsight.debugger import TensorTooSmallCondition
>>> my_condition = TensorTooSmallCondition(abs_mean_lt=0.2)
>>> print(my_condition.name)
TensorTooSmall
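The mirror-image sketch for this condition compares each statistic with "less than". The helper too_small_hits is hypothetical, the tensor is a non-empty flat list of floats, and the result lists only the specified checks that hold; it does not decide whether MindInsight records a hit.

```python
from typing import Dict, Optional, Sequence


def too_small_hits(values: Sequence[float], abs_mean_lt: Optional[float] = None,
                   max_lt: Optional[float] = None, min_lt: Optional[float] = None,
                   mean_lt: Optional[float] = None) -> Dict[str, float]:
    """Return {parameter: actual value} for each specified check the tensor satisfies."""
    thresholds = {"abs_mean_lt": abs_mean_lt, "max_lt": max_lt,
                  "min_lt": min_lt, "mean_lt": mean_lt}
    if all(t is None for t in thresholds.values()):
        raise ValueError("at least one parameter should be specified")
    actual = {
        "abs_mean_lt": sum(abs(v) for v in values) / len(values),
        "max_lt": max(values),
        "min_lt": min(values),
        "mean_lt": sum(values) / len(values),
    }
    return {name: actual[name] for name, limit in thresholds.items()
            if limit is not None and actual[name] < limit}


tensor = [-0.5, 0.25, 0.25, 0.5]
print(too_small_hits(tensor, abs_mean_lt=0.5, min_lt=0.0))  # {'abs_mean_lt': 0.375, 'min_lt': -0.5}
print(too_small_hits(tensor, abs_mean_lt=0.2))  # {}
```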

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.TensorUnchangedCondition(rtol=1e-05, atol=1e-08)[source]

Tensor unchanged condition watchpoint.

This watchpoint is hit after a check when all specified checking conditions are satisfied. It applies an allclose-style check to the previous and current tensors: abs_mean(current_tensor - previous_tensor) <= atol + rtol * abs_mean(previous_tensor).

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

rtol (float, optional) – The relative tolerance parameter. Default: 1e-5.
atol (float, optional) – The absolute tolerance parameter. Default: 1e-8.

Examples

>>> from mindinsight.debugger import TensorUnchangedCondition
>>> my_condition = TensorUnchangedCondition(rtol=1000.0)
>>> print(my_condition.name)
TensorUnchanged
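The unchanged check can be sketched directly from the formula above. The helpers abs_mean and tensor_unchanged are hypothetical and tensors are flat lists of floats. Note that, as the formula states, the comparison is made on means, whereas numpy.allclose compares element by element.

```python
from typing import Sequence


def abs_mean(values: Sequence[float]) -> float:
    """Mean of the absolute values of a non-empty sequence."""
    return sum(abs(v) for v in values) / len(values)


def tensor_unchanged(previous: Sequence[float], current: Sequence[float],
                     rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """abs_mean(current - previous) <= atol + rtol * abs_mean(previous)."""
    if len(previous) != len(current):
        raise ValueError("previous and current tensors must have the same size")
    diff = abs_mean([c - p for c, p in zip(current, previous)])
    return diff <= atol + rtol * abs_mean(previous)


prev = [1.0, -2.0, 3.0, -4.0]      # abs_mean(prev) = 2.5
curr = [1.0, -2.0, 3.0, -4.00001]  # abs_mean(curr - prev) is about 2.5e-6

print(tensor_unchanged(prev, curr))             # True  (2.5e-6 <= about 2.5e-5)
print(tensor_unchanged(prev, curr, rtol=1e-7))  # False (2.5e-6 <= about 2.6e-7 fails)
```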

property param_names

Return the list of parameter names.

Returns: list[str], the parameter names.

class mindinsight.debugger.Watchpoint(tensors, condition)[source]

Watchpoint applies condition to specified tensors.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Parameters

tensors (Iterable[DebuggerTensor]) – The tensors to check.
condition (ConditionBase) – The condition to apply to tensors.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import TensorTooLargeCondition, Watchpoint
>>> my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
>>> tensor_list = my_run.select_tensors(query_string="Conv",
...                                     use_regex=True,
...                                     iterations=[0],
...                                     ranks=[0],
...                                     slots=[0])
>>> watchpoint = Watchpoint(tensors=tensor_list,
...                         condition=TensorTooLargeCondition(abs_mean_gt=0.0))
>>> tensor = list(watchpoint.tensors)[0]
>>> print(tensor.node.name)
Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
>>> print(watchpoint.condition.name)
TensorTooLarge

property condition

Get the condition to apply to tensors.

Returns: ConditionBase, the condition to apply to tensors.

property tensors

Get tensors to check.

Returns: Iterable[DebuggerTensor], the tensors to check.

class mindinsight.debugger.WatchpointHit[source]

Watchpoint hit.

Warning

All APIs in this class are experimental prototypes that are subject to change or deletion.

Note

This class is not meant to be instantiated by users.
The instances of this class are immutable.

Examples

>>> from mindinsight.debugger import DumpAnalyzer
>>> from mindinsight.debugger import TensorTooLargeCondition, Watchpoint
>>>
>>> def test_watch_point_hit():
...     my_run = DumpAnalyzer(dump_dir="/path/to/your/dump_dir_with_dump_data")
...     tensor_list = my_run.select_tensors(query_string="Conv",
...                                         use_regex=True,
...                                         iterations=[0],
...                                         ranks=[0],
...                                         slots=[0])
...     watchpoint = Watchpoint(tensors=tensor_list,
...                             condition=TensorTooLargeCondition(abs_mean_gt=0.0))
...     # check_watchpoints starts a new process, so it must be called from the main entry point
...     hits = my_run.check_watchpoints(watchpoints=[watchpoint])
...     hit = list(hits)[0]
...     # print(str(hit))
...     # the printed result is as follows:
...     # Watchpoint TensorTooLarge triggered on tensor:
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
...     # slot: 0
...     # iteration: 0
...     # Threshold: {'abs_mean_gt': 0.0}
...     # Hit detail: The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.007956420533235841.
...     # print(hit.error_code)
...     # the printed result is as follows:
...     # 0
...     # print(hit.tensor)
...     # the printed result is as follows:
...     # rank: 0
...     # graph_name: kernel_graph_0
...     # node_name: Default/network-WithLossCell/_backbone-AlexNet/conv1-Conv2d/Cast-op7
...     # slot: 0
...     # iteration: 0
...     # print(hit.get_hit_detail())
...     # the printed result is as follows:
...     # The setting for watchpoint is abs_mean_gt = 0.0.
...     # The actual value of the tensor is abs_mean_gt = 0.007956420533235841.
...
>>> if __name__ == "__main__":
...     test_watch_point_hit()
...

property error_code

Get the error code produced when checking the watchpoint, if an error occurred.

Returns: int, the error code.

property error_msg

Get the error messages produced when checking the watchpoint, if an error occurred.

Returns: list[str], the error message list.

get_hit_detail()[source]

Get the actual values for the thresholds in the watchpoint.

Returns: Union[ConditionBase, None], the condition with hit details, or None if error_code is not zero. Use str(ConditionBase) to view the details.

get_threshold()[source]

Get the condition set by user.

Returns: ConditionBase, the condition with the thresholds set by the user. Use str(ConditionBase) to view them.

property tensor: mindinsight.debugger.api.debugger_tensor.DebuggerTensor

Get the tensor for this watchpoint hit.

Returns: DebuggerTensor, the triggered tensor.