mindarmour.detectors

This module provides detectors that distinguish adversarial examples from benign examples.

class mindarmour.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))[source]

This class implements a divergence-based detector.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters

auto_encoder (Model) – Encoder model.
model (Model) – Targeted model.
option (str) – Method used to calculate the divergence. Default: “jsd”.
t (int) – Temperature used to overcome numerical problems. Default: 1.
bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

detect_diff(inputs)[source]

Detect the distance between original samples and reconstructed samples.

The distance is computed as the Jensen-Shannon divergence (JSD).

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: float, the distance.
Raises: NotImplementedError – If the param option is not supported.
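As a rough illustration of the quantity named above, the following sketch computes the Jensen-Shannon divergence between two probability vectors obtained from a temperature-scaled softmax. It is not MindArmour's implementation: the helper names `softmax_t`, `kl` and `jsd` and the example logits are hypothetical, and which outputs the detector actually compares is assumed rather than stated here.

```python
import math

def softmax_t(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / t for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical logits standing in for two model outputs.
p = softmax_t([2.0, 1.0, 0.1], t=1.0)
q = softmax_t([0.1, 1.0, 2.0], t=1.0)
same = jsd(p, p)  # identical distributions
diff = jsd(p, q)  # differing distributions
```

The divergence is symmetric, zero for identical distributions, and bounded above by ln 2 when natural logarithms are used, which makes it convenient to threshold.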

class mindarmour.detectors.EnsembleDetector(detectors, policy='vote')[source]

Ensemble detector.

Parameters

detectors (Union[tuple, list]) – List of detector methods.
policy (str) – Decision policy, one of ‘vote’, ‘all’ or ‘any’. Default: ‘vote’.

detect(inputs)[source]

Detect adversarial examples from input samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.
Raises: ValueError – If policy is not supported.
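The three policies are not defined in detail here. A minimal sketch of one plausible reading follows, assuming ‘vote’ means a strict majority, ‘all’ means unanimity and ‘any’ means at least one detector flags the sample. The function `combine` and its tie-breaking rule are hypothetical and may differ from the library's behaviour.

```python
def combine(decisions, policy="vote"):
    """Combine per-detector 0/1 verdicts into one verdict per sample.

    decisions[d][i] is detector d's verdict on sample i.
    """
    n_detectors = len(decisions)
    totals = [sum(column) for column in zip(*decisions)]
    if policy == "vote":
        # Assumption: strict majority, so ties count as benign.
        return [1 if 2 * s > n_detectors else 0 for s in totals]
    if policy == "all":
        return [1 if s == n_detectors else 0 for s in totals]
    if policy == "any":
        return [1 if s > 0 else 0 for s in totals]
    raise ValueError(f"Unsupported policy: {policy}")

# Verdicts of three hypothetical detectors on four samples.
decisions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 1],
]
by_vote = combine(decisions, "vote")
by_all = combine(decisions, "all")
by_any = combine(decisions, "any")
```

Under these assumptions, ‘any’ is the most sensitive policy and ‘all’ the most conservative, with ‘vote’ in between.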

detect_diff(inputs)[source]

This method is not available in this class.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in ensemble.

fit(inputs, labels=None)[source]

Fit detector like a machine learning model. This method is not available in this class.

Parameters

inputs (numpy.ndarray) – Data to calculate the threshold.
labels (numpy.ndarray) – Labels of data.

Raises

NotImplementedError – This function is not available in ensemble.

transform(inputs)[source]

Filter adversarial noise in input samples. This method is not available in this class.

Raises: NotImplementedError – This function is not available in ensemble.

class mindarmour.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))[source]

The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters

auto_encoder (Model) – A trained auto-encoder that represents the input with a reduced encoding.
false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.
bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

detect(inputs)[source]

Detect if input samples are adversarial or not.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)[source]

Detect the distance between the original samples and reconstructed samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: float, the distance between reconstructed and original samples.

fit(inputs, labels=None)[source]

Find a threshold for a given dataset to distinguish adversarial examples.

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns

float, threshold to distinguish adversarial samples from benign ones.
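One way to picture how a threshold can follow from a target false positive rate: rank the reconstruction errors of benign samples and pick the value that only the allowed fraction exceeds. The sketch below assumes this quantile-style rule and a mean-absolute-error metric; the names `reconstruction_error`, `fit_threshold` and `detect` are hypothetical and do not describe the library's internals.

```python
def reconstruction_error(x, x_rec):
    """Mean absolute difference between a sample and its reconstruction."""
    return sum(abs(a - b) for a, b in zip(x, x_rec)) / len(x)

def fit_threshold(errors, false_positive_rate=0.01):
    """Pick a threshold that at most the given fraction of errors exceeds."""
    ranked = sorted(errors, reverse=True)
    # Number of benign samples allowed to lie above the threshold.
    k = min(int(len(ranked) * false_positive_rate), len(ranked) - 1)
    return ranked[k]

def detect(errors, threshold):
    """Flag a sample (1) when its error is strictly above the threshold."""
    return [1 if e > threshold else 0 for e in errors]

# Stand-in reconstruction errors for 100 benign samples.
benign_errors = [i / 100 for i in range(100)]
threshold = fit_threshold(benign_errors, false_positive_rate=0.05)
flags = detect(benign_errors, threshold)
```

With this rule, no more than 5 of the 100 benign samples are flagged, while a sample with a much larger reconstruction error is rejected.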

set_threshold(threshold)[source]

Set the threshold parameter.

Parameters: threshold (float) – Detection threshold. Default: None.

transform(inputs)[source]

Reconstruct input samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, reconstructed images.

class mindarmour.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)[source]

This class implements a region-based detector.

Reference: Mitigating evasion attacks to deep neural networks via region-based classification

Parameters

model (Model) – Target model.
number_points (int) – The number of samples generated from the hypercube around the original sample. Default: 10.
initial_radius (float) – Initial radius of the hypercube. Default: 0.0.
max_radius (float) – Maximum radius of the hypercube. Default: 1.0.
search_step (float) – Increment used during the radius search. Default: 0.01.
degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.

Examples

>>> detector = RegionBasedDetector(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))

detect(inputs)[source]

Tell whether input samples are adversarial or not.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)[source]

Return raw prediction results and region-based prediction results.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, raw prediction results and region-based prediction results of input samples.

fit(inputs, labels=None)[source]

Train detector to decide the best radius.

Parameters

inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Ground truth labels of the input samples. Default: None.

Returns

float, the best radius.
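The constructor parameters suggest a simple search: grow the radius from initial_radius in increments of search_step until accuracy on benign data drops by more than degrade_limit or max_radius is reached. The sketch below assumes exactly that procedure. The function `search_radius` and the callback `accuracy_at` are hypothetical stand-ins, not the library's code.

```python
def search_radius(accuracy_at, initial_radius=0.0, max_radius=1.0,
                  search_step=0.01, degrade_limit=0.0):
    """Return the largest radius whose accuracy drop stays within the limit.

    accuracy_at(radius) is assumed to return region-based accuracy
    on benign samples for the given radius.
    """
    baseline = accuracy_at(initial_radius)
    best = initial_radius
    steps = int(round((max_radius - initial_radius) / search_step))
    for n in range(1, steps + 1):
        radius = initial_radius + n * search_step
        if baseline - accuracy_at(radius) > degrade_limit:
            break  # accuracy degraded too much, keep the previous radius
        best = radius
    return best

# Toy accuracy curve: perfect up to radius 0.35, then worse.
def toy_accuracy(radius):
    return 1.0 if radius <= 0.35 else 0.8

strict = search_radius(toy_accuracy, search_step=0.1, degrade_limit=0.0)
lenient = search_radius(toy_accuracy, search_step=0.1, degrade_limit=0.25)
```

A stricter degrade_limit stops the search earlier and yields a smaller radius, while a lenient one lets the radius grow up to max_radius.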

set_radius(radius)[source]: Set radius.

transform(inputs)[source]

Generate a hypercube for the input samples.

Parameters: inputs (numpy.ndarray) – Input samples.
Returns: numpy.ndarray, the hypercube corresponding to each sample.
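To make the hypercube idea concrete, the sketch below draws points uniformly from a cube of a given radius around a sample and classifies the sample by majority vote over those points. Uniform sampling, the absence of clipping, and the names `hypercube_samples`, `region_predict` and `toy_classify` are all assumptions made for illustration.

```python
import random
from collections import Counter

def hypercube_samples(x, radius, number_points=10, rng=None):
    """Draw points uniformly from the cube [x - radius, x + radius]."""
    if rng is None:
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    return [[xi + rng.uniform(-radius, radius) for xi in x]
            for _ in range(number_points)]

def region_predict(classify, x, radius, number_points=10, rng=None):
    """Majority vote of the classifier over points sampled around x."""
    points = hypercube_samples(x, radius, number_points, rng)
    votes = Counter(classify(p) for p in points)
    return votes.most_common(1)[0][0]

def toy_classify(x):
    """Toy two-class model: class 1 when the first feature exceeds 0.5."""
    return int(x[0] > 0.5)

points = hypercube_samples([0.2, 0.8], radius=0.1, number_points=10)
label_a = region_predict(toy_classify, [0.2, 0.8], radius=0.1)
label_b = region_predict(toy_classify, [0.9, 0.1], radius=0.1)
```

A sample whose raw prediction disagrees with the vote over its neighbourhood sits close to a decision boundary, which is the signal a region-based detector can use.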

class mindarmour.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)[source]

The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.

Reference: Stateful Detection of Black-Box Adversarial Attacks, by Steven Chen, Nicholas Carlini, and David Wagner, arXiv 2019.

Parameters

trans_model (Model) – A MindSpore model that encodes input data into lower-dimensional vectors.
max_k_neighbor (int) – The maximum number of the nearest neighbors. Default: 1000.
chunk_size (int) – Buffer size. Default: 1000.
max_buffer_size (int) – Maximum buffer size. Default: 10000.
tuning (bool) – Controls how the average distance to the nearest k neighbors is calculated. If True, k=K. If False, k=1,…,K. Default: False.
fpr (float) – False positive ratio on legitimate query sequences. Default: 0.001.

Examples

>>> detector = SimilarityDetector(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))

clear_buffer()[source]: Clear the buffer memory.

detect(inputs)[source]

Process queries to detect black-box attacks.

Parameters: inputs (numpy.ndarray) – Query sequence.
Raises: ValueError – If the threshold or num_of_neighbors parameter is not available.

detect_diff(inputs)[source]

Detect adversarial samples from input samples, like the predict_proba function in common machine learning model.

Parameters: inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.
Raises: NotImplementedError – This function is not available in class SimilarityDetector.

fit(inputs, labels=None)[source]

Process input training data to calculate the threshold. A proper threshold should make sure the false positive rate is under a given value.

Parameters

inputs (numpy.ndarray) – Training data to calculate the threshold.
labels (numpy.ndarray) – Labels of training data.

Returns

list[int], number of the nearest neighbors.
list[float], calculated thresholds for different K.

Raises

ValueError – If the number of training samples is less than max_k_neighbor.
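The thresholding idea can be sketched with plain Euclidean distances: score each encoded training vector by its mean distance to its k nearest neighbours, then choose a threshold that only a small fraction of benign scores fall below. The code assumes this rule and treats unusually small scores as suspicious. The names `mean_knn_distance`, `fit_threshold` and `is_suspicious` are hypothetical.

```python
import math

def mean_knn_distance(query, others, k):
    """Average Euclidean distance from query to its k nearest vectors."""
    dists = sorted(math.dist(query, o) for o in others)
    return sum(dists[:k]) / k

def fit_threshold(encoded, k, fpr=0.001):
    """Threshold below which at most a fraction fpr of benign scores fall."""
    if len(encoded) <= k:
        raise ValueError("need more than k training vectors")
    scores = sorted(
        mean_knn_distance(q, encoded[:i] + encoded[i + 1:], k)
        for i, q in enumerate(encoded)
    )
    return scores[int(len(scores) * fpr)]

def is_suspicious(query, buffer, k, threshold):
    """A query much closer to earlier queries than usual is flagged."""
    return mean_knn_distance(query, buffer, k) < threshold

# Stand-in encodings: benign queries spread over a 5 x 5 grid.
benign = [[float(i), float(j)] for i in range(5) for j in range(5)]
threshold = fit_threshold(benign, k=2, fpr=0.1)
near_duplicate = is_suspicious([0.01, 0.0], benign, 2, threshold)
far_away = is_suspicious([10.0, 10.0], benign, 2, threshold)
```

A query that nearly duplicates an earlier one scores well below the benign scores and is flagged, whereas an unrelated query is not.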

get_detected_queries()[source]

Get the indexes of detected queries.

Returns: list[int], sequence number of detected malicious queries.

get_detection_interval()[source]

Get the interval between adjacent detections.

Returns: list[int], number of queries between adjacent detections.

set_threshold(num_of_neighbors, threshold)[source]

Set the parameters num_of_neighbors and threshold.

Parameters

num_of_neighbors (int) – Number of the nearest neighbors.
threshold (float) – Detection threshold. Default: None.

transform(inputs)[source]

Filter adversarial noise in input samples.

Raises: NotImplementedError – This function is not available in class SimilarityDetector.

class mindarmour.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)[source]

Detection method based on spatial smoothing.

Parameters

model (Model) – Target model.
ksize (int) – Smooth window size. Default: 3.
is_local_smooth (bool) – If True, local smoothing is applied. If False, local smoothing is not applied. Default: True.
metric (str) – Distance method. Default: ‘l1’.
false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.

Examples

>>> detector = SpatialSmoothing(model)
>>> detector.fit(Tensor(ori), Tensor(labels))
>>> adv_ids = detector.detect(Tensor(adv))

detect(inputs)[source]

Detect if an input sample is an adversarial example.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: list[int], whether each sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)[source]

Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.

Parameters: inputs (numpy.ndarray) – Suspicious samples to be judged.
Returns: float, distance.
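For intuition, the sketch below applies a median filter with a ksize x ksize window to a small grayscale image and measures the L1 distance between the image and its smoothed version. It is only an illustration: whether the library measures the distance on pixels or on model predictions is not stated here, the truncated windows at the borders are an assumption, and `median_smooth` and `l1_distance` are hypothetical helpers.

```python
from statistics import median

def median_smooth(image, ksize=3):
    """Median-filter a 2-D list of pixel values; border windows are truncated."""
    height, width = len(image), len(image[0])
    r = ksize // 2
    smoothed = []
    for i in range(height):
        row = []
        for j in range(width):
            window = [image[a][b]
                      for a in range(max(0, i - r), min(height, i + r + 1))
                      for b in range(max(0, j - r), min(width, j + r + 1))]
            row.append(median(window))
        smoothed.append(row)
    return smoothed

def l1_distance(a, b):
    """Sum of absolute element-wise differences between two images."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

flat = [[0.0] * 5 for _ in range(5)]
spiky = [row[:] for row in flat]
spiky[2][2] = 1.0  # one isolated bright pixel, a stand-in for a perturbation

flat_distance = l1_distance(flat, median_smooth(flat))
spiky_distance = l1_distance(spiky, median_smooth(spiky))
```

Smoothing leaves the flat image unchanged but removes the isolated pixel, so only the perturbed image shows a non-zero distance to its smoothed counterpart.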

fit(inputs, labels=None)[source]

Train the detector to decide the threshold. A proper threshold ensures that the actual false positive rate over benign samples is less than the given value.

Parameters

inputs (numpy.ndarray) – Benign samples.
labels (numpy.ndarray) – Labels of the input samples. Default: None.

Returns

float, the threshold. Samples whose distance is larger than the threshold are reported as positive, i.e. adversarial.

set_threshold(threshold)[source]

Set the threshold parameter.

Parameters: threshold (float) – Detection threshold. Default: None.