mindarmour.detectors

This module includes detector methods for distinguishing adversarial examples from benign examples.

class mindarmour.detectors.ErrorBasedDetector(auto_encoder, false_positive_rate=0.01, bounds=(0.0, 1.0))

The detector reconstructs input samples, measures reconstruction errors and rejects samples with large reconstruction errors.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters
  • auto_encoder (Model) – A trained auto-encoder that represents the input with a reduced encoding.

  • false_positive_rate (float) – Detector’s false positive rate. Default: 0.01.

  • bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).
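The reconstruction-error test described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not MindArmour's implementation: the error is taken as the mean absolute difference per sample, the threshold as the (1 - false_positive_rate) quantile of the errors on benign data, and `toy_auto_encoder`, `reconstruction_error`, `fit_threshold` and `detect` are hypothetical names.

```python
import numpy as np


def toy_auto_encoder(x):
    # Hypothetical stand-in for a trained auto-encoder: it "reconstructs"
    # inputs by pulling every value halfway toward 0.5.
    return 0.5 * x + 0.25


def reconstruction_error(x, auto_encoder, bounds=(0.0, 1.0)):
    # Assumption: the error is the mean absolute difference per sample,
    # computed after clipping the reconstruction to `bounds`.
    recon = np.clip(auto_encoder(x), bounds[0], bounds[1])
    axes = tuple(range(1, x.ndim))
    return np.mean(np.abs(x - recon), axis=axes)


def fit_threshold(benign, auto_encoder, false_positive_rate=0.01):
    # Choose the threshold so that roughly `false_positive_rate` of the
    # benign samples have an error above it.
    errors = reconstruction_error(benign, auto_encoder)
    return float(np.quantile(errors, 1.0 - false_positive_rate))


def detect(inputs, auto_encoder, threshold):
    # 1 marks a sample as adversarial, 0 as benign.
    errors = reconstruction_error(inputs, auto_encoder)
    return [int(e > threshold) for e in errors]
```

Taking the threshold from a quantile of benign errors is what ties false_positive_rate to the share of benign samples that end up rejected.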

detect(inputs)

Detect whether input samples are adversarial.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Compute the distance between the original samples and the reconstructed samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

float, the distance between reconstructed and original samples.

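fit(inputs, labels=None)

Find a threshold on a given dataset for distinguishing adversarial examples from benign ones.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns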

float, threshold to distinguish adversarial samples from benign ones.

set_threshold(threshold)

Set the detection threshold.

Parameters

threshold (float) – Detection threshold. Default: None.

transform(inputs)

Reconstruct input samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, reconstructed images.

class mindarmour.detectors.DivergenceBasedDetector(auto_encoder, model, option='jsd', t=1, bounds=(0.0, 1.0))

This class implements a divergence-based detector.

Reference: MagNet: a Two-Pronged Defense against Adversarial Examples, by Dongyu Meng and Hao Chen, at CCS 2017.

Parameters
  • auto_encoder (Model) – Auto-encoder model.

  • model (Model) – Target model.

  • option (str) – Method used to calculate the divergence. Default: “jsd”.

  • t (int) – Temperature used to overcome numerical problems. Default: 1.

  • bounds (tuple) – Lower and upper bounds of the data, in the form (clip_min, clip_max). Default: (0.0, 1.0).

detect_diff(inputs)

Compute the distance between the original samples and the reconstructed samples.

The distance is calculated with the Jensen-Shannon divergence (JSD).

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

float, the distance.

Raises

NotImplementedError – If the param option is not supported.
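The JSD distance used by detect_diff compares the target model's output distribution on a sample with its output distribution on the auto-encoder reconstruction. A minimal NumPy sketch, assuming (as in MagNet) that the temperature t divides the logits before the softmax; `softmax`, `kl` and `jsd` are hypothetical helpers, not MindArmour functions.

```python
import numpy as np


def softmax(logits, t=1.0):
    # Temperature-scaled softmax along the last axis; subtracting the max
    # keeps exp() from overflowing.
    z = np.asarray(logits, dtype=np.float64) / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p, q, eps=1e-12):
    # Kullback-Leibler divergence KL(p || q) per row; eps avoids log(0).
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)


def jsd(p, q):
    # Jensen-Shannon divergence: average KL to the midpoint distribution.
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical logits: model(x) and model(auto_encoder(x)) for one sample.
p = softmax([[4.0, 1.0, 0.0]], t=1)
q = softmax([[0.0, 1.0, 4.0]], t=1)
distance = jsd(p, q)
```

A larger t flattens both distributions, which keeps probabilities away from zero but also tends to shrink the divergence, so a threshold only makes sense for the t it was chosen with.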

class mindarmour.detectors.RegionBasedDetector(model, number_points=10, initial_radius=0.0, max_radius=1.0, search_step=0.01, degrade_limit=0.0, sparse=False)

This class implements a region-based detector.

Reference: Mitigating evasion attacks to deep neural networks via region-based classification

Parameters
  • model (Model) – Target model.

  • number_points (int) – The number of samples generated from the hypercube around the original sample. Default: 10.

  • initial_radius (float) – Initial radius of the hypercube. Default: 0.0.

  • max_radius (float) – Maximum radius of the hypercube. Default: 1.0.

  • search_step (float) – Step by which the radius is increased during the radius search. Default: 0.01.

  • degrade_limit (float) – Acceptable decrease of classification accuracy. Default: 0.0.

  • sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: False.

Examples

>>> from mindarmour.detectors import RegionBasedDetector
>>> # model: the target model; ori, labels, adv: numpy.ndarray samples
>>> detector = RegionBasedDetector(model)
>>> detector.fit(ori, labels)
>>> adv_ids = detector.detect(adv)

detect(inputs)

Tell whether input samples are adversarial or not.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return raw prediction results and region-based prediction results.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, raw prediction results and region-based prediction results of input samples.

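fit(inputs, labels=None)

Train the detector to decide the best radius.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns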

float, the best radius.
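One way to picture the radius search in fit, following the parameter descriptions: start at initial_radius, grow by search_step up to max_radius, and keep the largest radius whose accuracy loss on benign data stays within degrade_limit. This sketch assumes that stopping rule; `search_radius` and the `region_accuracy` callable are hypothetical.

```python
import numpy as np


def search_radius(region_accuracy, initial_radius=0.0, max_radius=1.0,
                  search_step=0.01, degrade_limit=0.0):
    """Return the largest radius whose accuracy loss stays within the limit.

    `region_accuracy(r)` is a hypothetical callable returning the accuracy
    of region-based classification with radius r on labelled benign data;
    radius 0 reproduces the raw model, so it serves as the baseline.
    """
    baseline = region_accuracy(0.0)
    best = initial_radius
    # Walk the radius upward in fixed steps; the tiny epsilon keeps
    # max_radius itself inside the range despite float rounding.
    for radius in np.arange(initial_radius, max_radius + 1e-12, search_step):
        if baseline - region_accuracy(radius) > degrade_limit:
            break
        best = float(radius)
    return best
```

With degrade_limit=0.0 the search stops at the first radius that costs any accuracy, which is why a small positive limit usually yields a larger, more robust radius.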

set_radius(radius)

Set radius.

transform(inputs)

Generate a hypercube for each input sample.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

numpy.ndarray, the hypercube corresponding to each sample.
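The region-based classification behind transform and detect can be sketched as below. Assumptions, not taken from MindArmour: radius is the per-dimension half-width of the hypercube, points are drawn uniformly without clipping, the region label is a majority vote, and a sample is flagged when its raw and region labels disagree. `predict` stands for any function returning integer labels; all helper names are hypothetical.

```python
import numpy as np


def hypercube_samples(x, radius, number_points=10, seed=0):
    # Draw `number_points` points uniformly from the hypercube of
    # half-width `radius` around each sample x[i] (no clipping).
    rng = np.random.default_rng(seed)
    size = (x.shape[0], number_points) + x.shape[1:]
    noise = rng.uniform(-radius, radius, size=size)
    return x[:, None] + noise


def region_predict(predict, x, radius, number_points=10):
    # Majority vote of the model's labels over each sample's hypercube;
    # ties go to the smaller label because argmax returns the first max.
    cubes = hypercube_samples(x, radius, number_points)
    n, k = cubes.shape[:2]
    labels = predict(cubes.reshape((n * k,) + x.shape[1:])).reshape(n, k)
    return np.array([np.bincount(row).argmax() for row in labels])


def detect(predict, x, radius, number_points=10):
    # Flag a sample (1) when its raw label disagrees with its region label.
    raw = predict(x)
    region = region_predict(predict, x, radius, number_points)
    return [int(a != b) for a, b in zip(raw, region)]
```

An adversarial example tends to sit in a thin pocket near a decision boundary, so most of its hypercube still carries the original class, while a benign sample's neighbourhood usually agrees with its own label.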

class mindarmour.detectors.SpatialSmoothing(model, ksize=3, is_local_smooth=True, metric='l1', false_positive_ratio=0.05)

Detection method based on spatial smoothing.

Parameters
  • model (Model) – Target model.

  • ksize (int) – Smooth window size. Default: 3.

  • is_local_smooth (bool) – If True, apply local smoothing. If False, no local smoothing is applied. Default: True.

  • metric (str) – Distance metric. Default: ‘l1’.

  • false_positive_ratio (float) – False positive rate over benign samples. Default: 0.05.
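A rough picture of smoothing-based detection with metric='l1', assuming grayscale inputs of shape (N, H, W), a median filter over ksize x ksize windows as the local smoothing, and an L1 distance between probability vectors. `median_smooth`, `l1_distance` and `predict_proba` are hypothetical, and the exact filter MindArmour applies is an assumption here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_smooth(images, ksize=3):
    # Median filter over each ksize x ksize neighbourhood of an (N, H, W)
    # batch; borders use reflect padding so the output keeps shape (N, H, W).
    pad = ksize // 2
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    windows = sliding_window_view(padded, (ksize, ksize), axis=(1, 2))
    return np.median(windows, axis=(-2, -1))


def l1_distance(predict_proba, images, ksize=3):
    # L1 distance between the model's probability vectors on the raw
    # and the smoothed inputs; one value per sample.
    raw = predict_proba(images)
    smoothed = predict_proba(median_smooth(images, ksize))
    return np.sum(np.abs(raw - smoothed), axis=1)
```

A lone bright pixel is erased by the median filter, so an input whose prediction hinges on such a perturbation gets a large distance, while a smooth benign image barely changes.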

Examples

>>> from mindarmour.detectors import SpatialSmoothing
>>> # model: the target model; ori, labels, adv: numpy.ndarray samples
>>> detector = SpatialSmoothing(model)
>>> detector.fit(ori, labels)
>>> adv_ids = detector.detect(adv)

detect(inputs)

Detect whether input samples are adversarial examples.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

detect_diff(inputs)

Return the raw distance value (before applying the threshold) between the input sample and its smoothed counterpart.

Parameters

inputs (numpy.ndarray) – Suspicious samples to be judged.

Returns

float, distance.

fit(inputs, labels=None)

Train the detector to decide the threshold. A proper threshold makes sure the actual false positive rate over benign samples is less than the given value.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Returns

float, the threshold; a distance larger than the threshold is reported as positive, i.e. adversarial.

set_threshold(threshold)

Set the detection threshold.

Parameters

threshold (float) – Detection threshold. Default: None.

class mindarmour.detectors.EnsembleDetector(detectors, policy='vote')

Ensemble detector that combines the decisions of several detectors.

Parameters
  • detectors (Union[tuple, list]) – Detectors to combine.

  • policy (str) – Decision policy, one of ‘vote’, ‘all’ or ‘any’. Default: ‘vote’.
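How the three policies could combine per-detector results, sketched under the assumption that ‘vote’ means a strict majority, ‘all’ requires every detector to flag the sample and ‘any’ requires at least one. `combine` is a hypothetical helper operating on the 0/1 lists returned by each detector's detect.

```python
import numpy as np


def combine(results, policy="vote"):
    # `results` holds one 0/1 list per detector, all of the same length.
    marks = np.asarray(results, dtype=int)
    n_detectors = marks.shape[0]
    hits = marks.sum(axis=0)  # number of detectors flagging each sample
    if policy == "vote":
        # Adversarial if more than half of the detectors flag the sample.
        flagged = hits > n_detectors / 2
    elif policy == "all":
        flagged = hits == n_detectors
    elif policy == "any":
        flagged = hits > 0
    else:
        raise ValueError("policy must be 'vote', 'all' or 'any'")
    return flagged.astype(int).tolist()


results = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
```

‘any’ maximizes recall at the cost of more false positives, ‘all’ does the opposite, and ‘vote’ sits in between.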

detect(inputs)

Detect adversarial examples from input samples.

Parameters

inputs (numpy.ndarray) – Input samples.

Returns

list[int], whether a sample is adversarial. If res[i]=1, the input sample with index i is adversarial.

Raises

ValueError – If policy is not supported.

detect_diff(inputs)

This method is not available in this class.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in ensemble.

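fit(inputs, labels=None)

Fit the detector like a machine learning model. This method is not available in this class.

Parameters
  • inputs (numpy.ndarray) – Input samples.

  • labels (numpy.ndarray) – Labels of input samples. Default: None.

Raises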

NotImplementedError – This function is not available in ensemble.

transform(inputs)

Filter adversarial noise from input samples. This method is not available in this class.

Raises

NotImplementedError – This function is not available in ensemble.

class mindarmour.detectors.SimilarityDetector(trans_model, max_k_neighbor=1000, chunk_size=1000, max_buffer_size=10000, tuning=False, fpr=0.001)

The detector measures similarity among adjacent queries and rejects queries which are remarkably similar to previous queries.

Reference: Stateful Detection of Black-Box Adversarial Attacks, by Steven Chen, Nicholas Carlini, and David Wagner, at arXiv 2019.

Parameters
  • trans_model (Model) – A MindSpore model that encodes input data into lower-dimensional vectors.

  • max_k_neighbor (int) – The maximum number of nearest neighbors. Default: 1000.

  • chunk_size (int) – Buffer size. Default: 1000.

  • max_buffer_size (int) – Maximum buffer size. Default: 10000.

  • tuning (bool) – Calculate the average distance to the nearest k neighbours. If tuning is True, k=K; if False, k=1,…,K. Default: False.

  • fpr (float) – False positive rate on legitimate query sequences. Default: 0.001.
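The stateful idea from Chen et al. can be sketched as a k-nearest-neighbour check over a buffer of earlier queries. Assumptions: queries are already encoded (the role trans_model plays), distances are Euclidean, the buffer is cleared after each detection, and chunked processing (chunk_size) is omitted. `ToyStatefulDetector` is hypothetical and not the MindArmour class.

```python
import numpy as np


class ToyStatefulDetector:
    """Hypothetical sketch of k-NN stateful query detection."""

    def __init__(self, k, threshold, max_buffer_size=10000):
        self.k = k
        self.threshold = threshold
        self.max_buffer_size = max_buffer_size
        self.buffer = []      # encodings of earlier queries
        self.detected = []    # indices of flagged queries
        self.num_queries = 0

    def process(self, encodings):
        for z in np.asarray(encodings, dtype=float):
            # Only check once the buffer holds at least k earlier queries.
            if len(self.buffer) >= self.k:
                dists = np.linalg.norm(np.asarray(self.buffer) - z, axis=1)
                knn_mean = np.sort(dists)[: self.k].mean()
                # A very small mean k-NN distance suggests the query is a
                # near-duplicate of recent ones, as in iterative attacks.
                if knn_mean < self.threshold:
                    self.detected.append(self.num_queries)
                    self.buffer = []  # assumption: reset after a detection
            self.buffer.append(z)
            self.buffer = self.buffer[-self.max_buffer_size:]
            self.num_queries += 1
        return self.detected
```

Clearing the buffer after each detection means the number and spacing of detections, rather than a single flag, indicate how persistently a sequence keeps probing the same region.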

Examples

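>>> from mindarmour.detectors import SimilarityDetector
>>> detector = SimilarityDetector(trans_model)
>>> num_nearest_neighbors, thresholds = detector.fit(benign)
>>> detector.set_threshold(num_nearest_neighbors[-1], thresholds[-1])
>>> detector.detect(queries)
>>> detected_queries = detector.get_detected_queries()

clear_buffer()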

Clear the buffer memory.

detect(inputs)

Process queries to detect black-box attacks.

Parameters

inputs (numpy.ndarray) – Query sequence.

Raises

ValueError – If threshold or num_of_neighbors is not available.

detect_diff(inputs)

Detect adversarial samples from input samples, like the predict_proba function in common machine learning models.

Parameters

inputs (Union[numpy.ndarray, list, tuple]) – Data used as references to create adversarial examples.

Raises

NotImplementedError – This function is not available in class SimilarityDetector.

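fit(inputs, labels=None)

Process input training data to calculate the threshold. A proper threshold should make sure the false positive rate is under a given value.

Parameters
  • inputs (numpy.ndarray) – Training data used to calculate the threshold.

  • labels (numpy.ndarray) – Labels of training data. Default: None.

Returns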

  • list[int], number of the nearest neighbors.

  • list[float], calculated thresholds for different K.

Raises

ValueError – If the number of training samples is less than max_k_neighbor.
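The threshold computation in fit can be sketched as follows, assuming Euclidean distances between encoded training samples and a threshold equal to the lower fpr-quantile of each sample's mean distance to its k nearest neighbours, so that about a fraction fpr of legitimate queries fall below it. The k values follow the tuning description above; `knn_thresholds` is a hypothetical helper, and its all-pairs distance matrix only suits small data.

```python
import numpy as np


def knn_thresholds(encodings, max_k_neighbor, fpr=0.001, tuning=False):
    z = np.asarray(encodings, dtype=float)
    # In this sketch a sample is not its own neighbour, so it has only
    # n - 1 neighbours and needs n > max_k_neighbor.
    if z.shape[0] <= max_k_neighbor:
        raise ValueError("need more training samples than max_k_neighbor")
    # All-pairs Euclidean distances (quadratic memory).
    dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    sorted_d = np.sort(dists, axis=1)
    ks = [max_k_neighbor] if tuning else list(range(1, max_k_neighbor + 1))
    thresholds = []
    for k in ks:
        knn_mean = sorted_d[:, :k].mean(axis=1)
        # Small k-NN distances look like attack queries, so take the lower
        # fpr-quantile: about fpr of benign samples fall below it.
        thresholds.append(float(np.quantile(knn_mean, fpr)))
    return ks, thresholds
```

The two returned lists mirror the documented return values: candidate neighbour counts and one threshold per count, from which a pair can be passed to set_threshold.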

get_detected_queries()

Get the indices of detected queries.

Returns

list[int], sequence number of detected malicious queries.

get_detection_interval()

Get the interval between adjacent detections.

Returns

list[int], number of queries between adjacent detections.

set_threshold(num_of_neighbors, threshold)

Set the parameters num_of_neighbors and threshold.

Parameters
  • num_of_neighbors (int) – Number of the nearest neighbors.

  • threshold (float) – Detection threshold. Default: None.

transform(inputs)

Filter adversarial noise from input samples.

Raises

NotImplementedError – This function is not available in class SimilarityDetector.