mindarmour.attacks

This module includes classical black-box and white-box attack algorithms for generating adversarial examples.

class mindarmour.attacks.BasicIterativeMethod(network, eps=0.3, eps_iter=0.1, bounds=(0.0, 1.0), is_targeted=False, nb_iter=5, loss_fn=None)

The Basic Iterative Method attack, an iterative FGSM method to generate adversarial examples.

References: A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” in ICLR, 2017

Parameters

network (Cell) – Target model.
eps (float) – Proportion of adversarial perturbation generated by the attack to data range. Default: 0.3.
eps_iter (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.1.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
nb_iter (int) – Number of iterations. Default: 5.
loss_fn (Loss) – Loss function for optimization. Default: None.
attack (class) – The single-step gradient attack applied in each iteration; this class uses FGSM. It is set internally rather than passed to the constructor.

Examples

>>> attack = BasicIterativeMethod(network)

generate(inputs, labels)

Simple iterative FGSM method to generate adversarial examples.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – Original/target labels.

Returns

numpy.ndarray, generated adversarial examples.

Examples

>>> adv_x = attack.generate(np.asarray([[0.3, 0.2, 0.6],
...                                     [0.3, 0.2, 0.4]], np.float32),
...                         np.asarray([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
...                                     [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]], np.float32))
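The iteration described above can be sketched in NumPy as follows. This is not MindArmour's implementation: it assumes a caller-supplied `grad_fn` returning the loss gradient with respect to the input, and it treats `eps` and `eps_iter` as fractions of the data range, as the parameter descriptions suggest. The toy linear softmax model is hypothetical and only stands in for `network`.

```python
import numpy as np


def bim(x, grad_fn, eps=0.3, eps_iter=0.1, nb_iter=5, bounds=(0.0, 1.0), is_targeted=False):
    """Basic Iterative Method sketch: repeated FGSM steps with clipping."""
    clip_min, clip_max = bounds
    scale = clip_max - clip_min  # assumption: eps values are fractions of the data range
    x_adv = x.copy()
    for _ in range(nb_iter):
        step = eps_iter * scale * np.sign(grad_fn(x_adv))
        # untargeted: increase the loss; targeted: decrease it toward the target
        x_adv = x_adv - step if is_targeted else x_adv + step
        # keep the total perturbation inside the eps-ball around x ...
        x_adv = np.clip(x_adv, x - eps * scale, x + eps * scale)
        # ... and the sample inside the valid data range
        x_adv = np.clip(x_adv, clip_min, clip_max)
    return x_adv


# Hypothetical toy model: logits = v @ W + b, cross-entropy for a fixed label
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 10))
b = np.zeros(10)
label = 2


def toy_loss(v):
    z = v @ W + b
    return np.log(np.sum(np.exp(z - z.max()))) + z.max() - z[label]


def toy_grad(v):
    z = v @ W + b
    p = np.exp(z - z.max())
    p /= p.sum()
    return (p - np.eye(10)[label]) @ W.T  # d CE / d v for the linear model


x = np.array([0.3, 0.2, 0.6])
x_adv = bim(x, toy_grad)
```

Because the toy loss is convex in the input and each clipped step keeps the sign of the gradient step, the loss of `x_adv` is never lower than that of `x` in this setting.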

class mindarmour.attacks.CarliniWagnerL2Attack(network, num_classes, box_min=0.0, box_max=1.0, bin_search_steps=5, max_iterations=1000, confidence=0, learning_rate=0.005, initial_const=0.01, abort_early_check_ratio=0.05, targeted=False, fast=True, abort_early=True, sparse=True)

The Carlini & Wagner attack using L2 norm.

References: Nicholas Carlini, David Wagner: “Towards Evaluating the Robustness of Neural Networks”

Parameters

network (Cell) – Target model.
num_classes (int) – Number of labels of model output, which should be greater than zero.
box_min (float) – Lower bound of input of the target model. Default: 0.
box_max (float) – Upper bound of input of the target model. Default: 1.0.
bin_search_steps (int) – The number of steps for the binary search used to find the optimal trade-off constant between distance and confidence. Default: 5.
max_iterations (int) – The maximum number of iterations, which should be greater than zero. Default: 1000.
confidence (float) – Confidence of the output of adversarial examples. Default: 0.
learning_rate (float) – The learning rate for the attack algorithm. Default: 5e-3.
initial_const (float) – The initial trade-off constant to use to balance the relative importance of perturbation norm and confidence difference. Default: 1e-2.
abort_early_check_ratio (float) – Fraction of max_iterations between two checks of the loss progress (used when abort_early is True). Default: 5e-2.
targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
fast (bool) – If True, return the first found adversarial example. If False, return the adversarial samples with smaller perturbations. Default: True.
abort_early (bool) – If True, Adam optimization is aborted early when the loss has not decreased for some time. If False, Adam continues until max_iterations is reached. Default: True.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot encoded. Default: True.

Examples

>>> attack = CarliniWagnerL2Attack(network, num_classes=10)

generate(inputs, labels)

Generate adversarial examples based on input data and targeted labels.

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – The ground truth label of input samples or target labels.

Returns

numpy.ndarray, generated adversarial examples.

Examples

>>> advs = attack.generate(np.asarray([[0.1, 0.2, 0.6], [0.3, 0, 0.4]], np.float32),
...                        np.asarray([1, 2]))
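A minimal sketch of two ingredients commonly associated with the C&W L2 attack: the tanh change of variables that keeps the candidate inside [box_min, box_max], and the logit-margin term weighted by the trade-off constant. It assumes a hypothetical `logits_fn` returning the model's pre-softmax outputs; the Adam loop, the binary search over the constant and early abort are omitted, and this is not the library's code.

```python
import numpy as np


def to_box(w, box_min=0.0, box_max=1.0):
    """Map an unconstrained variable w into [box_min, box_max] via tanh."""
    return (np.tanh(w) + 1.0) / 2.0 * (box_max - box_min) + box_min


def from_box(x, box_min=0.0, box_max=1.0, tol=1e-6):
    """Inverse of to_box; values are shrunk slightly so arctanh stays finite."""
    scaled = (x - box_min) / (box_max - box_min) * 2.0 - 1.0
    return np.arctanh(np.clip(scaled, -1.0 + tol, 1.0 - tol))


def cw_margin(logits, label, confidence=0.0, targeted=False):
    """Margin term on the logits Z of one sample."""
    other = np.max(np.delete(logits, label))  # best logit among the other classes
    if targeted:
        # bottoms out at -confidence once the target logit leads by >= confidence
        return max(other - logits[label], -confidence)
    # untargeted: push the true-label logit below the best other logit
    return max(logits[label] - other, -confidence)


def cw_objective(w, x, logits_fn, label, const, confidence=0.0, targeted=False):
    """Squared L2 distance plus const times the margin (logits_fn is hypothetical)."""
    x_adv = to_box(w)
    margin = cw_margin(logits_fn(x_adv), label, confidence, targeted)
    return np.sum((x_adv - x) ** 2) + const * margin
```

Optimizing over `w` instead of the input itself removes the need for explicit clipping, which is why the box bounds appear only inside `to_box`.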

class mindarmour.attacks.DeepFool(network, num_classes, max_iters=50, overshoot=0.02, norm_level=2, bounds=None, sparse=True)

DeepFool is an untargeted, iterative attack that moves a benign sample toward the nearest classification boundary and then just across it.

Reference: S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: a simple and accurate method to fool deep neural networks,” in CVPR, 2016

Parameters

network (Cell) – Target model.
num_classes (int) – Number of labels of model output, which should be greater than zero.
max_iters (int) – Max iterations, which should be greater than zero. Default: 50.
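overshoot (float) – Overshoot parameter: the accumulated perturbation is scaled by (1 + overshoot) so that the sample ends up just across the boundary. Default: 0.02.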
norm_level (int) – Order of the vector norm. Possible values: np.inf or 2. Default: 2.
bounds (tuple) – Upper and lower bounds of data range. In form of (clip_min, clip_max). Default: None.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot encoded. Default: True.

Examples

>>> attack = DeepFool(network, num_classes=10)

generate(inputs, labels)

Generate adversarial examples based on input samples and original labels.

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Original labels.

Returns

numpy.ndarray, adversarial examples.

Raises

NotImplementedError – If norm_level is not in [2, np.inf, ‘2’, ‘inf’].

Examples

>>> advs = attack.generate(np.asarray([[0.2, 0.3, 0.4], [0.3, 0.4, 0.5]], np.float32),
...                        np.asarray([1, 2]))
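For an affine classifier the DeepFool update has a closed form, which makes the idea easy to see. The sketch below assumes a hypothetical linear model `logits = x @ W + b` (so the gradient of each logit is a column of `W`), implements only the L2 variant and omits the `bounds` clipping; on a real network the gradients would come from automatic differentiation instead.

```python
import numpy as np


def deepfool_affine(x, W, b, overshoot=0.02, max_iters=50):
    """L2 DeepFool on a hypothetical affine classifier logits = x @ W + b."""
    k0 = int(np.argmax(x @ W + b))          # originally predicted class
    others = np.arange(W.shape[1]) != k0
    r_tot = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(max_iters):
        z = x_adv @ W + b
        if int(np.argmax(z)) != k0:
            break                            # label changed: done
        w_diff = W - W[:, [k0]]              # gradient of z_k - z_k0, one column per k
        f_diff = z - z[k0]                   # logit gaps (negative for k != k0)
        norms = np.linalg.norm(w_diff, axis=0)
        dist = np.full(z.shape, np.inf)
        dist[others] = np.abs(f_diff[others]) / norms[others]
        closest = int(np.argmin(dist))       # nearest linearized boundary
        # minimal L2 step onto that boundary (small constant avoids landing on it)
        r_tot += (np.abs(f_diff[closest]) + 1e-4) / norms[closest] ** 2 * w_diff[:, closest]
        x_adv = x + (1.0 + overshoot) * r_tot
    return x_adv
```

For an affine model a single update already crosses the chosen boundary; the loop matters for non-linear networks, where each step only uses a local linearization.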

class mindarmour.attacks.FastGradientMethod(network, eps=0.07, alpha=None, bounds=(0.0, 1.0), norm_level=2, is_targeted=False, loss_fn=None)

A one-step attack based on gradient calculation; the perturbation can be normalized under the L1, L2 or Linf norm.

References: I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR, 2015.

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: None.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
norm_level (Union[int, numpy.inf]) – Order of the norm. Possible values: np.inf, 1 or 2. Default: 2.
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
loss_fn (Loss) – Loss function for optimization. Default: None.

Examples

>>> attack = FastGradientMethod(network)
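The three norm options differ only in how the gradient is turned into a step. The sketch below assumes one common normalization per norm (the sign for Linf, division by the L1 or L2 norm otherwise) and treats `eps` as a fraction of the data range; `grad` stands for the loss gradient that the library would compute from `network`. It is an illustration, not MindArmour's code.

```python
import numpy as np


def fgm_direction(grad, norm_level=2):
    """Turn a gradient into a unit-norm step direction (assumed normalization)."""
    if norm_level in (np.inf, 'inf'):
        return np.sign(grad)
    if norm_level in (1, '1'):
        return grad / np.sum(np.abs(grad))
    if norm_level in (2, '2'):
        return grad / np.sqrt(np.sum(grad ** 2))
    raise NotImplementedError("norm_level must be np.inf, 1 or 2")


def fgm(x, grad, eps=0.07, bounds=(0.0, 1.0), norm_level=2, is_targeted=False):
    """One-step fast gradient method; eps is a fraction of the data range."""
    clip_min, clip_max = bounds
    step = eps * (clip_max - clip_min) * fgm_direction(grad, norm_level)
    x_adv = x - step if is_targeted else x + step
    return np.clip(x_adv, clip_min, clip_max)
```

With the gradient `[3, -4]` and `eps=0.1`, the L2 step is `[0.06, -0.08]`, the L1 step is `0.1 * [3/7, -4/7]`, and the Linf step is `[0.1, -0.1]`.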

class mindarmour.attacks.FastGradientSignMethod(network, eps=0.07, alpha=None, bounds=(0.0, 1.0), is_targeted=False, loss_fn=None)

Uses the sign of the gradient with respect to the input, rather than its value, to build the perturbation. This attack is commonly known as the Fast Gradient Sign Method and was introduced by Goodfellow et al.

References: I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in ICLR, 2015.

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: None.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
loss_fn (Loss) – Loss function for optimization. Default: None.

Examples

>>> attack = FastGradientSignMethod(network)
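A sketch of the single FGSM step on a hypothetical linear softmax model (`logits = x @ W + b`), whose cross-entropy gradient with respect to the input has the closed form `(softmax(logits) - onehot) @ W.T`. It shows how `is_targeted` flips the direction of the step; the `alpha` random-start option is left out, and `eps` is assumed to be a fraction of the data range.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def ce_grad(x, W, b, label):
    """Gradient of cross-entropy(label) w.r.t. x for the hypothetical model x @ W + b."""
    return (softmax(x @ W + b) - np.eye(W.shape[1])[label]) @ W.T


def fgsm(x, W, b, label, eps=0.07, bounds=(0.0, 1.0), is_targeted=False):
    clip_min, clip_max = bounds
    step = eps * (clip_max - clip_min) * np.sign(ce_grad(x, W, b, label))
    # untargeted: climb the loss of the true label;
    # targeted: descend the loss of the target label
    x_adv = x - step if is_targeted else x + step
    return np.clip(x_adv, clip_min, clip_max)
```

In the untargeted case `label` is the ground truth; in the targeted case it is the class the attacker wants the model to predict.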

class mindarmour.attacks.GeneticAttack(model, pop_size=6, mutation_rate=0.005, per_bounds=0.15, max_steps=1000, step_size=0.2, temp=0.3, bounds=(0, 1.0), adaptive=False, sparse=True)

The Genetic Attack is a black-box attack based on a genetic algorithm, a member of the family of evolutionary algorithms.

This attack was proposed by Moustafa Alzantot et al. (2018).

References: Moustafa Alzantot, Yash Sharma, Supriyo Chakraborty, “GenAttack: Practical Black-box Attacks with Gradient-Free Optimization”

Parameters

model (BlackModel) – Target model.
pop_size (int) – The number of individuals in the population, which should be greater than zero. Default: 6.
mutation_rate (float) – The probability of mutations. Default: 0.005.
per_bounds (float) – Maximum L_inf distance. Default: 0.15.
max_steps (int) – The maximum round of iteration for each adversarial example. Default: 1000.
step_size (float) – Attack step size. Default: 0.2.
temp (float) – Sampling temperature for selection. Default: 0.3.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: (0, 1.0).
adaptive (bool) – If True, turns on dynamic scaling of mutation parameters. If False, uses static mutation parameters. Default: False.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Examples

>>> attack = GeneticAttack(model)

generate(inputs, labels)

Generate adversarial examples based on input data and targeted labels (or ground-truth labels).

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Targeted labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Examples

>>> is_adv, advs, query_times = attack.generate(
...     np.asarray([[0.2, 0.3, 0.4], [0.3, 0.3, 0.2]], np.float32),
...     np.asarray([1, 2]))
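One generation of a genetic attack can be sketched as selection, crossover, mutation and projection. This is a simplified illustration, not the library's algorithm: `fitness` is a hypothetical per-member score supplied by the caller, crossover is uniform (GenAttack weights parents by fitness), and `per_bounds` is assumed to be an L_inf radius relative to the data range.

```python
import numpy as np


def genetic_step(pop, fitness, x, per_bounds=0.15, bounds=(0.0, 1.0), temp=0.3,
                 mutation_rate=0.005, step_size=0.2, rng=None):
    """One generation of a simplified GenAttack-style update."""
    rng = np.random.default_rng() if rng is None else rng
    clip_min, clip_max = bounds
    eps = per_bounds * (clip_max - clip_min)  # assumed: L_inf radius relative to data range
    pop_size = pop.shape[0]
    elite = pop[np.argmax(fitness)]           # the fittest member survives unchanged
    # selection: softmax over fitness, sharpened or flattened by temp
    logits = np.asarray(fitness, dtype=float) / temp
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    parents_a = pop[rng.choice(pop_size, size=pop_size - 1, p=probs)]
    parents_b = pop[rng.choice(pop_size, size=pop_size - 1, p=probs)]
    # uniform crossover (simplification): each feature comes from one parent
    children = np.where(rng.random(parents_a.shape) < 0.5, parents_a, parents_b)
    # mutation: a small fraction of features moves by up to step_size * eps
    mutate = rng.random(children.shape) < mutation_rate
    children = children + mutate * rng.uniform(-step_size * eps, step_size * eps, children.shape)
    # project back into the L_inf ball around x and into the data range
    children = np.clip(children, x - eps, x + eps)
    children = np.clip(children, clip_min, clip_max)
    return np.vstack([elite[None, :], children])
```

A lower `temp` concentrates selection on the fittest members; a higher one keeps the population more diverse.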

class mindarmour.attacks.HopSkipJumpAttack(model, init_num_evals=100, max_num_evals=1000, stepsize_search='geometric_progression', num_iterations=20, gamma=1.0, constraint='l2', batch_size=32, clip_min=0.0, clip_max=1.0, sparse=True)

HopSkipJumpAttack, proposed by Chen, Jordan and Wainwright, is a decision-based attack: it only requires access to the output labels of the target model.

References: Jianbo Chen, Michael I. Jordan, Martin J. Wainwright. HopSkipJumpAttack: A Query-Efficient Decision-Based Attack. 2019. arXiv:1904.02144

Parameters

model (BlackModel) – Target model.
init_num_evals (int) – The initial number of evaluations for gradient estimation. Default: 100.
max_num_evals (int) – The maximum number of evaluations for gradient estimation. Default: 1000.
stepsize_search (str) – How to search for the step size. Possible values are ‘geometric_progression’ and ‘grid_search’. Default: ‘geometric_progression’.
num_iterations (int) – The number of iterations. Default: 20.
gamma (float) – Used to set the binary search threshold theta, where d is the input dimension: $\theta = \gamma / d^{3/2}$ for the l2 attack and $\theta = \gamma / d^{2}$ for the linf attack. Default: 1.0.
constraint (str) – The norm distance to optimize. Possible values are ‘l2’ and ‘linf’. Default: ‘l2’.
batch_size (int) – Batch size. Default: 32.
clip_min (float, optional) – The minimum image component value. Default: 0.
clip_max (float, optional) – The maximum image component value. Default: 1.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Raises

ValueError – If stepsize_search is not in [‘geometric_progression’, ‘grid_search’].
ValueError – If constraint is not in [‘l2’, ‘linf’].

Examples

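>>> import numpy as np
>>> sample_num, sample_length, class_num = 4, 3, 10
>>> x_test = np.asarray(np.random.random((sample_num, sample_length)), np.float32)
>>> y_test = np.random.randint(0, class_num, size=sample_num)
>>> # user_model: a BlackModel instance wrapping the target model
>>> instance = HopSkipJumpAttack(user_model)
>>> is_adv, adv_x, query_times = instance.generate(x_test, y_test)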
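The decision-based core of HopSkipJumpAttack is a Monte Carlo estimate of the boundary normal that needs only the model's labels. The sketch below shows that estimator together with the theta formula from the gamma description; `phi` is a hypothetical decision function returning +1 for adversarial inputs and -1 otherwise, and the binary search, step-size search and batching of the real attack are omitted.

```python
import numpy as np


def hsja_theta(gamma, d, constraint='l2'):
    """Binary-search threshold theta, as given in the gamma description."""
    if constraint == 'l2':
        return gamma / d ** 1.5
    if constraint == 'linf':
        return gamma / d ** 2
    raise ValueError("constraint must be 'l2' or 'linf'")


def estimate_direction(phi, x_boundary, delta, num_evals, rng):
    """Estimate the boundary normal at x_boundary from +1/-1 decisions only.

    phi is a hypothetical callable: +1 if its input is adversarial, else -1.
    """
    u = rng.standard_normal((num_evals, x_boundary.size))
    u /= np.linalg.norm(u, axis=1, keepdims=True)        # random unit directions
    decisions = np.array([phi(x_boundary + delta * ui) for ui in u], dtype=float)
    weights = decisions - decisions.mean()               # baseline reduces variance
    grad = weights @ u / (num_evals - 1)
    return grad / np.linalg.norm(grad)
```

For a linear boundary `w @ v = 0`, the estimated direction lines up closely with `w` after a few hundred queries.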

generate(inputs, labels)

Generate adversarial images in a for loop.

Parameters

inputs (numpy.ndarray) – Original images.
labels (numpy.ndarray) – Target labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Examples

>>> is_adv, adv_x, query_times = instance.generate(
...     np.asarray([[0.1, 0.2, 0.2], [0.2, 0.3, 0.4]], np.float32), np.asarray([2, 6]))

set_target_images(target_images)

Set target images for a targeted attack.

Parameters: target_images (numpy.ndarray) – Target images.

class mindarmour.attacks.IterativeGradientMethod(network, eps=0.3, eps_iter=0.1, bounds=(0.0, 1.0), nb_iter=5, loss_fn=None)

Abstract base class for all iterative gradient based attacks.

Parameters

network (Cell) – Target model.
eps (float) – Proportion of adversarial perturbation generated by the attack to data range. Default: 0.3.
eps_iter (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.1.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
nb_iter (int) – Number of iterations. Default: 5.
loss_fn (Loss) – Loss function for optimization. Default: None.

abstract generate(inputs, labels)

Generate adversarial examples based on input samples and original/target labels.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – Original/target labels.

Raises

NotImplementedError – This function is not available in IterativeGradientMethod.

Examples

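>>> # attack: an instance of a concrete subclass, e.g. BasicIterativeMethod
>>> adv_x = attack.generate(np.asarray([[0.1, 0.9, 0.6],
...                                     [0.3, 0, 0.3]], np.float32),
...                         np.asarray([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
...                                     [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]], np.float32))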

class mindarmour.attacks.JSMAAttack(network, num_classes, box_min=0.0, box_max=1.0, theta=1.0, max_iteration=1000, max_count=3, increase=True, sparse=True)

JSMA is a targeted, iterative attack based on a saliency map of the input features.

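Reference: N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in IEEE EuroS&P, 2016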

Parameters

network (Cell) – Target model.
num_classes (int) – Number of labels of model output, which should be greater than zero.
box_min (float) – Lower bound of input of the target model. Default: 0.
box_max (float) – Upper bound of input of the target model. Default: 1.0.
theta (float) – Change ratio of one pixel (relative to input data range). Default: 1.0.
max_iteration (int) – Maximum number of iterations. Default: 1000.
max_count (int) – Maximum number of times each pixel can be changed. Default: 3.
increase (bool) – If True, increase perturbation. If False, decrease perturbation. Default: True.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot encoded. Default: True.

Examples

>>> attack = JSMAAttack(network, num_classes=10)

generate(inputs, labels)

Generate adversarial examples in batch.

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Target labels.

Returns

numpy.ndarray, adversarial samples.

Examples

>>> advs = attack.generate(np.asarray([[0.2, 0.3, 0.4], [0.3, 0.4, 0.5]], np.float32),
...                        np.asarray([1, 2]))
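A sketch of a saliency-guided step, assuming a precomputed Jacobian of the logits with respect to the input (shape num_classes x num_features). For simplicity it scores single features rather than the feature pairs of the original JSMA paper, and applies `theta`, `max_count` and `increase` as described above; it is an illustration, not the library's implementation.

```python
import numpy as np


def saliency_map(jacobian, target, increase=True):
    """Single-feature saliency scores (simplification of the pairwise JSMA map)."""
    alpha = jacobian[target]                    # d Z_target / d x_i
    beta = jacobian.sum(axis=0) - alpha         # summed over the other classes
    if increase:
        mask = (alpha > 0) & (beta < 0)         # raising x_i helps target, hurts others
    else:
        mask = (alpha < 0) & (beta > 0)         # lowering x_i helps target, hurts others
    return np.where(mask, np.abs(alpha) * np.abs(beta), 0.0)


def jsma_step(x, jacobian, target, counts, theta=1.0, box=(0.0, 1.0),
              max_count=3, increase=True):
    """Change the most salient feature once; returns (x, counts, changed)."""
    box_min, box_max = box
    scores = saliency_map(jacobian, target, increase)
    scores[counts >= max_count] = 0.0           # feature budget exhausted
    if not np.any(scores > 0):
        return x, counts, False                 # nothing useful left to change
    i = int(np.argmax(scores))
    delta = theta * (box_max - box_min)         # theta is relative to the input range
    x, counts = x.copy(), counts.copy()
    x[i] = np.clip(x[i] + (delta if increase else -delta), box_min, box_max)
    counts[i] += 1
    return x, counts, True
```

A full attack would recompute the Jacobian after each step and stop once the target class is predicted or max_iteration is reached.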

class mindarmour.attacks.LBFGS(network, eps=1e-05, bounds=(0.0, 1.0), is_targeted=True, nb_iter=150, search_iters=30, loss_fn=None, sparse=False)

Uses L-BFGS-B to minimize the distance between the input and the adversarial example.

References: Pedro Tabacof, Eduardo Valle. “Exploring the Space of Adversarial Images”

Parameters

network (Cell) – The network of attacked model.
eps (float) – Attack step size. Default: 1e-5.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: True.
nb_iter (int) – Number of iterations of the L-BFGS optimizer, which should be greater than zero. Default: 150.
search_iters (int) – Number of changes in step size, which should be greater than zero. Default: 30.
loss_fn (Functions) – Loss function of substitute model. Default: None.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot encoded. Default: False.

Examples

>>> attack = LBFGS(network)

generate(inputs, labels)

Generate adversarial examples based on input data and target labels.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – Original/target labels.

Returns

numpy.ndarray, generated adversarial examples.

Examples

>>> adv = attack.generate(np.asarray([[0.1, 0.2, 0.6], [0.3, 0, 0.4]], np.float32),
...                       np.eye(10, dtype=np.float32)[[2, 2]])  # one-hot, since sparse=False

class mindarmour.attacks.LeastLikelyClassMethod(network, eps=0.07, alpha=None, bounds=(0.0, 1.0), loss_fn=None)

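Least-Likely Class Method: a single-step attack that perturbs the input toward the class the target model currently considers least likely.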

References: F. Tramer, et al., “Ensemble adversarial training: Attacks and defenses,” in ICLR, 2018

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: None.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
loss_fn (Loss) – Loss function for optimization. Default: None.

Examples

>>> attack = LeastLikelyClassMethod(network)
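The method can be read as a targeted FGSM step whose target is the class the model currently finds least likely. A minimal sketch under that reading, on a hypothetical linear softmax model (`logits = x @ W + b`) and with `eps` assumed to be a fraction of the data range:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def least_likely_step(x, W, b, eps=0.07, bounds=(0.0, 1.0)):
    """Targeted sign step toward the least-likely class of a hypothetical linear model."""
    clip_min, clip_max = bounds
    probs = softmax(x @ W + b)
    target = int(np.argmin(probs))                      # least-likely class
    grad = (probs - np.eye(W.shape[1])[target]) @ W.T   # d CE(target) / d x
    # descend the target's cross-entropy, i.e. move toward the target class
    x_adv = x - eps * (clip_max - clip_min) * np.sign(grad)
    return np.clip(x_adv, clip_min, clip_max), target
```

No labels are needed to choose the target, which is why this variant takes no `is_targeted` argument.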

class mindarmour.attacks.MomentumIterativeMethod(network, eps=0.3, eps_iter=0.1, bounds=(0.0, 1.0), is_targeted=False, nb_iter=5, decay_factor=1.0, norm_level='inf', loss_fn=None)

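The Momentum Iterative Method attack, which accumulates a momentum term over the gradients of successive iterations to stabilize the update direction.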

References: Y. Dong, et al., “Boosting adversarial attacks with momentum,” arXiv:1710.06081, 2017

Parameters

network (Cell) – Target model.
eps (float) – Proportion of adversarial perturbation generated by the attack to data range. Default: 0.3.
eps_iter (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.1.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
nb_iter (int) – Number of iterations. Default: 5.
decay_factor (float) – Decay factor applied to the accumulated gradient momentum in each iteration. Default: 1.0.
norm_level (Union[int, numpy.inf]) – Order of the norm. Possible values: np.inf, 1 or 2. Default: ‘inf’.
loss_fn (Loss) – Loss function for optimization. Default: None.

generate(inputs, labels)

Generate adversarial examples based on input data and original/target labels.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – Original/target labels.

Returns

numpy.ndarray, generated adversarial examples.

Examples

>>> adv_x = attack.generate(np.asarray([[0.5, 0.2, 0.6],
...                                     [0.3, 0, 0.2]], np.float32),
...                         np.asarray([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
...                                     [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]], np.float32))
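A sketch of the momentum accumulation behind MI-FGSM for the default `norm_level='inf'`: each new gradient is normalized by its L1 norm and added to the decayed running momentum, and the sign of the momentum drives the step. `grad_fn` is a hypothetical callable returning the loss gradient, `eps` values are assumed to be fractions of the data range, and the other norm options are not covered.

```python
import numpy as np


def momentum_update(g_prev, grad, decay_factor=1.0):
    """g_t = decay_factor * g_{t-1} + grad / ||grad||_1."""
    return decay_factor * g_prev + grad / (np.sum(np.abs(grad)) + 1e-12)


def mim(x, grad_fn, eps=0.3, eps_iter=0.1, nb_iter=5, decay_factor=1.0, bounds=(0.0, 1.0)):
    """Untargeted Linf momentum iterative attack (grad_fn is hypothetical)."""
    clip_min, clip_max = bounds
    scale = clip_max - clip_min
    g = np.zeros_like(x)
    x_adv = x.copy()
    for _ in range(nb_iter):
        g = momentum_update(g, grad_fn(x_adv), decay_factor)
        x_adv = x_adv + eps_iter * scale * np.sign(g)          # step along the momentum
        x_adv = np.clip(x_adv, x - eps * scale, x + eps * scale)
        x_adv = np.clip(x_adv, clip_min, clip_max)
    return x_adv
```

Normalizing each gradient before accumulating keeps a single large gradient from dominating the momentum.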

class mindarmour.attacks.NES(model, scene, max_queries=10000, top_k=-1, num_class=10, batch_size=128, epsilon=0.3, samples_per_draw=128, momentum=0.9, learning_rate=0.001, max_lr=0.05, min_lr=0.0005, sigma=0.001, plateau_length=20, plateau_drop=2.0, adv_thresh=0.25, zero_iters=10, starting_eps=1.0, starting_delta_eps=0.5, label_only_sigma=0.001, conservative=2, sparse=True)

This class implements the Natural Evolutionary Strategies (NES) attack in three settings: the Query-Limited setting, the Partial-Information setting and the Label-Only setting.

References: Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, July 2018

Parameters

model (BlackModel) – Target model.
scene (str) – Attack setting; one of ‘Label_Only’, ‘Partial_Info’ or ‘Query_Limit’.
max_queries (int) – Maximum number of queries to generate an adversarial example. Default: 10000.
top_k (int) – For the Partial-Info or Label-Only setting, indicates how much (Top-k) information is available to the attacker. For the Query-Limited setting, this should be set to -1. Default: -1.
num_class (int) – Number of classes in dataset. Default: 10.
batch_size (int) – Batch size. Default: 128.
epsilon (float) – Maximum perturbation allowed in attack. Default: 0.3.
samples_per_draw (int) – Number of samples drawn in antithetic sampling. Default: 128.
momentum (float) – Momentum. Default: 0.9.
learning_rate (float) – Learning rate. Default: 1e-3.
max_lr (float) – Maximum learning rate. Default: 5e-2.
min_lr (float) – Minimum learning rate. Default: 5e-4.
sigma (float) – Step size of random noise. Default: 1e-3.
plateau_length (int) – Length of plateau used in Annealing algorithm. Default: 20.
plateau_drop (float) – Drop of plateau used in Annealing algorithm. Default: 2.0.
adv_thresh (float) – Adversarial threshold. Default: 0.25.
zero_iters (int) – Number of points to use for the proxy score. Default: 10.
starting_eps (float) – Starting epsilon used in Label-Only setting. Default: 1.0.
starting_delta_eps (float) – Delta epsilon used in Label-Only setting. Default: 0.5.
label_only_sigma (float) – Sigma used in Label-Only setting. Default: 1e-3.
conservative (int) – Conservativeness of the epsilon decay; it is increased when the attack does not converge. Default: 2.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Examples

>>> import numpy as np
>>> SCENE = 'Label_Only'
>>> TOP_K = 5
>>> num_class = 5
>>> # user_model: a BlackModel instance wrapping the target model
>>> nes_instance = NES(user_model, SCENE, top_k=TOP_K, num_class=num_class)
>>> initial_img = np.asarray(np.random.random((32, 32)), np.float32)
>>> target_image = np.asarray(np.random.random((32, 32)), np.float32)
>>> target_class = 2
>>> nes_instance.set_target_images(np.asarray([target_image]))
>>> tag, adv, queries = nes_instance.generate(np.asarray([initial_img]),
...                                           np.asarray([target_class]))
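At the heart of NES is a gradient estimate built purely from model queries. The sketch below shows the antithetic estimator, assuming `samples_per_draw` is split into mirrored pairs of Gaussian samples (u and -u) scaled by `sigma`; `score_fn` is a hypothetical scalar objective computed from queries. The learning-rate schedule, top-k handling and label-only proxy score are omitted.

```python
import numpy as np


def nes_gradient(score_fn, x, sigma=1e-3, samples_per_draw=128, rng=None):
    """Antithetic NES estimate of the gradient of score_fn at x."""
    rng = np.random.default_rng() if rng is None else rng
    pairs = samples_per_draw // 2            # assumed: samples come in mirrored pairs
    grad = np.zeros_like(x, dtype=float)
    for _ in range(pairs):
        u = rng.standard_normal(x.shape)
        # a mirrored pair of queries cancels the even-order terms of the expansion
        grad += (score_fn(x + sigma * u) - score_fn(x - sigma * u)) * u
    return grad / (2 * sigma * pairs)
```

Each estimate costs `samples_per_draw` queries, which is what `max_queries` ultimately budgets.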

generate(inputs, labels)

Main algorithm for NES: generate adversarial examples for the given inputs.

Parameters

inputs (numpy.ndarray) – Benign input samples.
labels (numpy.ndarray) – Target labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Raises

ValueError – If top_k is less than 0 in the Label-Only or Partial-Info setting.
ValueError – If no target images have been set (see set_target_images) in the Label-Only or Partial-Info setting.
ValueError – If scene is not in [‘Label_Only’, ‘Partial_Info’, ‘Query_Limit’]

Examples

>>> is_adv, advs, queries = attack.generate(
...     np.asarray([[0.2, 0.3, 0.4], [0.3, 0.3, 0.2]], np.float32), np.asarray([1, 2]))

set_target_images(target_images)

Set target samples for target attack.

Parameters: target_images (numpy.ndarray) – Target samples for target attack.

class mindarmour.attacks.PSOAttack(model, step_size=0.5, per_bounds=0.6, c1=2.0, c2=2.0, c=2.0, pop_size=6, t_max=1000, pm=0.5, bounds=None, targeted=False, reduction_iters=3, sparse=True)

The PSO Attack is a black-box attack based on the Particle Swarm Optimization algorithm, a population-based optimization method. This attack was proposed by Rayan Mosli et al. (2019).

References: Rayan Mosli, Matthew Wright, Bo Yuan, Yin Pan, “They Might NOT Be Giants: Crafting Black-Box Adversarial Examples with Fewer Queries Using Particle Swarm Optimization”, arXiv:1909.07490, 2019.

Parameters

model (BlackModel) – Target model.
step_size (float) – Attack step size. Default: 0.5.
per_bounds (float) – Relative variation range of perturbations. Default: 0.6.
c1 (float) – Weight coefficient. Default: 2.
c2 (float) – Weight coefficient. Default: 2.
c (float) – Weight of perturbation loss. Default: 2.
pop_size (int) – The number of particles, which should be greater than zero. Default: 6.
t_max (int) – The maximum round of iteration for each adversarial example, which should be greater than zero. Default: 1000.
pm (float) – The probability of mutations. Default: 0.5.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: None.
targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
reduction_iters (int) – Cycle times in reduction process. Default: 3.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Examples

>>> attack = PSOAttack(model)

generate(inputs, labels)

Generate adversarial examples based on input data and targeted labels (or ground-truth labels).

Parameters

inputs (numpy.ndarray) – Input samples.
labels (numpy.ndarray) – Targeted labels or ground_truth labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Examples

>>> is_adv, advs, queries = attack.generate(
...     np.asarray([[0.2, 0.3, 0.4], [0.3, 0.3, 0.2]], np.float32), np.asarray([1, 2]))
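A sketch of one particle-swarm update of the kind the attack builds on. It uses the textbook velocity rule with an inertia weight `inertia`, a hypothetical parameter not exposed by `PSOAttack`; mapping `c1` and `c2` to the personal-best and global-best terms follows the textbook convention and is an assumption, as is treating `per_bounds` as a radius relative to the data range. Mutation (`pm`) and the reduction phase are omitted.

```python
import numpy as np


def pso_step(pos, vel, pbest, gbest, x, per_bounds=0.6, bounds=(0.0, 1.0),
             c1=2.0, c2=2.0, inertia=0.5, rng=None):
    """One velocity/position update for a swarm of candidate adversarial inputs.

    pos, vel, pbest: (pop_size, d) arrays; gbest, x: (d,) arrays.
    inertia is a hypothetical parameter used only in this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    clip_min, clip_max = bounds
    eps = per_bounds * (clip_max - clip_min)   # assumed: radius relative to data range
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # pull each particle toward its own best and the swarm's best position
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, x - eps, x + eps)  # stay near the original input
    pos = np.clip(pos, clip_min, clip_max)      # and inside the data range
    return pos, vel
```

The random factors `r1` and `r2` are redrawn at every step, so particles explore around the best positions instead of collapsing onto them immediately.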

class mindarmour.attacks.PointWiseAttack(model, max_iter=1000, search_iter=10, is_targeted=False, init_attack=None, sparse=True)

The Pointwise Attack aims to change as few pixels as possible when generating an adversarial sample from each original sample. A binary search is then applied to the changed pixels to bring the adversarial sample as close as possible to the original sample.

References: L. Schott, J. Rauber, M. Bethge, W. Brendel: “Towards the first adversarially robust neural network model on MNIST”, ICLR (2019)

Parameters

model (BlackModel) – Target model.
max_iter (int) – Maximum number of iterations to generate an adversarial image. Default: 1000.
search_iter (int) – Maximum number of binary search rounds. Default: 10.
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
init_attack (Attack) – Attack used to find a starting point. Default: None.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Examples

>>> attack = PointWiseAttack(model)

generate(inputs, labels)

Generate adversarial examples based on input samples and targeted labels.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – For targeted attack, labels are adversarial target labels. For untargeted attack, labels are ground-truth labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Examples

>>> is_adv_list, adv_list, query_times_each_adv = attack.generate(
...     np.asarray([[0.1, 0.2, 0.6], [0.3, 0, 0.4]], np.float32),
...     np.asarray([2, 3]))
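The two phases described for the Pointwise Attack (reset as many pixels as possible, then binary-search the remaining ones) can be sketched on a flat input vector as below. `is_adv` is a hypothetical predicate that queries the model; the sketch runs a single greedy pass and searches one pixel at a time, which simplifies the library's procedure.

```python
import numpy as np


def reduce_changed_pixels(x, adv, is_adv):
    """Greedily reset pixels to their original value while staying adversarial."""
    adv = adv.copy()
    for i in range(x.size):
        if adv[i] == x[i]:
            continue
        trial = adv.copy()
        trial[i] = x[i]
        if is_adv(trial):          # is_adv is a hypothetical model-query predicate
            adv = trial
    return adv


def binary_search_pixel(x, adv, i, is_adv, search_iter=10):
    """Move pixel i back toward x[i] as far as possible while staying adversarial."""
    lo, hi = 0.0, 1.0              # interpolation weight toward adv[i]; hi stays adversarial
    for _ in range(search_iter):
        mid = (lo + hi) / 2
        trial = adv.copy()
        trial[i] = x[i] + mid * (adv[i] - x[i])
        if is_adv(trial):
            hi = mid
        else:
            lo = mid
    out = adv.copy()
    out[i] = x[i] + hi * (adv[i] - x[i])
    return out
```

For example, if only pixel 0 matters and the model flips once it exceeds 0.5, the first phase resets every other pixel and the second phase brings pixel 0 to just above 0.5.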

class mindarmour.attacks.ProjectedGradientDescent(network, eps=0.3, eps_iter=0.1, bounds=(0.0, 1.0), is_targeted=False, nb_iter=5, norm_level='inf', loss_fn=None)

The Projected Gradient Descent attack is a variant of the Basic Iterative Method in which, after each iteration, the perturbation is projected on an lp-ball of specified radius (in addition to clipping the values of the adversarial sample so that it lies in the permitted data range). This is the attack proposed by Madry et al. for adversarial training.

References: A. Madry, et al., “Towards deep learning models resistant to adversarial attacks,” in ICLR, 2018

Parameters

network (Cell) – Target model.
eps (float) – Proportion of adversarial perturbation generated by the attack to data range. Default: 0.3.
eps_iter (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.1.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
nb_iter (int) – Number of iterations. Default: 5.
norm_level (Union[int, numpy.inf]) – Order of the norm. Possible values: np.inf, 1 or 2. Default: ‘inf’.
loss_fn (Loss) – Loss function for optimization. Default: None.

generate(inputs, labels)

Iteratively generate adversarial examples with the BIM method, projecting the perturbation onto the norm ball selected by norm_level after each iteration.

Parameters

inputs (numpy.ndarray) – Benign input samples used as references to create adversarial examples.
labels (numpy.ndarray) – Original/target labels.

Returns

numpy.ndarray, generated adversarial examples.

Examples

>>> adv_x = attack.generate(np.asarray([[0.6, 0.2, 0.6],
...                                     [0.3, 0.3, 0.4]], np.float32),
...                         np.asarray([[0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
...                                     [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]], np.float32))
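A sketch of the projection step that distinguishes PGD from BIM, covering only `norm_level` inf and 2 (projection onto an L1 ball needs a sorting-based algorithm and is left out). `grad_fn` is a hypothetical callable returning the loss gradient; taking the sign of the gradient for inf and the L2-normalized gradient for 2 is an assumption about the step rule, and `eps` values are treated as fractions of the data range.

```python
import numpy as np


def project(delta, eps, norm_level='inf'):
    """Project a perturbation onto the Lp ball of radius eps (inf or 2 only)."""
    if norm_level in (np.inf, 'inf'):
        return np.clip(delta, -eps, eps)
    if norm_level in (2, '2'):
        norm = np.linalg.norm(delta)
        return delta if norm <= eps else delta * (eps / norm)
    raise NotImplementedError("this sketch only covers norm_level inf and 2")


def pgd(x, grad_fn, eps=0.3, eps_iter=0.1, nb_iter=5, bounds=(0.0, 1.0), norm_level='inf'):
    """Untargeted PGD sketch (grad_fn is hypothetical; step rule assumed)."""
    clip_min, clip_max = bounds
    scale = clip_max - clip_min
    x_adv = x.copy()
    for _ in range(nb_iter):
        g = grad_fn(x_adv)
        if norm_level in (2, '2'):
            direction = g / (np.linalg.norm(g) + 1e-12)
        else:
            direction = np.sign(g)
        x_adv = x_adv + eps_iter * scale * direction
        delta = project(x_adv - x, eps * scale, norm_level)   # back onto the eps-ball
        x_adv = np.clip(x + delta, clip_min, clip_max)         # and into the data range
    return x_adv
```

For the inf norm the projection reduces to the per-feature clipping that BIM already performs; the two methods differ mainly for other norms.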

class mindarmour.attacks.RandomFastGradientMethod(network, eps=0.07, alpha=0.035, bounds=(0.0, 1.0), norm_level=2, is_targeted=False, loss_fn=None)

Fast Gradient Method with random perturbation: a random step of size alpha is taken before the gradient step.

References: Florian Tramer, Alexey Kurakin, Nicolas Papernot, “Ensemble adversarial training: Attacks and defenses” in ICLR, 2018

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: 0.035.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
norm_level (Union[int, numpy.inf]) – Order of the norm. Possible values: np.inf, 1 or 2. Default: 2.
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
loss_fn (Loss) – Loss function for optimization. Default: None.

Raises

ValueError – If eps is smaller than alpha.

Examples

>>> attack = RandomFastGradientMethod(network)

class mindarmour.attacks.RandomFastGradientSignMethod(network, eps=0.07, alpha=0.035, bounds=(0.0, 1.0), is_targeted=False, loss_fn=None)

Fast Gradient Sign Method using random perturbation.

References: F. Tramer, et al., “Ensemble adversarial training: Attacks and defenses,” in ICLR, 2018

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: 0.035.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
loss_fn (Loss) – Loss function for optimization. Default: None.

Raises

ValueError – If eps is smaller than alpha.

Examples

>>> attack = RandomFastGradientSignMethod(network)
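A sketch of the random-step-then-FGSM scheme these random variants follow: a random sign step of size `alpha`, then a gradient-sign step of size `eps - alpha` from the randomized point, so the total Linf perturbation stays within `eps`. `grad_fn` is a hypothetical callable returning the loss gradient, and the eps/alpha check mirrors the documented ValueError.

```python
import numpy as np


def rand_fgsm(x, grad_fn, eps=0.07, alpha=0.035, bounds=(0.0, 1.0), rng=None):
    """Random sign step, then an FGSM step of size eps - alpha (grad_fn is hypothetical)."""
    if eps < alpha:
        raise ValueError("eps must not be smaller than alpha")
    rng = np.random.default_rng() if rng is None else rng
    clip_min, clip_max = bounds
    scale = clip_max - clip_min
    # 1) random sign step of size alpha
    x_rand = np.clip(x + alpha * scale * np.sign(rng.standard_normal(x.shape)),
                     clip_min, clip_max)
    # 2) gradient-sign step of size eps - alpha, taken from the randomized point
    x_adv = x_rand + (eps - alpha) * scale * np.sign(grad_fn(x_rand))
    return np.clip(x_adv, clip_min, clip_max)
```

Starting from a random point helps escape the sharp curvature that single-step attacks often hit right at the data point.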

class mindarmour.attacks.RandomLeastLikelyClassMethod(network, eps=0.07, alpha=0.035, bounds=(0.0, 1.0), loss_fn=None)

Least-Likely Class Method with random perturbation: a random step of size alpha is taken before the gradient step.

References: F. Tramer, et al., “Ensemble adversarial training: Attacks and defenses,” in ICLR, 2018

Parameters

network (Cell) – Target model.
eps (float) – Proportion of single-step adversarial perturbation generated by the attack to data range. Default: 0.07.
alpha (float) – Proportion of single-step random perturbation to data range. Default: 0.035.
bounds (tuple) – Upper and lower bounds of data, indicating the data range. In form of (clip_min, clip_max). Default: (0.0, 1.0).
loss_fn (Loss) – Loss function for optimization. Default: None.

Raises

ValueError – If eps is smaller than alpha.

Examples

>>> attack = RandomLeastLikelyClassMethod(network)

class mindarmour.attacks.SaltAndPepperNoiseAttack(model, bounds=(0.0, 1.0), max_iter=100, is_targeted=False, sparse=True)

Generates adversarial examples by gradually increasing the amount of salt-and-pepper noise added to the input.

Parameters

model (BlackModel) – Target model.
bounds (tuple) – Upper and lower bounds of data. In form of (clip_min, clip_max). Default: (0.0, 1.0).
max_iter (int) – Maximum number of iterations to generate an adversarial example. Default: 100.
is_targeted (bool) – If True, targeted attack. If False, untargeted attack. Default: False.
sparse (bool) – If True, input labels are sparse-encoded. If False, input labels are one-hot-encoded. Default: True.

Examples

>>> attack = SaltAndPepperNoiseAttack(model)

generate(inputs, labels)

Generate adversarial examples based on input data and target labels.

Parameters

inputs (numpy.ndarray) – The original, unperturbed inputs.
labels (numpy.ndarray) – The target labels.

Returns

numpy.ndarray, bool values for each attack result.
numpy.ndarray, generated adversarial examples.
numpy.ndarray, query times for each sample.

Examples

>>> is_adv, adv_list, query_times = attack.generate(
...     np.asarray([[0.1, 0.2, 0.6], [0.3, 0, 0.4]], np.float32),
...     np.asarray([1, 3]))
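A sketch of the idea: set a growing fraction of features to the minimum (pepper) or maximum (salt) of the data range until the model's decision changes. `is_adv` is a hypothetical predicate that queries the model, and the linear noise schedule is an assumption rather than the library's exact search; the return values mimic the (success, example, query count) triple documented above.

```python
import numpy as np


def add_salt_and_pepper(x, ratio, bounds, rng):
    """Set roughly `ratio` of the features to clip_min or clip_max."""
    clip_min, clip_max = bounds
    u = rng.random(x.shape)
    noisy = x.copy()
    noisy[u < ratio / 2] = clip_min          # pepper
    noisy[u > 1 - ratio / 2] = clip_max      # salt
    return noisy


def salt_and_pepper_attack(x, is_adv, bounds=(0.0, 1.0), max_iter=100, seed=0):
    """Increase the noise ratio step by step (assumed linear schedule).

    Returns (success, example, queries); is_adv is a hypothetical model query.
    """
    rng = np.random.default_rng(seed)
    for k in range(1, max_iter + 1):
        candidate = add_salt_and_pepper(x, k / max_iter, bounds, rng)
        if is_adv(candidate):
            return True, candidate, k
    return False, x.copy(), max_iter
```

Because the noise only ever takes the extreme values of the data range, the attack needs no gradients and no knowledge of the model beyond its decisions.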