WaveTransform: Crafting Adversarial Examples via Input Decomposition


Abstract

Frequency spectrum has played a significant role in learning unique and discriminating features for object recognition. Both low and high frequency information present in images have been extracted and learnt by a host of representation learning techniques, including deep learning. Inspired by this observation, we introduce a novel class of adversarial attacks, namely ‘WaveTransform’, that creates adversarial noise corresponding to low-frequency and high-frequency subbands, separately (or in combination). The frequency subbands are analyzed using wavelet decomposition; the subbands are corrupted and then used to construct an adversarial example. Experiments are performed using multiple databases and CNN models to establish the effectiveness of the proposed WaveTransform attack and analyze the importance of a particular frequency component. The robustness of the proposed attack is also evaluated through its transferability and resiliency against a recent adversarial defense algorithm. Experiments show that the proposed attack is effective against the defense algorithm and is also transferable across CNNs.

Keywords:

Transformed Domain Attacks, Resiliency, Transferability, Wavelet, CNN, and Object Recognition

1 Introduction

Figure 1: Fooling CNN models using the proposed attack on a broad range of databases, including object recognition (Tiny ImageNet [45], ImageNet [11], CIFAR-10 [28]), face identification (Multi-PIE [23]), and fashion classification (Fashion-MNIST [42]). In each image set, the first image is the clean image, the second is the adversarial image, and the last is the adversarial noise. It can be clearly observed that the proposed attack is able to fool the networks with high confidence scores.

Convolutional neural networks (CNNs) for image classification are known to utilize both high and low-frequency information [32], [39]. Goodfellow et al. [20] show that CNN activations are sensitive to the high-frequency information present in an image; for example, some neurons are sensitive to an upper-right stroke, while others are activated by a lower edge. Furthermore, Geirhos et al. [15] have shown that CNNs trained on ImageNet [11] are strongly biased towards texture (high-frequency information) over the shape of the object (low-frequency information). We hypothesize that if an attacker can manipulate the frequency information present in an image, it can fool CNN architectures as well. With this motivation, we propose a novel method of adversarial example generation that utilizes the low-frequency and high-frequency information individually or in combination. To separate texture and shape information, wavelet-based decomposition is an ideal choice, since it yields multi-resolution high-frequency and low-frequency images. Therefore, the proposed method uses wavelet decomposition to obtain multiple high and low-frequency subbands, and adversarial noise is added to individual or combined wavelet components through gradient-based learning to generate an adversarial example. Since almost every CNN learns these kinds of features, an attack generated by perturbing the high-frequency (edge) information transfers easily to different networks. In brief, the key highlights of this research are:

  • a novel class of adversarial example generation is proposed, which decomposes the image into low-frequency and high-frequency components via the wavelet transform;

  • extensive experiments are performed on multiple databases, including ImageNet [11], CIFAR-10 [28], and Tiny ImageNet [45];

  • multiple CNN models, including ResNet [24] and DenseNet [25], are used to showcase the effectiveness of the proposed WaveTransform;

  • the robustness of the proposed attack is evaluated against a recent adversarial defense.

Fig. 1 shows the effectiveness of the proposed attack on multiple databases covering color and gray-scale object images as well as face images. The proposed attack can fool the network trained on each data type with high confidence. For example, on the color object image (the first image of the top row), the model predicts the correct class with high confidence, while, after the attack, the network misclassifies it into a wrong category, again with high confidence.
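To make the decomposition step concrete, the following is a minimal sketch using PyWavelets; the toy array is illustrative, and 'haar' is one of the filters used later in the experiments:

```python
import numpy as np
import pywt

# Toy grayscale "image"; in practice this is a real input image (per channel for RGB).
image = np.random.rand(64, 64).astype(np.float32)

# Single-level 2D DWT: LL is the low-frequency approximation; LH, HL, HH carry
# the horizontal, vertical, and diagonal high-frequency detail, respectively.
LL, (LH, HL, HH) = pywt.dwt2(image, wavelet='haar')

# The inverse transform reconstructs the image from the four subbands.
reconstructed = pywt.idwt2((LL, (LH, HL, HH)), wavelet='haar')
print(LL.shape, np.allclose(reconstructed, image, atol=1e-5))
```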

2 Related Work

Adversarial generation algorithms presented in the literature can be divided into the following categories: (i) gradient-based, (ii) optimization-based, (iii) decision boundary-based, and (iv) universal perturbation.

Goodfellow et al. [20] proposed a fast attack method that computes the gradient of the loss with respect to the input image and pushes the image pixels in the direction of the sign of this gradient. The adversarial noise vector can be defined as $\eta = \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$, where $\epsilon$ controls the magnitude of the perturbation and $\nabla_x J(\theta, x, y)$ is the gradient of the loss $J$ with respect to the image $x$ for network parameters $\theta$ and label $y$. The perturbation vector is added to the image to generate the adversarial image. The above process is applied for a single step, which is less effective and can easily be defended [30]. Therefore, several researchers have proposed variants where the noise is added iteratively [29], [31] or with momentum [13]. Moosavi-Dezfooli et al. [34] have proposed a method that moves clean images across their decision boundaries to a different class; the attack is performed iteratively using a linear approximation of the non-linear decision boundary. Carlini and Wagner [8] presented attacks that restrict the $\ell_2$ norm of the adversarial perturbation. Variants based on the $\ell_0$ and $\ell_\infty$ norms are also proposed; however, they are found to be less effective than the $\ell_2$ variant. Similar to norm minimization, Chen et al. [10] have proposed the elastic-net attack, which combines the $\ell_1$ and $\ell_2$ norms. Goswami et al. [21, 22] presented several black-box attacks to fool state-of-the-art face recognition algorithms, along with adversarial example detection and mitigation algorithms. Agarwal et al. [2] have shown the use of filtering operations in generating adversarial noise in a network-agnostic manner.
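For reference, a minimal single-step FGSM sketch in PyTorch, assuming a classifier `model`, an input batch `x` in [0, 1], and labels `y`; the ε value is illustrative rather than the one used in this paper:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """Single-step FGSM: x' = clip(x + epsilon * sign(grad_x J(theta, x, y)))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()    # step in the direction that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range
```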

Other popular adversarial generation algorithms are based on generative networks [41] and Expectation over Transformation (EoT) [5]. The application of adversarial attacks is not restricted to 2D object recognition but has also been explored for semantic segmentation [43], 3D recognition [40], audio classification [9], text recognition [14], and reinforcement learning [6]. Goel et al. [19] have developed an adversarial toolbox for the generation of adversarial perturbations and defenses against them. Details of the existing algorithms can be found in the survey papers by Yuan et al. [47] and Singh et al. [37].

Figure 2: Schematic diagram of the proposed 'WaveTransform' adversarial attack algorithm. DWT and IDWT denote the forward and inverse discrete wavelet transforms. The noise is added to the desired wavelet subband and optimized to increase the loss of the network. LL represents the low-pass subband; LH, HL, and HH represent the high-pass subbands in the horizontal, vertical, and diagonal directions.

3 Proposed WaveTransform Attack Algorithm

Adversarial attacks generally modify the image in the spatial domain. In this research, we propose a new class of attack, termed WaveTransform, where the image is first transformed into the frequency (scale) domain using wavelet decomposition. A digital image is composed of low-frequency and high-frequency information, and the role of each frequency component can differ in its spatial representation. With this observation, the high and low-frequency bands are perturbed such that the reconstructed image is an adversarial example that remains visually close to the clean example. The proposed attack can be defined using the following equation:

$$x' = \arg\max_{x'} \; \mathcal{L}(f(x'), y) - \lambda \, \| x' - x \|_2 \qquad (1)$$

where $x$ and $x'$ represent the clean and perturbed images, respectively, $\lambda$ is the loss term trade-off parameter, $\mathcal{L}$ is the classification loss function of the target CNN classifier $f$, and $y$ is the target label. The aim is to find an adversarial image that maximizes the classification error for a target label while keeping the noise imperceptible to the human observer.

Initialization:
  • Let the selected subbands be expressed by $W_s \subseteq \{LL, LH, HL, HH\}$ for a particular image $x$.

  • Let the perturbed image be $x'$.

  • Let $y$ be the ground truth label of the image.

  • Let $r$ be the number of random restarts taken and $k$ be the number of steps used to optimize the objective function.

  • Let $\alpha$ be the step size of the update and let $m$ be the minibatch size.

  • Let the CNN model be expressed as $f$.

  • Let $\epsilon$ be the maximum amount of noise that may be added to $x$, such that $\|x' - x\|_\infty \le \epsilon$.

  for restart in $1, \ldots, r$ do
        Initialize $x'$ by adding random noise drawn from $[-\epsilon, \epsilon]$ to $x$
        for step in $1, \ldots, k$ do
              Obtain the subbands $(LL, LH, HL, HH)$ by decomposing $x'$ with the DWT
              Update the selected subband(s) $W_s$ to maximize the classification error by gradient ascent using the term: $W_s \leftarrow W_s + \alpha \cdot \mathrm{sign}\big(\nabla_{W_s} \mathcal{L}(f(\mathrm{IDWT}(LL, LH, HL, HH)), y)\big)$
              Reconstruct $x'$ with the IDWT and project it into the valid range by clipping pixel values
              if $f(x') \ne y$ then
                    return $x'$
  return $x'$
  Algorithm 1: Subband Updating (Proposed Adversarial Attack)

A discrete wavelet transform (DWT) is applied on $x$ to obtain the LL, LH, HL, and HH subbands, using low-pass and high-pass filters. The LL band contains the low-frequency information, whereas LH, HL, and HH contain the high-frequency information in the horizontal, vertical, and diagonal directions, respectively. These subbands are then modified by taking a step in the direction of the sign of the gradient of the loss with respect to the subbands. The image is then reconstructed from the modified subbands using an inverse discrete wavelet transform (IDWT) to obtain the desired image $x'$. As shown in Fig. 2, the attack is performed iteratively to find an adversarial image with minimal distortion. It is ensured that $x'$ remains a valid image after updating its wavelet subbands by projecting the image back onto a ball of valid pixel values such that $\|x' - x\|_\infty \le \epsilon$. If the noise that can be added or removed is already limited to $\epsilon$, we add another clipping operation limiting pixel values to the valid image range. Since, in this setting, there is no need to minimize the added noise explicitly, we also fix the trade-off parameter $\lambda$ to zero. Based on this, we propose our main method, called Subband Updating, where particular subbands obtained by the discrete wavelet transform of the image are updated using projected gradient ascent. The proposed 'WaveTransform' adversarial attack algorithm is described in Algorithm 1.
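The following is a simplified, illustrative sketch of the subband-updating step (single restart, Haar filter, all-subband attack by default). The Haar DWT/IDWT are written directly in PyTorch so that gradients flow through the reconstruction; the `subband_attack` name and the ε and α values are assumptions, not the authors' released code:

```python
import torch
import torch.nn.functional as F

def haar_dwt2(x):
    """Single-level 2D Haar DWT of an (N, C, H, W) tensor with even H and W."""
    a, b = x[..., 0::2, 0::2], x[..., 0::2, 1::2]
    c, d = x[..., 1::2, 0::2], x[..., 1::2, 1::2]
    ll = (a + b + c + d) / 2   # low-frequency approximation
    lh = (a - b + c - d) / 2   # horizontal detail
    hl = (a + b - c - d) / 2   # vertical detail
    hh = (a - b - c + d) / 2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2 (perfect reconstruction)."""
    x = ll.new_zeros(ll.shape[:-2] + (2 * ll.shape[-2], 2 * ll.shape[-1]))
    x[..., 0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[..., 0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[..., 1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[..., 1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def subband_attack(model, x, y, targets=('LL', 'LH', 'HL', 'HH'),
                   steps=20, alpha=2 / 255, eps=8 / 255):
    """Iteratively perturb the selected wavelet subbands by gradient ascent on the loss."""
    order = ('LL', 'LH', 'HL', 'HH')
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        bands = dict(zip(order, haar_dwt2(x_adv)))
        for name in order:
            bands[name] = bands[name].detach().requires_grad_(name in targets)
        loss = F.cross_entropy(model(haar_idwt2(*(bands[k] for k in order))), y)
        loss.backward()
        with torch.no_grad():
            for name in targets:
                bands[name] += alpha * bands[name].grad.sign()  # ascend the loss
            x_adv = haar_idwt2(*(bands[k] for k in order))
            # project back onto the eps-ball around x and onto valid pixel values
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

For instance, `subband_attack(model, images, labels, targets=('LL',))` would perturb only the low-frequency band, mirroring the LL-only setting reported later in Table 2.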

    4 Experimental Setup

The experiments are performed using multiple databases and CNN models. This section describes the databases used to generate the adversarial examples, the CNN models used to report the results, and the parameters of the adversarial attack and defense algorithms.

    Databases:
The proposed method is evaluated with databases comprising a wide range of target images: Fashion-MNIST (F-MNIST) [42], CIFAR-10 [28], the frontal-image set of Multi-PIE [23], Tiny-ImageNet [45], and ImageNet [11]. Fashion-MNIST comprises low-resolution grayscale images of 10 different apparel categories. CIFAR-10 contains low-resolution RGB images of 10 different object categories. The Multi-PIE database has high-resolution RGB images of 337 individuals, and Tiny-ImageNet [45] contains 10,000 images from 200 classes drawn from the ILSVRC challenge [36]. To perform the experiments on ImageNet, the validation set comprising 50,000 images is used. These datasets also vary in color space: CIFAR-10 and Tiny-ImageNet contain color images, while F-MNIST contains gray-scale images.

Layer Type     | Output Size | Description
Batch Norm 2D  | 28×28       | channels 1, affine False
Conv 2D        | 28×28       | 5×5, 10, stride 1
Max Pool 2D    | 24×24       | kernel 2×2
ReLU           | 23×23       |
Conv 2D        | 23×23       | 5×5, 20, stride 1
Max Pool 2D    | 21×21       | kernel 2×2
ReLU           | 20×20       |
Dropout 2D     | 20×20       | dropout prob 0.2
Flatten        | 400×1       | convert to a 1D vector
Linear         | 400×1       | 320, 10
Output         | 10×1        | final logits
Table 1: Architecture of the custom model used for the Fashion-MNIST experiments [42].

CNN Models and Implementation Details: Recent CNN architectures with high classification performance are used for the experiments. For Multi-PIE, we use a ResNet-50 model [24] pretrained on VGGFace2 [7] and an InceptionNet-V1 [38] pretrained on CASIA-WebFace [46]. For CIFAR-10, ResNet-50 [24] and DenseNet-121 [25] models pretrained on CIFAR-10 are used. For Fashion-MNIST, a 10-layer custom CNN, as described in Table 1, is used, and a pretrained ResNet-50 is used for Tiny-ImageNet and ImageNet. The standard models are fine-tuned by replacing the last layer of the network to match the number of classes in the target database and then iterating over the training split of the data for 30 epochs using the Adam optimizer [27] with a learning rate of 0.0001 and a batch size of 128. Standard train-validation-test splits are used for the CIFAR-10, Fashion-MNIST, and Tiny-ImageNet databases. From the Multi-PIE database [23], 4753 training, 1690 validation, and 3557 test images are randomly selected. All the models operate on images normalized to a fixed range, and the experimental results are summarized on the test split of the data, except for ImageNet and Tiny-ImageNet, where results are reported on the validation split.
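A sketch of this fine-tuning recipe for one backbone (torchvision ResNet-50) is given below; the dummy tensors stand in for the real training split and are only there to keep the snippet self-contained:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Dummy tensors stand in for the real training split (resized images scaled to [0, 1]).
images = torch.rand(128, 3, 224, 224)
labels = torch.randint(0, 10, (128,))
train_loader = DataLoader(TensorDataset(images, labels), batch_size=128, shuffle=True)

model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 10)   # replace the last layer to match 10 classes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, learning rate 0.0001
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(30):                          # fine-tune for 30 epochs
    for batch_images, batch_labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_labels)
        loss.backward()
        optimizer.step()
```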

    Attack Parameters:
In the experiments, each attack follows the same setting unless mentioned otherwise. Cross-entropy is used as the classification loss function $\mathcal{L}$. The SGD optimizer [26] is used to calculate the gradient of the subbands with respect to the final logits vector used for classification. The experiments are performed using multiple wavelet filters, including Haar, Daubechies (db2 and db3), and Bi-orthogonal. Before computing the discrete wavelet transform, the input data is extrapolated by zero-padding. Each attack runs for 20 iterations with 20 restarts, where the adversarial image is initialized with added random noise. This is referred to as random restarts by Madry et al. [31]; the attack algorithm finally returns the first valid adversarial image produced across all restarts. The maximum amount of noise $\epsilon$ that may be added to (or removed from) a clean image is fixed in terms of the $\ell_\infty$ norm for all the attacks, and the step size of the subband update is also fixed.
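The random-restart loop can be wrapped around any single-restart attack such as the `subband_attack` sketch above; the helper below is illustrative and assumes a batch of one image:

```python
import torch

def attack_with_restarts(model, x, y, attack_fn, restarts=20):
    """Return the first adversarial image (across random restarts) that changes the prediction."""
    x_adv = x
    for _ in range(restarts):
        x_adv = attack_fn(model, x, y)           # e.g., the subband_attack sketch from Section 3
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != int(y):
                return x_adv                     # first valid adversarial image
    return x_adv                                 # otherwise, return the last attempt
```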

    5 Results and Observations

This section describes the results corresponding to the original and adversarial images generated by perturbing individual or combined wavelet components. Extensive analysis has been performed to understand the effect of different filters used with the wavelet transform. To demonstrate the effectiveness of the transform-domain attack, we compare its performance with prevalent pixel-level attacks and recent steganography-based attacks. We also evaluate the transferability of the proposed attack and its resiliency against a recent defense algorithm [39].

Figure 3: Illustrating the individual wavelet components of the adversarial images generated using clean images from the Multi-PIE [23] and Tiny-ImageNet [45] databases. While the adversarial images are visually close to the clean images, the individual high-frequency components (LH, HL, and HH) clearly show that noise has been injected to fool the system. The wavelet components corresponding to the HH subband show the maximum effect of the adversarial noise.
Figure 4: Adversarial noise generated by attacking different subbands and added to the clean image to obtain the corresponding adversarial images. Images are taken from Tiny-ImageNet [45] (left) and Multi-PIE [23] (right). It is observed that adversarial images generated using the low-frequency components, the high-frequency components, or both, are effective in fooling the CNN with high confidence.

    5.1 Effectiveness of WaveTransform

To evaluate the effectiveness of attacking different subbands, we perform experiments with individual subbands and different combinations of subbands. Fig. 3 shows samples of clean and adversarial images corresponding to individual subbands from the Multi-PIE [23] and Tiny-ImageNet [45] databases. The individual wavelet components of both image classes help in understanding the effect of the adversarial noise on each frequency band. While the noise in the low-frequency component is relatively subtle, it is clearly visible in the high-frequency components; among them, the HH component carries the highest amount of distortion. It is interesting to note that the final adversarial images remain close to their clean counterparts. Fig. 4 shows adversarial images generated by perturbing the different frequency components of an image. An adversarial image, whether created from low-frequency or high-frequency perturbations, can fool the classifier with high confidence.
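The per-subband view of Fig. 3 can be approximated by decomposing the clean and adversarial images and measuring the perturbation energy in each subband; a small sketch with PyWavelets (the toy arrays are placeholders for real image pairs):

```python
import numpy as np
import pywt

def subband_noise_energy(clean, adversarial, wavelet='haar'):
    """L2 norm of the adversarial perturbation within each wavelet subband."""
    c_ll, (c_lh, c_hl, c_hh) = pywt.dwt2(clean, wavelet)
    a_ll, (a_lh, a_hl, a_hh) = pywt.dwt2(adversarial, wavelet)
    pairs = [('LL', a_ll, c_ll), ('LH', a_lh, c_lh), ('HL', a_hl, c_hl), ('HH', a_hh, c_hh)]
    return {name: float(np.linalg.norm(a - c)) for name, a, c in pairs}

# Toy example; with real clean/adversarial pairs, the high-frequency entries
# (and HH in particular) tend to carry the most visible distortion (cf. Fig. 3).
clean = np.random.rand(64, 64)
adversarial = np.clip(clean + np.random.uniform(-8 / 255, 8 / 255, clean.shape), 0, 1)
print(subband_noise_energy(clean, adversarial))
```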

Table 2 summarizes the results on each database for the clean as well as the adversarial images. The ResNet-50 model trained on the CIFAR-10 database yields 94.38% object classification accuracy on clean test images. The performance of the model decreases drastically when any of the wavelet frequency bands is perturbed. For example, when only the low-frequency band is corrupted, the model fails and correctly classifies only 3.11% of the test images. The performance drops further when all the high-frequency subbands (LH, HL, and HH) are perturbed, yielding only 1.03% classification accuracy. The results show that each frequency component is important, and perturbing any component can significantly reduce the network performance.
Similarly, on Tiny-ImageNet [45], the proposed attack can fool the ResNet-50 model almost perfectly. The model, which yields 75.29% object recognition accuracy on clean test images, gives 0.01% accuracy on adversarial images. On the Multi-PIE database, the ResNet-50 model yields 99.41% face identification accuracy, which reduces to 0.06% when both low and high-frequency components are perturbed.

Subband Attacked  | CIFAR-10 | F-MNIST | Multi-PIE | Tiny-ImageNet
Original (clean)  | 94.38    | 87.88   | 99.41     | 75.29
LL                | 3.11     | 59.04   | 0.08      | 0.01
LH                | 7.10     | 78.51   | 0.06      | 0.01
HL                | 6.56     | 72.73   | 0.10      | 0.01
HH                | 13.77    | 80.56   | 0.10      | 0.54
High (LH+HL+HH)   | 1.03     | 70.04   | 0.08      | 0.01
All               | 0.16     | 58.36   | 0.06      | 0.01
Table 2: Classification rates (%) of the original images and of adversarial images generated by attacking different wavelet subbands. The ResNet-50 model is used for CIFAR-10 [28], Multi-PIE [23], and Tiny-ImageNet [45]; the results on F-MNIST [42] are reported using the custom CNN (refer to Table 1). The best fooling rate is achieved by perturbing all subbands; on some databases, individual subband attacks reach the same fooling rate.

On the Fashion-MNIST [42] database, the proposed attack reduces the model accuracy from 87.88% to 58.36%. In comparison to the other databases, the drop on the F-MNIST database is smaller, which can be attributed to its lack of rich texture and object shape information. It is also interesting to note that the model used on F-MNIST is much shallower than the models used for the other databases. While deeper models give higher recognition accuracy than shallower ones, they are also more sensitive to adversarial perturbations [30]. The results reported in Table 2 correspond to a 'white-box' scenario, where the attacker has complete access to the classification network.

Importance of Filter: The filter is a critical part of the DWT; therefore, to understand which types of filters are useful in crafting the proposed attack, we have performed experiments with multiple filters: Haar, Daubechies (db2 and db3), and Bi-orthogonal. Across the experiments on each database and CNN model, it is observed that 'Haar' is more effective than the other filters in reducing the classification performance. For example, on the F-MNIST [42] database, the Haar filter yields the lowest post-attack accuracy, outperforming the Daubechies and Bi-orthogonal filters.
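Switching filters only changes the DWT/IDWT filter pair; with PyWavelets this is a one-line change of the wavelet name (the 'bior1.3' entry below is an assumed stand-in for the bi-orthogonal family, whose exact variant is not specified here):

```python
import pywt

# Filter families compared in the experiments.
for name in ['haar', 'db2', 'db3', 'bior1.3']:
    wavelet = pywt.Wavelet(name)
    # Only the analysis/synthesis filter pair changes; the attack loop is unchanged.
    print(name, 'decomposition filter length:', wavelet.dec_len)
```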

Figure 5: Comparison of the adversarial images generated using the proposed and existing attacks, including FGSM, PGD, and Ud Din et al. [12], on ImageNet. Images have been resized and center-cropped to a common size.

    5.2 Comparison with Existing Attack Algorithms

We next compare the performance of WaveTransform with pixel-level attacks and recent wavelet-based attacks in the literature. Fig. 5 shows adversarial images generated using the proposed attack, the existing pixel-level attacks FGSM and PGD, and the steganography attack by Ud Din et al. [12].

Pixel-level Attacks: While most of the existing adversarial attack algorithms work at the pixel level, i.e., in the image space only, the proposed attack works at the transform level. Therefore, we have also compared the performance of the proposed attack with popular methods such as Projected Gradient Descent (PGD) [31] and the Fast Gradient Sign Method (FGSM) [20], under the same noise budget, in terms of accuracy and image degradation. The Universal Image Quality Index (UIQI) [50] is a useful measure of attack quality: an adversarial example with a higher UIQI (the maximum being 100, for the original image) is perceptually harder to distinguish from the clean image. On the CIFAR-10 database, the proposed attack with perturbation of both low and high-frequency subbands reduces the performance of ResNet-50 from 94.38% to 0.16%, while the existing PGD and FGSM attacks also reduce the performance substantially. Similarly, on the ImageNet validation set, the proposed attack drastically reduces the performance of ResNet-50, as do the existing PGD and FGSM attacks. The experiments show that the proposed attack can either surpass the existing attacks or perform comparably on both databases.

While the perturbation strength in both the existing and the proposed attacks is fixed at a quasi-imperceptible level, we have also evaluated the image quality of the adversarial examples. The average UIQI of the adversarial examples on the CIFAR-10 and ImageNet databases is more than 99. This high value (close to the maximum of 100) shows that both the existing and the proposed attacks retain the quality of the images and keep the noise imperceptible to humans.
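For completeness, a direct NumPy implementation of the global UIQI formula of Wang and Bovik [50] (computed over the whole image rather than over sliding windows, which is a simplification):

```python
import numpy as np

def uiqi(reference, distorted):
    """Global Universal Image Quality Index of `distorted` against `reference` (single channel)."""
    x = reference.astype(np.float64).ravel()
    y = distorted.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return (4 * cov * mx * my) / ((vx + vy) * (mx ** 2 + my ** 2))

clean = np.random.rand(32, 32)
adv = np.clip(clean + np.random.uniform(-8 / 255, 8 / 255, clean.shape), 0, 1)
print(100 * uiqi(clean, adv))  # the text reports UIQI on a 0-100 scale
```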

Comparison with Recent Attacks: The attacks closest to the proposed one are those recently proposed by Yahya et al. [44] and Ud Din et al. [12]. These attacks are based on the concept of steganography, where a watermark image, referred to as a secret image, is embedded in the clean images using wavelet decomposition. The performance of the attacked model depends on the secret image; to make the attack highly successful, i.e., to reduce the CNN's recognition performance, a compelling steganography image is selected based on its fooling rate on the target CNN. In contrast, the proposed approach requires no additional watermark image and learns the noise vector from the network itself. Since Yahya et al. [44] have shown the effectiveness of their attack on the simple MNIST database only, we compare against Ud Din et al. [12], who evaluated their method on the validation set of ImageNet.

To maintain consistency, the experiments are performed on the validation set of ImageNet with ResNet-50. Along with the visual comparison, the results are also compared using the fooling ratio as the evaluation metric, which is defined as

$$\text{Fooling ratio} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big[f(x_i + \eta_i) \neq f(x_i)\big] \times 100 \qquad (2)$$

where $f$ is a trained classifier, $x_i$ is a clean image from the database, $N$ is the total number of samples, and $\eta_i$ is the adversarial noise. Using the best steganography image, the attack by Ud Din et al. [12] on a pretrained ResNet-50 achieves a fooling ratio of %, whereas the proposed attack achieves a fooling ratio of %.
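A sketch of how the fooling ratio of Eq. (2) can be computed, assuming a trained `model` and batches of clean images and their precomputed adversarial counterparts:

```python
import torch

@torch.no_grad()
def fooling_ratio(model, clean_images, adv_images):
    """Percentage of samples whose predicted label changes after adding the adversarial noise."""
    clean_pred = model(clean_images).argmax(dim=1)
    adv_pred = model(adv_images).argmax(dim=1)
    return 100.0 * (clean_pred != adv_pred).float().mean().item()
```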

Figure 6: Illustrating the transfer capability of the proposed attack using the CIFAR-10 database [28]. The plot on the right shows the results of adversarial images generated on ResNet-50 being tested on DenseNet-121; the plot on the left shows the results of adversarial images generated on DenseNet-121 being tested on ResNet-50. The performance of both ResNet-50 and DenseNet-121 degrades substantially from their clean-image accuracies.

    5.3 Transferability and Resiliency of WaveTransform

    Finally, we evaluate the transferability and resiliency of the proposed attack on multiple databases.

Transferability: In real-world settings, the attacker might not know the target CNN model they want to fool. In such a scenario, to make the attack more practical, it is necessary to evaluate its effectiveness with an unseen testing network, i.e., the adversarial images generated using one model are used to fool another, unseen model. This scenario is referred to as the 'black-box' setting in the adversarial attack literature [4], where the attacker does not have access to the target model. The experiments are performed on the CIFAR-10 [28] and Multi-PIE [23] databases.

For CIFAR-10 [28], two state-of-the-art CNN models are used, i.e., ResNet-50 and DenseNet-121, and the results are summarized in Fig. 6. The ResNet model yields 94.38% accuracy on clean images of CIFAR-10 [28], and DenseNet also gives high classification accuracy on clean images. When the adversarial images generated using the ResNet model are used for classification, the performance of the DenseNet model drops sharply, and a similar reduction is observed in the performance of the ResNet model when the adversarial images generated using the DenseNet model are used. The adversarial images generated by perturbing all the high-frequency wavelet bands also reduce the classification accuracy substantially. The sensitivity of the networks to adversarial examples generated on unseen models shows the practicality of the proposed attack. In addition, when adversarial examples are generated using ResNet on Multi-PIE [23] and used for classification by InceptionNet [38], the performance of the network is also reduced. The perturbation of the low-frequency component hurts the transfer performance most in comparison to the modification of the high-frequency components, and the highest reduction in accuracy across the unseen testing network is observed when both low and high-frequency components are perturbed.
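Transferability can be measured by crafting adversarial images on a source model and evaluating the accuracy of an unseen target model on them; a minimal sketch with hypothetical `source_model`, `target_model`, and `attack_fn`:

```python
import torch

@torch.no_grad()
def accuracy(model, images, labels):
    """Top-1 accuracy (%) of a model on a batch of images."""
    return 100.0 * (model(images).argmax(dim=1) == labels).float().mean().item()

def transfer_eval(source_model, target_model, images, labels, attack_fn):
    """Craft adversarial images on source_model (white-box), then test them on target_model (black-box)."""
    adv_images = attack_fn(source_model, images, labels)
    return accuracy(target_model, adv_images, labels)
```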

WaveTransform works by corrupting the low-frequency and high-frequency information contained in the image. It is well understood that low-frequency information corresponds to the high-level features learned in the deeper layers of the network [32, 15]. Moosavi-Dezfooli et al. [33] have shown that the high-level features learned by different models tend to be similar. We assert that since the proposed method perturbs low-frequency information that is shared across models, it shows good transferability.


Database  | Defense        | Original | LL    | LH    | HL    | HH    | High  | All
CIFAR-10  | Before Defense | 94.38    | 3.11  | 7.10  | 6.56  | 13.77 | 1.03  | 0.16
CIFAR-10  | After Defense  | 91.92    | 2.42  | 5.73  | 5.03  | 10.05 | 0.65  | 0.11
F-MNIST   | Before Defense | 87.88    | 59.04 | 78.51 | 72.73 | 80.56 | 70.04 | 58.36
F-MNIST   | After Defense  | 81.29    | 57.99 | 72.74 | 69.05 | 74.84 | 66.76 | 57.84
Table 3: Classification rates (%) for the original and adversarial images generated by attacking different wavelet subbands in the presence of the kernel defense [39]. The ResNet-50 model is used for CIFAR-10 [28], and the results on F-MNIST [42] are reported using the custom CNN.

Adversarial Resiliency: With the advancement of the adversarial attack domain, researchers have proposed several defense algorithms [1, 3, 16, 17, 18, 35]. We next evaluate the resiliency of the attack images generated using the proposed WaveTransform against the recently proposed defense algorithm by Wang et al. [39] (see footnote 1). The concept of the defense algorithm is close to the proposed attack, making it a natural fit for this evaluation. The defense algorithm smooths the CNN kernels at earlier layers to reduce the effect of adversarial noise.

Table 3 summarizes the results with the defense algorithm on the CIFAR-10 and F-MNIST databases. Interestingly, we observe that in the presence of the defense algorithm, the performance of the network is reduced even further. We hypothesize that, while the proposed attack perturbs the frequency components, the kernel smoothing further suppresses informative frequency content and therefore yields a higher fooling rate. This phenomenon can also be seen in the accuracy on clean images: for example, the ResNet model without defense yields 94.38% accuracy on CIFAR-10, which reduces to 91.92% after the defense is incorporated. The proposed attack can fool the defense algorithm on each database. For example, on the CIFAR-10 database, the proposed attack reduces the accuracy to 0.16%, which reduces further to 0.11% after the defense. Similar resiliency is observed on the F-MNIST database as well.

The PGD attack, which shows a similar reduction in performance on the CIFAR-10 database, is found to be less resilient against the defense proposed by Wang et al. [39]: the defense algorithm can successfully boost the recognition performance of ResNet-50 against PGD, whereas the proposed attack remains resilient against the defense.

The robustness of the proposed attack is also evaluated against state-of-the-art defense methods such as Madry et al. [31] and Zhang et al. [49] on the CIFAR-10 database. The defense model presented by Madry et al. [31] utilizes the ResNet-50 model; its accuracy on clean images of the database significantly reduces when the proposed attack is applied. Similarly, the accuracy of the defended WideResNet [48] model of Zhang et al. [49] is also reduced by the proposed attack.

    6 Conclusion

High and low-frequency components present in an image play a vital role when the image is processed by deep learning models, and several recent research works have highlighted that CNN models are highly sensitive to both. The attack generation algorithms in the literature generally learn an additive noise vector without considering the individual frequency components. In this research, with the aim of understanding the role of different frequencies, we have proposed a novel attack that decomposes images using the discrete wavelet transform and adds learned adversarial noise to different frequency subbands. The experiments using multiple databases and deep learning models show that the proposed attack poses a significant challenge to classification models. The proposed attack is further evaluated under unseen network training-testing settings to showcase its real-world applicability. In addition, the proposed WaveTransform attack is found to be challenging to mitigate or defend against.

    Acknowledgements

    A. Agarwal was partly supported by the Visvesvaraya PhD Fellowship. R. Singh and M. Vatsa are partially supported through a research grant from MHA, India. M. Vatsa is also partially supported through Swarnajayanti Fellowship by the Government of India.

    Footnotes

1. The original code provided by the authors is used to perform the experiment.

    References

1. A. Agarwal, R. Singh, M. Vatsa and N. Ratha (2018). Are image-agnostic universal adversarial perturbations for face recognition difficult to detect? IEEE BTAS, pp. 1–7.
2. A. Agarwal, M. Vatsa, R. Singh and N. Ratha (2020). Noise is inside me! Generating adversarial perturbations with noise derived from natural filters. IEEE CVPRW.
3. A. Agarwal, M. Vatsa and R. Singh (2020). The role of sign and direction of gradient on the performance of CNN. IEEE CVPRW.
4. N. Akhtar and A. Mian (2018). Threat of adversarial attacks on deep learning in computer vision: a survey. IEEE Access 6, pp. 14410–14430.
5. A. Athalye, L. Engstrom, A. Ilyas and K. Kwok (2018). Synthesizing robust adversarial examples. ICML, pp. 284–293.
6. V. Behzadan and A. Munir (2017). Vulnerability of deep reinforcement learning to policy induction attacks. MLDM, pp. 262–275.
7. Q. Cao, L. Shen, W. Xie, O. M. Parkhi and A. Zisserman (2018). VGGFace2: a dataset for recognising faces across pose and age. IEEE FG, pp. 67–74.
8. N. Carlini and D. Wagner (2017). Towards evaluating the robustness of neural networks. IEEE S&P, pp. 39–57.
9. N. Carlini and D. Wagner (2018). Audio adversarial examples: targeted attacks on speech-to-text. IEEE S&PW, pp. 1–7.
10. P. Chen, Y. Sharma, H. Zhang, J. Yi and C. Hsieh (2018). EAD: elastic-net attacks to deep neural networks via adversarial examples. AAAI.
11. J. Deng, W. Dong, R. Socher and L. Li (2009). ImageNet: a large-scale hierarchical image database. IEEE CVPR, pp. 710–719.
12. S. U. Din, N. Akhtar, S. Younis, F. Shafait, A. Mansoor and M. Shafique (2020). Steganographic universal adversarial perturbations. Pattern Recognition Letters 135, pp. 146–152.
13. Y. Dong, F. Liao, T. Pang, H. Su, J. Zhu, X. Hu and J. Li (2018). Boosting adversarial attacks with momentum. IEEE CVPR, pp. 9185–9193.
14. J. Gao, J. Lanchantin, M. L. Soffa and Y. Qi (2018). Black-box generation of adversarial text sequences to evade deep learning classifiers. IEEE S&PW, pp. 50–56.
15. R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann and W. Brendel (2019). ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR.
16. A. Goel, A. Agarwal, M. Vatsa, R. Singh and N. Ratha (2019). DeepRing: protecting deep neural network with blockchain. IEEE CVPRW.
17. A. Goel, A. Agarwal, M. Vatsa, R. Singh and N. Ratha (2019). Securing CNN model and biometric template using blockchain. IEEE BTAS, pp. 1–6.
18. A. Goel, A. Agarwal, M. Vatsa, R. Singh and N. Ratha (2020). DNDNet: reconfiguring CNN for adversarial robustness. IEEE CVPRW.
19. A. Goel, A. Singh, A. Agarwal, M. Vatsa and R. Singh (2018). SmartBox: benchmarking adversarial detection and mitigation algorithms for face recognition. IEEE BTAS.
20. I. J. Goodfellow, J. Shlens and C. Szegedy (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
21. G. Goswami, N. Ratha, A. Agarwal, R. Singh and M. Vatsa (2018). Unravelling robustness of deep learning based face recognition against adversarial attacks. AAAI, pp. 6829–6836.
22. G. Goswami, A. Agarwal, N. Ratha, R. Singh and M. Vatsa (2019). Detecting and mitigating adversarial perturbations for robust face recognition. International Journal of Computer Vision 127, pp. 719–742. doi: 10.1007/s11263-019-01160-w.
23. R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker (2010). Multi-PIE. Image and Vision Computing 28 (5), pp. 807–813.
24. K. He, X. Zhang, S. Ren and J. Sun (2016). Deep residual learning for image recognition. IEEE CVPR, pp. 770–778.
25. G. Huang, Z. Liu, L. Van Der Maaten and K. Q. Weinberger (2017). Densely connected convolutional networks. IEEE CVPR, pp. 4700–4708.
26. J. Kiefer and J. Wolfowitz (1952). Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics 23 (3), pp. 462–466.
27. D. P. Kingma and J. Ba (2015). Adam: a method for stochastic optimization. ICLR.
28. A. Krizhevsky (2009). Learning multiple layers of features from tiny images.
29. A. Kurakin, I. Goodfellow and S. Bengio (2017). Adversarial examples in the physical world. ICLR-W.
30. A. Kurakin, I. Goodfellow and S. Bengio (2017). Adversarial machine learning at scale. ICLR.
31. A. Madry, A. Makelov, L. Schmidt, D. Tsipras and A. Vladu (2018). Towards deep learning models resistant to adversarial attacks. ICLR.
32. D. Matthew and R. Fergus (2014). Visualizing and understanding convolutional neural networks. ECCV, pp. 6–12.
33. S. Moosavi-Dezfooli, A. Fawzi, O. Fawzi and P. Frossard (2017). Universal adversarial perturbations. IEEE CVPR, pp. 1765–1773.
34. S. Moosavi-Dezfooli, A. Fawzi and P. Frossard (2016). DeepFool: a simple and accurate method to fool deep neural networks. IEEE CVPR, pp. 2574–2582.
35. K. Ren, T. Zheng, Z. Qin and X. Liu (2020). Adversarial attacks and defenses in deep learning. Engineering, pp. 1–15.
36. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg and L. Fei-Fei (2015). ImageNet Large Scale Visual Recognition Challenge. IJCV 115 (3), pp. 211–252.
37. R. Singh, A. Agarwal, M. Singh, S. Nagpal and M. Vatsa (2020). On the robustness of face recognition algorithms against attacks and bias. AAAI SMT.
38. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich (2015). Going deeper with convolutions. IEEE CVPR, pp. 1–9.
39. H. Wang, X. Wu, Z. Huang and E. P. Xing (2020). High-frequency component helps explain the generalization of convolutional neural networks. IEEE/CVF CVPR, pp. 8684–8694.
40. C. Xiang, C. R. Qi and B. Li (2019). Generating 3D adversarial point clouds. IEEE CVPR, pp. 9136–9144.
41. C. Xiao, B. Li, J. Zhu, W. He, M. Liu and D. Song (2018). Generating adversarial examples with adversarial networks. arXiv preprint arXiv:1801.02610.
42. H. Xiao, K. Rasul and R. Vollgraf (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747.
43. C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie and A. Yuille (2017). Adversarial examples for semantic segmentation and object detection. IEEE ICCV, pp. 1369–1378.
44. Z. Yahya, M. Hassan, S. Younis and M. Shafique (2020). Probabilistic analysis of targeted attacks using transform-domain adversarial examples. IEEE Access 8, pp. 33855–33869.
45. L. Yao and J. Miller (2015). Tiny ImageNet classification with convolutional neural networks. CS 231N 2 (5), pp. 8.
46. D. Yi, Z. Lei, S. Liao and S. Z. Li (2014). Learning face representation from scratch. arXiv preprint arXiv:1411.7923.
47. X. Yuan, P. He, Q. Zhu and X. Li (2019). Adversarial examples: attacks and defenses for deep learning. IEEE TNNLS 30 (9), pp. 2805–2824.
48. S. Zagoruyko and N. Komodakis (2016). Wide residual networks. arXiv preprint arXiv:1605.07146.
49. H. Zhang, Y. Yu, J. Jiao, E. P. Xing, L. E. Ghaoui and M. I. Jordan (2019). Theoretically principled trade-off between robustness and accuracy. ICML.
50. Z. Wang and A. C. Bovik (2002). A universal image quality index. IEEE Signal Processing Letters 9 (3), pp. 81–84.


