Department of Artificial Intelligence, Yonsei University, South Korea; Department of Mechanical Engineering, Yonsei University, South Korea; School of Integrated Technology, Yonsei University, South Korea
A significant amount of work has been done on adversarial attacks that inject imperceptible noise into images to deteriorate the image classification performance of deep models. However, most of the existing studies consider attacks in the digital (pixel) domain, i.e., they perturb an image that has already been acquired by an image sensor through sampling and quantization. This paper, for the first time, introduces an optical adversarial attack, which physically alters the light field information arriving at the image sensor so that the classification model yields misclassification. More specifically, we modulate the phase of the light in the Fourier domain using a spatial light modulator placed in the photographic system. The parameters of the modulator are obtained by gradient-based optimization to maximize cross-entropy and minimize distortions. We present experiments based on both simulation and a real hardware optical system, from which the feasibility of the proposed optical attack is demonstrated. It is also verified that the proposed attack is completely different from common optical-domain distortions such as spherical aberration, defocus, and astigmatism in terms of both perturbation patterns and classification results.
1 Introduction
It is well known that injecting small perturbations into input data can significantly degrade the performance of deep neural networks; such manipulations are called adversarial attacks.
Because such attacks raise security concerns for deep learning-based applications, many researchers have studied the impact of adversarial attacks on various deep models, especially image classification models Goodfellow et al. (2015); Su et al. (2018).
Most existing studies focus on finding adversarial examples in the digital domain, i.e., altering the pixel values of digital images.
Another possibility is that an attack is applied to the target object in the physical domain.
For this, a few studies demonstrate the efficacy of adversarial examples found in the digital domain when they are implemented in the physical domain, e.g., printed objects Kurakin et al. (2017b); Athalye et al. (2018).
Such applicability of adversarial examples on real objects raises more severe security concerns in various practical applications (e.g., autonomous vehicle system Nassi et al. (2019), person detector Xu et al. (2020)).
Orthogonal to these attempts, this paper introduces an optical adversarial attack by considering a new layer between real objects and digital images, i.e., the optical system that acquires the light field information from the target object in the physical world and converts it to an image in the digital domain.
The idea is to modulate the phase of the light information in the Fourier domain using a device called spatial light modulator (SLM).
Spatially varying phase modulations are found by optimizing an objective function to minimize image distortion and maximize the cross-entropy, which are realized by the SLM in the optical system.
The change in the digital image obtained by the image sensor due to the phase modulations is hardly perceptible, but can significantly deteriorate the performance of the image classification model.
The main contributions of our work can be summarized as follows.
We propose an optical adversarial attack that is implemented in the optical system and deteriorates the performance of deep models that classify the images acquired from the system (Section 3).
We show the feasibility of our optical adversarial attack by conducting experiments on a simulated optical system for various images from the ImageNet dataset (Section 4).
It is shown that the attacked optical system produces output images that have quality similar to the original outputs but fool the subsequent image classification models.
We conduct real experiments on an actual system implementing our attack to demonstrate the feasibility of the proposed idea in the real world (Section 5).
Our attack is also compared to common optical-domain phase distortions such as spherical aberration, defocus, and astigmatism, which verifies that our method is substantially more effective as an attack.
Our work has two important implications.
First, our work is the first to demonstrate the possibility of implementing adversarial attacks by altering the light information in the optical system.
The work in Li et al. (2019) proposes a physical attack by putting a sticker on the camera lens, but the attack occurs outside the optical system and, furthermore, physical intervention (i.e., putting a sticker) is required.
In contrast, our attack takes place inside the optical system, and is implemented without physical intervention by maliciously controlling the computer used as the controller of the SLM.
Second, we raise a new immediate vulnerability issue of practical systems where SLMs are employed, including biomedical imaging, holography, and optical encryption.
In such systems, our work shows that malicious attempts may be made not only by conventional attacks in the digital domain but also by optical attacks in the physical domain.
2 Related work
2.1 Adversarial attack
Various adversarial attack methods against image classification models have been developed.
Goodfellow et al. (2015) proposed the fast gradient sign method (FGSM), which obtains a perturbation for a given image from the sign of the gradients of a target image classification model.
Kurakin et al. (2017a) extended FGSM to an iterative approach that finds a more powerful perturbation, which is called I-FGSM.
Carlini and Wagner (2017) developed an efficient attack method that finds a perturbation by minimizing the amount of deterioration and the distance between the logits of the original predicted class label and the target label.
While the aforementioned methods focus on injecting a perturbation into a given digital image that will be directly inputted to a target image classification model, some researchers have also investigated adversarial examples that are applicable to physical objects.
Kurakin et al. (2017b) demonstrated the feasibility of finding adversarial examples that can fool the classification model even when the attacked images are printed and captured again using a phone camera.
Eykholt et al. (2019) showed that physically perturbing real objects such as road signs can attack image classification models.
Athalye et al. (2018) further provided adversarial showcases with 3D-printed objects that can make the classification model misclassify the images taken from various viewpoints.
Previous research has focused on attacking images or objects themselves, and to the best of our knowledge, there is no approach that attacks optical systems acquiring images from real objects.
2.2 SLM-based optical system
An SLM is a computer-controlled active device used to modulate the amplitude, phase, or polarization of light waves in space and time.
Among several types of SLMs, liquid crystal on silicon (LCoS) SLMs are used in applications that call for phase modulations in optical systems such as lithography Jenness et al. (2008, 2010); Lowell et al. (2017), optical tweezer Reicherter et al. (1999); Kim et al. (2016); Hadad et al. (2018), turbulence simulation Burger et al. (2008); Phillips et al. (2005), and imaging Quirin et al. (2013); Wang et al. (2011); Situ et al. (2010); Jesacher et al. (2007); Warber et al. (2010); Mukherjee et al. (2019).
In Fourier optics, a lens is regarded as a Fourier transform engine.
That is, for a given object field in the front focal plane of the lens, its Fourier transform can be obtained in the back focal plane of the lens.
This plane is referred to as the Fourier plane, where one has access to the spatial frequency spectrum of the object field.
By placing an SLM in the Fourier plane, one can alter phase delay individually for each spatial frequency component, thus modifying the transfer function or image formation of an optical imaging system.
For example, it has been shown that the depth-of-field of optical imaging systems can be increased significantly by introducing cubic phase offset in the Fourier plane Quirin et al. (2013).
In addition, the phase modulation technology using SLMs has been used in phase imaging of thin biological specimens Wang et al. (2011); Situ et al. (2010) and aberration correction of optical systems Jesacher et al. (2007); Warber et al. (2010).
Various applications of SLMs for pupil engineering can be found in the review paper by Maurer et al. (2011).
A recent study by Kravets et al. (2021) introduced a defense technique that uses an SLM to defend against adversarial attacks applied in the digital domain.
On the other hand, we consider an optical adversarial attack, which is implemented using a phase SLM.
3 Proposed system
We set up an SLM-based optical system that consists of a camera lens, an SLM, and an image sensor, which is illustrated in Figure 1.
The image acquisition process is as follows.
First, a commercial camera lens (SP AF 60mm F/2 Di II Macro 1:1, Tamron) collects the object field and then generates the image at the intermediate image plane.
In order to achieve direct access to the Fourier plane, we construct a 4-f system using two lenses (RL, AC508-100-A, Thorlabs) to relay the information onto the image plane.
A phase-only SLM (HSP512, Meadowlark) is placed in the Fourier plane, i.e., the back focal plane of the first relay lens.
Since the SLM is polarization-dependent, a linear polarizer (LPVISC100, Thorlabs) is placed before the SLM.
The phase-modulated light via the SLM is then reflected by the beam splitter (BS031, Thorlabs), and the image in the image plane is captured by an image sensor (Flare 4M180-CL, IO Industries).
The obtained digital image is then inputted to a deep neural network that classifies the object in the image.
In this study, we consider three widely known models, namely, ResNet50 He et al. (2016), VGG16 Simonyan and Zisserman (2015), and MobileNetV3 Howard et al. (2019), which are pre-trained on the ImageNet dataset Russakovsky et al. (2015).
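For reference, the three pre-trained classifiers are available in torchvision; the snippet below is a minimal sketch of loading them (the specific model variants, weights, and preprocessing shown here are illustrative assumptions, not necessarily the exact ones used in our experiments).

```python
# Minimal sketch (illustrative): loading ImageNet-pretrained classifiers with torchvision.
# The exact model variants, weights, and preprocessing used in the experiments may differ.
from torchvision import models, transforms

classifiers = {
    "ResNet50": models.resnet50(pretrained=True).eval(),
    "VGG16": models.vgg16(pretrained=True).eval(),
    "MobileNetV3": models.mobilenet_v3_large(pretrained=True).eval(),
}

# Standard ImageNet preprocessing applied to the digital image from the sensor.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```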
On this system architecture, our optical adversarial attack aims to find an adversarial perturbation that is displayed as an SLM pattern, which leads the classifier to misclassify the resulting image, while no significant visible differences are observed between the unattacked and attacked images.
3.1 Imaging model of SLM-based optical adversarial attack
Let Xobj be the intensity of an object.
The camera lens forms image Ximg at the intermediate image plane, and then the 4-f system relays this information onto the image plane (see Figure 1).
For an incoherent imaging system, the acquired image X can be expressed as Goodman (2005)

X = iPSF ⊗ Ximg,  (1)

where iPSF denotes the incoherent point spread function and ⊗ represents 2D convolution. iPSF is equal to the squared magnitude of the coherent point spread function h (i.e., iPSF=|h|2).
Note that h is the Fourier transform of the pupil function H.
In the Fourier domain, (1) can be written as

~X = MTF ⋅ ~Ximg,  (2)

where MTF is the modulation transfer function, ~X is the Fourier transform of X, and ~Ximg is the Fourier transform of Ximg.
Using the convolution theorem, MTF can be obtained as MTF=F(|h|2)=|H⋆H|, where F(⋅) is the Fourier transform operator and ⋆ represents 2D correlation.
We consider a circular aperture in the pupil plane with a radius of R.
If a phase modulation is applied in the pupil plane, the corresponding pupil function can be expressed as

H(→u) = A(→u) exp(jϕ(→u)),  (3)

where A(→u) is the aperture function of the circular pupil (equal to 1 for |→u| ≤ R and 0 otherwise), →u is the spatial frequency coordinate at the Fourier plane, and ϕ(→u) is the modulated phase distribution, which is applied through the SLM in our case.
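A minimal numerical sketch of the imaging model in (1)-(3) is given below, assuming a square grid, normalized spatial-frequency coordinates, and an illustrative aperture radius; it is not the exact simulator used in our experiments.

```python
# Simplified simulation of the incoherent imaging model in Eqs. (1)-(3).
# Grid size, coordinates, and aperture radius are illustrative assumptions.
import numpy as np

def simulate_capture(x_img, phi, aperture_radius=0.4):
    """x_img: 2D intensity image; phi: phase pattern on the Fourier plane (radians)."""
    n = x_img.shape[0]
    fx = np.fft.fftshift(np.fft.fftfreq(n))               # normalized spatial frequencies
    u, v = np.meshgrid(fx, fx)
    aperture = (np.sqrt(u**2 + v**2) <= aperture_radius).astype(float)

    H = aperture * np.exp(1j * phi)                        # pupil function, Eq. (3)
    h = np.fft.ifft2(np.fft.ifftshift(H))                  # coherent point spread function
    ipsf = np.abs(h) ** 2                                  # incoherent PSF, iPSF = |h|^2
    ipsf /= ipsf.sum()                                     # conserve total energy

    otf = np.fft.fft2(ipsf)                                # its magnitude is the MTF
    x = np.real(np.fft.ifft2(np.fft.fft2(x_img) * otf))   # convolution of Eq. (1) via FFT
    return x

# Example: with phi = 0 the system behaves as a diffraction-limited low-pass filter.
captured = simulate_capture(np.random.rand(224, 224), np.zeros((224, 224)))
```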
3.2 Finding adversarial perturbation
In our attack, non-targeted adversarial phase perturbation ϕ(→u) is found by a gradient-based l2-norm optimization method to maximize the classification loss while minimizing image distortions.
Let ^Xϕ denote the attacked version of X with phase modulation ϕ.
The optimization problem to find ^Xϕ is written as

minϕ  λ‖^Xϕ − X‖2 − l(f(^Xϕ), y),  (4)

where λ is a balancing constant between the two terms, l is the classification loss (i.e., cross-entropy), y is the ground truth class label, and f is the classification model.
To find an appropriate value of λ, we adopt an iterative approach that starts with a large value (to ensure a small amount of image distortion) and gradually decreases it until the classification result becomes incorrect.
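The objective in (4) can be written as a differentiable function, as in the following sketch; here, render is a hypothetical differentiable stand-in for the imaging model of Section 3.1, and the tensor shapes are assumptions for illustration.

```python
# Sketch of the attack objective in Eq. (4): minimize lambda * distortion - cross-entropy.
# `render(phi)` is a hypothetical differentiable version of the imaging model in Sec. 3.1.
import torch
import torch.nn.functional as F

def attack_objective(phi, x_clean, render, classifier, y_true, lam):
    x_attacked = render(phi)                               # image captured under phase phi
    distortion = torch.norm(x_attacked - x_clean, p=2)     # l2 distortion in the pixel domain
    logits = classifier(x_attacked.unsqueeze(0))           # assumes x_attacked is (3, H, W)
    ce = F.cross_entropy(logits, torch.tensor([y_true]))   # classification loss l(f(X), y)
    return lam * distortion - ce                           # minimized with respect to phi
```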
4 Simulation experiments
Before we apply our adversarial attack on a real system composed of physical devices, we first conduct experiments on a simulation environment using the forward model explained in Section 3.1.
This enables us to assess the feasibility of our proposed adversarial attack method on a relatively large number of images containing diverse objects.
4.1 Implementation details
We employ 1,000 test images of the NeurIPS 2017 Adversarial Attacks and Defences Competition Kurakin et al. (2018) (we obtained the images from https://kaggle.com/c/6864).
This dataset contains images associated with each of the 1,000 ImageNet classes, which are not included in the training images of the original ImageNet dataset.
The classification accuracy is used as the primary evaluation metric.
In addition, we employ the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to measure the amount of deterioration in the attacked images compared to their corresponding unattacked images.
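Both metrics can be computed with scikit-image; the snippet below is a minimal illustration and assumes images scaled to [0, 1] with the channel dimension last.

```python
# Minimal illustration of the image-quality metrics (assumes H x W x 3 images in [0, 1]).
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(x_clean, x_attacked):
    psnr = peak_signal_noise_ratio(x_clean, x_attacked, data_range=1.0)
    ssim = structural_similarity(x_clean, x_attacked, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```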
To find an adversarial example ^Xϕ for a given image X by (4), we employ the Adam optimizer Kingma and Ba (2014) because it is known to be effective in quickly finding adversarial examples Carlini and Wagner (2017).
We use a learning rate of 5×10−3 and a weight decay factor of 5×10−6.
We initially set λ to 10−2 and reduce it by a factor of 10 if a valid ^Xϕ is not found within the maximum number of iterations, which is set to 150.
The optimization process stops once we obtain a valid ^Xϕ that makes the model misclassify the attacked image.
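Putting these pieces together, the search procedure can be sketched as follows; the hyperparameters mirror those listed above, while attack_objective and render are the hypothetical helpers from the sketch in Section 3.2.

```python
# Sketch of the overall search: Adam on the SLM phase, reducing lambda by 1/10 whenever
# no misclassifying phase is found within 150 iterations (values as listed in Sec. 4.1).
import torch

def find_adversarial_phase(x_clean, render, classifier, y_true,
                           lam_init=1e-2, lam_min=1e-6, max_iters=150):
    lam = lam_init
    while lam >= lam_min:
        phi = torch.zeros(224, 224, requires_grad=True)     # phase pattern on the SLM
        opt = torch.optim.Adam([phi], lr=5e-3, weight_decay=5e-6)
        for _ in range(max_iters):
            opt.zero_grad()
            attack_objective(phi, x_clean, render, classifier, y_true, lam).backward()
            opt.step()
            with torch.no_grad():
                pred = classifier(render(phi).unsqueeze(0)).argmax(dim=1).item()
            if pred != y_true:                              # first phase that fools the model
                return phi.detach(), lam
        lam /= 10                                           # allow more image distortion
    return None, lam                                        # attack failed within lambda range
```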
We observe that both accuracy and PSNR tend to converge to certain values as λ decreases.
Figure 2(a) shows such a tendency of convergence for VGG16.
When λ becomes 10−4, the accuracy and PSNR are measured as 0.270 and 33.60 dB, respectively, and both do not significantly change when λ decreases further.
Therefore, we set the minimum value of λ as 10−6.
Figure 2(b) depicts a showcase of the obtained images for different values of λ.
The three cases do not show significant perceptual differences, while the classification result becomes wrong when λ becomes 10−4 and a larger amount of phase modulation is applied.
Table 1: Performance comparison in terms of accuracy, PSNR, and SSIM evaluated on different image classification models. Standard deviations across images are also shown. Note that relatively large standard deviations of PSNR are due to the images with failed attack despite severe phase perturbations and the images with little changes despite the attack.
Table 1 shows the performance comparison on the three classification models.
When the attack method is not employed, all the models achieve classification accuracy above 0.840.
However, when our adversarial attack is employed, the accuracy values are significantly reduced.
This result proves that the optical system for the image classification task is highly vulnerable to our proposed adversarial attack.
In addition, both the PSNR and SSIM values of the images obtained from the attacked optical system are high (e.g., PSNR above 30 dB).
It implies that differences between the original and attacked images are hardly noticeable.
As a baseline, we test a so-called “random phase attack” by constructing a random phase pattern ϕ that generates a digital image having a similar PSNR value to that of an image obtained from our adversarial attack.
With this method, we obtain images having an average PSNR value of 33.12 dB, which is similar to the average PSNR values in Table 1 and even slightly lower than those obtained from our attack for ResNet50 and MobileNetV3.
However, the classification accuracy barely drops when those images are inputted to the models: 0.894, 0.828, and 0.860 for ResNet50, VGG16, and MobileNetV3, respectively.
This result shows that the perturbations found by our proposed attack method are very different from random perturbations and our method successfully deteriorates the classification performance while preserving the quality of the obtained images.
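For reference, the random phase baseline can be reproduced along the following lines, reusing the simulate_capture sketch from Section 3.1; the amplitude search and the target PSNR value are illustrative assumptions rather than the exact procedure used here.

```python
# Sketch of the "random phase attack" baseline: a fixed random Fourier-plane pattern is
# scaled until the captured image reaches a target PSNR (search strategy is illustrative).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio

def random_phase_baseline(x_clean, target_psnr=33.0, shape=(224, 224), seed=0):
    rng = np.random.default_rng(seed)
    base = rng.standard_normal(shape)
    for scale in np.linspace(0.01, 3.0, 300):               # gradually increase amplitude
        x = simulate_capture(x_clean, scale * base)         # forward model from Sec. 3.1 sketch
        if peak_signal_noise_ratio(x_clean, x, data_range=1.0) <= target_psnr:
            break                                           # distortion level roughly matched
    return scale * base, x
```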
Figure 3: Visual showcases of the unattacked and attacked images for the three image classification models. Classified labels and their confidence levels are also reported. The third column shows the absolute pixel value differences between the unattacked and attacked images, which are magnified 10 times for better visualization. The last column shows the modulated phase in the Fourier domain.
Figure 3 shows example images with and without the adversarial attack.
The absolute differences of the unattacked and attacked images in the digital domain and the optimized phase modulation patterns (ϕ in Section 3.1) are also shown.
It can be seen that differences between the original and attacked images are not significant, which is also shown as high PSNR and SSIM values.
However, the classification models misclassify all the attacked images.
Here, the classified labels differ depending on the employed models.
For instance, the starfish image is misclassified as honeycomb, flatworm, and mask by each model, respectively.
The pixel-domain changes also differ depending on the employed models.
For example, the differences are mostly on the red channel for ResNet50, while those are mostly on the green channel for MobileNetV3.
The amount of distortion is also model-dependent, i.e., the PSNR and SSIM values differ depending on the target classification model for the same image.
For example, the PSNR values of the cabbage butterfly image for VGG16 and MobileNetV3 are 29.86 dB and 37.33 dB, respectively.
These model-dependent characteristics of the perturbations can also be observed in the low transferability of the attacked images between the models, as shown in Table 2.
Table 2: Transferability of attacked images for different models in terms of accuracy
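Transferability as reported in Table 2 can be measured by generating attacked images against one (source) model and classifying them with a different (target) model; a minimal sketch of the evaluation step is shown below.

```python
# Sketch: accuracy of a target model on images that were attacked against a source model.
def transfer_accuracy(attacked_images, labels, target_model):
    correct = 0
    for x, y in zip(attacked_images, labels):               # x: (3, H, W) tensor, y: class index
        pred = target_model(x.unsqueeze(0)).argmax(dim=1).item()
        correct += int(pred == y)
    return correct / len(labels)
```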
However, we also observe the following characteristics of the phase modulation patterns for different images and classifiers.
First, a wider range of phase modulations tends to yield a more distorted image having a lower PSNR value.
For instance, for ResNet50, the phase patterns of both the starfish and airliner images contain larger values (appearing as more red and blue colors) than that of the cabbage butterfly image.
Second, the phase patterns of the same image appear similar to some extent across different models.
For example, the phases of starfish show wave-like patterns, while those of cabbage butterfly contain more grain-like textures.
The overall patterns of the pixel value changes are largely different from those obtained from many existing adversarial attacks in the pixel domain Goodfellow et al. (2015); Kurakin et al. (2017a); Carlini and Wagner (2017).
The former preserves textures of the original images, whereas the latter is typically similar to random noise and barely preserves the original textures.
This is because our adversarial attack method manipulates the imaging system in the phase domain instead of the pixel domain.
5 Real experiments
We physically implement our proposed adversarial attack with an optical system as explained in Section 3 in order to demonstrate the vulnerability of real optical systems in the wild to the proposed optical attack.
5.1 Implementation details
In the real experiments, we place “actual” objects in front of the optical system to capture and acquire images of the objects.
Considering this practical constraint, we use ten real objects that correspond to ten ImageNet classes to obtain images of those objects in the digital domain, which are bath towel, computer keyboard, lighter, paintbrush, ping-pong ball, plate rack, ruler, screwdriver, syringe, and toilet tissue.
We place an object 100-120 cm away from the camera lens, a distance corresponding to a field-of-view of about 250×250 mm.
Phase modulation is performed using the SLM with a resolution of 224×224 pixels and a pixel size of 30 μm.
We employ the pre-trained ResNet50 and VGG16 models.
The MobileNetV3 model is excluded here due to its relatively poor performance on the actual objects.
In addition to our attack method, we also investigate the impact of other optical-domain distortions that are usually found in real optical systems.
In this study, we consider spherical aberration, defocus, and astigmatism.
The amounts of these distortions are determined such that the resulting images have SSIM values similar to those of the images perturbed by our attack.
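These three distortions correspond to standard pupil-plane aberrations; the sketch below generates such phase patterns using common (un-normalized) Zernike-like polynomials, with coefficients chosen only for illustration.

```python
# Illustrative phase patterns for the compared optical distortions over a circular pupil.
# Un-normalized Zernike-like forms; the coefficient values are assumptions for illustration.
import numpy as np

def aberration_phase(kind, coeff, n=224, radius=0.4):
    fx = np.linspace(-0.5, 0.5, n)
    u, v = np.meshgrid(fx, fx)
    rho = np.sqrt(u**2 + v**2) / radius                    # radial coordinate, 1 at pupil edge
    theta = np.arctan2(v, u)
    if kind == "defocus":
        z = 2 * rho**2 - 1
    elif kind == "astigmatism":
        z = rho**2 * np.cos(2 * theta)
    elif kind == "spherical":
        z = 6 * rho**4 - 6 * rho**2 + 1
    else:
        raise ValueError(f"unknown aberration: {kind}")
    return coeff * z * (rho <= 1.0)                        # phase in radians, zero outside pupil

phi_spherical = aberration_phase("spherical", coeff=3.0)
```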
Table 3: Performance comparison in terms of accuracy, PSNR, and SSIM for the original images, attacked images in simulation, attacked images in the real system, and images with optical distortions in the real system. Standard deviations across images are also shown.
Table 3 shows the overall performance comparison of our attack method and the three optical distortions for different image classification models, where the accuracy, PSNR, and SSIM values are reported.
We also report the accuracy of the original images and the attacked images obtained from our simulation environment explained in Section 4.
Both ResNet50 and VGG16 successfully classify the ten real objects when no distortion is involved.
However, when our attack method is employed, all the objects are classified incorrectly for both models in both simulation and real environments.
Furthermore, unlike our attack method, the optical-domain distortions hardly affect the classification performance; all ten objects are still classified correctly for ResNet50, and nine objects for VGG16.
These results demonstrate that the real optical system is highly vulnerable to the proposed optical adversarial attack.
Figure 4: Visual showcases of the original and attacked images for ResNet50. Classified labels and their confidence levels are also reported.
Figure 4 shows four visual showcases of our attack and the optical distortions for ResNet50.
When the original and attacked outputs are compared in the digital domain, there are no obvious visual differences between them, which is also reflected in the high PSNR values in Table 3.
However, our attack method successfully fools the target image classification model.
For example, the original image #4 is correctly classified as paintbrush.
However, the attacked ones are misclassified as mortar in the simulation and real environments.
The optical distortions hardly affect the classification performance, reducing the confidence levels only slightly (e.g., from 30.7% to 26.6% by spherical aberration) without resulting in misclassification.
The patterns of the phase images obtained from our attack method and those of the optical distortions also show significant differences.
First, the range of the phase values for our attack is significantly smaller than that for the optical distortions: about [−0.5,0.5] (rad) vs. [−3,3] (rad).
In addition, the phase patterns are highly distinguishable across different objects for our attack method, while they are not for the optical distortions.
6 Conclusion
We demonstrated the feasibility of attacking optical systems in the optical domain, instead of attacking images in the digital domain, by introducing an optical adversarial attack.
For a given real object, our attack method finds a spatially varying phase modulation pattern implemented by an SLM in order to minimize the amount of distortion in the digital domain but significantly degrade the performance of image classification models.
We conducted experiments not only in a simulation environment to evaluate with a large amount of data but also in a real optical system to evaluate the proposed attack method in the wild.
The results showed that the optical systems are highly vulnerable to our adversarial attack method, raising a new significant security issue of imaging systems.
Our current work has the following limitations that call for future work.
First, our study can be expanded to a broader range of application fields.
Although we considered the image classification task as the main target of our experiments, our proposed adversarial attack method can be further applied to other fields that employ optical imaging systems to acquire digital images and deep neural networks to perform classification or enhancement, such as microscopic image enhancement Rivenson et al. (2017) and hologram classification Kim et al. (2018).
Second, we focused on investigating the vulnerability of the physical imaging system in this study, and to this end, we proposed the optical adversarial attack.
One of the important next directions will be to find ways to protect optical systems against adversarial attacks applied in both the optical domain and the digital domain.
 A. Athalye, L. Engstrom, A. Ilyas, and K. Kwok (2018)