Detecting Adversarial Examples on Deep Neural Networks with Mutual Information Neural Estimation

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Dependable and Secure Computing, 2023, 20, (6), pp. 5168-5181
Issue Date: 2023-11-01
Despite their exceptional performance, deep neural networks (DNNs) remain vulnerable to adversarial examples, which are produced by corrupting clean examples with tiny perturbations. Many powerful defenses have been proposed, such as training data augmentation and input reconstruction; however, they usually rely on prior knowledge of the targeted models or attacks. In this paper, we propose a novel approach for detecting adversarial images that can protect any pre-trained DNN classifier and resist an endless stream of new attacks. Specifically, we first adopt a dual autoencoder to project images into a latent space. The dual autoencoder uses self-supervised learning to ensure that small modifications to samples do not significantly alter their latent representations. Next, mutual information neural estimation is used to enhance the discriminability of the latent representations. We then leverage prior distribution matching to regularize the latent representations. To make the representations of examples in the two spaces easy to compare, without relying on prior knowledge of the targeted model, a simple fully connected neural network embeds the learned representations into an eigenspace that is consistent with the output eigenspace of the targeted model. From the distribution similarity of an input example in the two eigenspaces, we can judge whether the input example is adversarial. Extensive experiments on MNIST, CIFAR-10, and ImageNet show that the proposed method achieves better defense performance and transferability than state-of-the-art methods.
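
Two steps of the pipeline described in the abstract can be illustrated in a few lines. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The first function computes the Donsker-Varadhan lower bound on mutual information that mutual information neural estimation (MINE) maximizes. It assumes the scores come from some critic network T(x, z) evaluated on matched (joint) pairs and on shuffled (marginal) pairs; the critic itself is not shown. The second part mirrors the final detection step: it compares the target model's output distribution with the detector's embedded distribution for the same input. The abstract does not name the similarity measure, so Jensen-Shannon divergence is swapped in here as an assumption. The function name `is_adversarial`, the threshold of 0.1, and the example logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dv_mi_lower_bound(joint_scores: Sequence[float],
                      marginal_scores: Sequence[float]) -> float:
    """Donsker-Varadhan bound used by MINE:
    I(X; Z) >= mean(T on joint pairs) - log(mean(exp(T on shuffled pairs))).
    Assumption: scores are critic outputs T(x, z); the critic is not modeled here."""
    joint_term = sum(joint_scores) / len(joint_scores)
    # log-mean-exp, shifted by the max score for numerical stability
    m = max(marginal_scores)
    log_mean_exp = m + math.log(
        sum(math.exp(s - m) for s in marginal_scores) / len(marginal_scores))
    return joint_term - log_mean_exp


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two discrete distributions.
    Assumed stand-in for the paper's unspecified 'distribution similarity'."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a_dist: Sequence[float], b_dist: Sequence[float]) -> float:
        # Terms with a == 0 contribute 0; b > 0 whenever a > 0 because b is the mixture.
        return sum(a * math.log(a / b) for a, b in zip(a_dist, b_dist) if a > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def is_adversarial(target_logits: Sequence[float],
                   detector_logits: Sequence[float],
                   threshold: float = 0.1) -> bool:
    """Hypothetical decision rule: flag the input when the target model's output
    distribution and the detector's embedded distribution disagree too much."""
    p = softmax(target_logits)    # target model's output eigenspace
    q = softmax(detector_logits)  # detector's fully connected embedding
    return js_divergence(p, q) > threshold
```

For example, `is_adversarial([4.0, 0.5, 0.1], [3.8, 0.6, 0.2])` returns `False` because both views agree on class 0, while `is_adversarial([0.1, 4.0, 0.5], [4.0, 0.5, 0.1])` returns `True` because the target model has been pushed to class 1 while the detector's embedding still points to class 0. In practice the threshold would have to be calibrated on clean validation data. Jensen-Shannon divergence is used here only because it is symmetric and bounded by ln 2.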