Detecting Adversarial Examples on Deep Neural Networks with Mutual Information Neural Estimation

Gao, S; Wang, R; Wang, X; Yu, S; Dong, Y; Yao, S; Zhou, W

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Dependable and Secure Computing, 2023, 20, (6), pp. 5168-5181
Issue Date: 2023-11-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Gao, S
dc.contributor.author	Wang, R
dc.contributor.author	Wang, X
dc.contributor.author	Yu, S https://orcid.org/0000-0003-4485-6743
dc.contributor.author	Dong, Y
dc.contributor.author	Yao, S
dc.contributor.author	Zhou, W
dc.date.accessioned	2024-03-12T01:02:15Z
dc.date.available	2024-03-12T01:02:15Z
dc.date.issued	2023-11-01
dc.identifier.citation	IEEE Transactions on Dependable and Secure Computing, 2023, 20, (6), pp. 5168-5181
dc.identifier.issn	1545-5971
dc.identifier.issn	1941-0018
dc.identifier.uri	http://hdl.handle.net/10453/176524
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Dependable and Secure Computing
dc.relation.isbasedon	10.1109/TDSC.2023.3241428
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	0803 Computer Software, 0804 Data Format, 0805 Distributed Computing
dc.subject.classification	Strategic, Defence & Security Studies
dc.subject.classification	4604 Cybersecurity and privacy
dc.subject.classification	4606 Distributed computing and systems software
dc.title	Detecting Adversarial Examples on Deep Neural Networks with Mutual Information Neural Estimation
dc.type	Journal Article
utslib.citation.volume	20
utslib.for	0803 Computer Software
utslib.for	0804 Data Format
utslib.for	0805 Distributed Computing
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	University of Technology Sydney/Strength - CCSP - Centre for Cyber Security and Privacy
utslib.copyright.status	open_access	*
dc.date.updated	2024-03-12T01:02:14Z
pubs.issue	6
pubs.publication-status	Published
pubs.volume	20
utslib.citation.issue	6

Abstract:

Despite achieving exceptional performance, deep neural networks (DNNs) are vulnerable to adversarial examples, which are produced by corrupting clean examples with tiny perturbations. Many powerful defense methods have been proposed, such as training data augmentation and input reconstruction; however, they usually rely on prior knowledge of the targeted models or attacks. In this paper, we propose a novel approach for detecting adversarial images that can protect any pre-trained DNN classifier and is designed to resist newly emerging attacks. Specifically, we first adopt a dual autoencoder to project images into a latent space. The dual autoencoder uses self-supervised learning to ensure that small modifications to samples do not significantly alter their latent representations. Next, mutual information neural estimation is used to make the latent representations more discriminative. We then use prior distribution matching to regularize the latent representations. To compare the representations of examples in the two spaces easily, without relying on prior knowledge of the targeted model, a simple fully connected neural network embeds the learned representations into an eigenspace that is consistent with the output eigenspace of the targeted model. From the distribution similarity of an input example in the two eigenspaces, we can judge whether the input example is adversarial. Extensive experiments on MNIST, CIFAR-10, and ImageNet show that the proposed method outperforms state-of-the-art methods in both defense performance and transferability.
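
The final decision step described in the abstract, comparing an input's distribution in the two eigenspaces, can be illustrated with a minimal sketch. The abstract does not name the similarity measure or the decision rule, so everything concrete here is an assumption made for illustration: the Jensen-Shannon divergence as the measure, the threshold of 0.1, and the hypothetical names `js_divergence` and `is_adversarial`. The inputs stand in for the class-probability vector produced by the detector's fully connected embedding network and the one produced by the targeted model.

```python
import math


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (natural log) between two discrete distributions.

    Assumption: the paper's abstract only says "distribution similarity";
    JS divergence is one symmetric, bounded choice used here for illustration.
    """
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # KL(a || b); zero-probability entries of a contribute nothing.
        return sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def is_adversarial(detector_probs, target_probs, threshold=0.1):
    """Flag an input when the two output distributions disagree strongly.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out clean examples.
    """
    return js_divergence(detector_probs, target_probs) > threshold


# Hypothetical 3-class outputs for one input image.
detector_out = [0.90, 0.05, 0.05]      # detector's embedding of the latent code
target_clean = [0.88, 0.07, 0.05]      # targeted model agrees: likely clean
target_attacked = [0.05, 0.90, 0.05]   # targeted model disagrees: suspicious

print(is_adversarial(detector_out, target_clean))
print(is_adversarial(detector_out, target_attacked))
```

The intuition behind this sketch follows the abstract: the latent representation is trained to be stable under small perturbations, so the detector's distribution should stay close to the clean prediction, while an attacked targeted model shifts its output and the two distributions diverge.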

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176524