Label-Only Model Inversion Attacks: Attack With the Least Information

Publisher:
IEEE (Institute of Electrical and Electronics Engineers)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Information Forensics and Security, 2023, 18, pp. 991-1005
Issue Date:
2023-01-01
Filename:
Label-Only Model Inversion Attacks Attack With the Least Information.pdf
Description:
Accepted version
Size:
2.91 MB
Format:
Adobe PDF
Abstract:
In a model inversion attack, an adversary attempts to reconstruct the training data records of a target model using only the model's output. Contemporary model inversion strategies generally rely on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of the target model, i.e., white-box attacks. In practice, however, model owners usually release only the predicted labels, hiding the confidence score vectors and model parameters precisely to defend against such attacks. Unfortunately, we have found a model inversion method that can reconstruct representative samples of the target model's training data from the output labels alone. Because this attack requires the least information of any known approach, we believe it has the broadest applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to its decision boundary. This distance is then used to generate confidence score vectors, which in turn are used to train an attack model that reconstructs the representative samples. Experimental results show that highly recognizable representative samples can be reconstructed with far less information than existing methods require.
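To make the pipeline concrete, below is a minimal Python sketch of the label-only idea under several stated assumptions: a toy two-class linear classifier stands in for the real target, the distance to the decision boundary is estimated by bisection along a random direction until the predicted label flips, and the empirical median distance serves as the midpoint of a sigmoid that converts distances into pseudo confidence scores (the paper calibrates this step with the target model's error rate). The names query_label, boundary_distance, and pseudo_confidence are hypothetical; this is an illustration of the technique, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical label-only target: a 2-class linear classifier (assumed
# weights, for illustration only; the real target is a black box).
W = np.array([[1.0, -1.0], [-1.0, 1.0]])

def query_label(x: np.ndarray) -> int:
    """Label-only oracle: returns the argmax class, never a score."""
    return int(np.argmax(W @ x))

def boundary_distance(x: np.ndarray, direction: np.ndarray,
                      hi: float = 10.0, iters: int = 30) -> float:
    """Estimate the distance from x to the decision boundary along a
    unit direction by bisecting on the step size at which the
    predicted label flips, using only label queries."""
    base = query_label(x)
    lo = 0.0
    # If the label never flips within the search radius, cap at hi.
    if query_label(x + hi * direction) == base:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if query_label(x + mid * direction) == base:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Probe a set of data records and collect boundary distances.
samples = rng.normal(size=(200, 2))
labels = np.array([query_label(x) for x in samples])
dists = np.array([
    boundary_distance(x, d / np.linalg.norm(d))
    for x, d in zip(samples, rng.normal(size=(200, 2)))
])
median = np.median(dists)  # the paper calibrates this via the error rate

def pseudo_confidence(dist: float, label: int, scale: float = 1.0) -> np.ndarray:
    """Map a boundary distance to a 2-class pseudo score vector:
    records far from the boundary get confident scores for their
    predicted class, records near it get uncertain ones."""
    p = 1.0 / (1.0 + np.exp(-(dist - median) / scale))
    vec = np.empty(2)
    vec[label] = p
    vec[1 - label] = 1.0 - p
    return vec

scores = np.stack([pseudo_confidence(d, l) for d, l in zip(dists, labels)])
print("median boundary distance:", median)
print("example pseudo score vector:", scores[0])
# These pseudo score vectors would then substitute for the hidden real
# confidence vectors when training a reconstruction (attack) model,
# e.g. a decoder network mapping score vectors back to inputs.

The bisection step is the design choice that keeps the query cost low: each distance estimate needs only a logarithmic number of label queries in the desired precision, which is what makes a label-only attack practical at all.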