Label-Only Model Inversion Attacks: Attack With the Least Information

Publisher:
IEEE (Institute of Electrical and Electronics Engineers)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Information Forensics and Security, 2023, 18, pp. 991-1005
Issue Date:
2023-01-01
Filename:
Label-Only Model Inversion Attacks Attack With the Least Information.pdf
Description:
Accepted version
Size:
2.91 MB
Format:
Adobe PDF
Abstract:
In a model inversion attack, an adversary attempts to reconstruct the training data records of a target model using only the model's output. Contemporary model inversion strategies generally rely on either predicted confidence score vectors, i.e., black-box attacks, or the parameters of the target model, i.e., white-box attacks. In practice, however, model owners usually release only the predicted labels, hiding the confidence score vectors and model parameters precisely to defend against such attacks. Unfortunately, we have found a model inversion method that can reconstruct representative samples of the target model's training data from the output labels alone. Because this attack requires the least information of any known approach, we believe it has the broadest applicability. The key idea is to exploit the error rate of the target model to compute the median distance from a set of data records to its decision boundary. This distance is then used to generate confidence score vectors, which in turn are used to train an attack model that reconstructs the representative samples. Experimental results show that highly recognizable representative samples can be reconstructed with far less information than existing methods require.
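To make the pipeline concrete, below is a minimal Python sketch of the label-only idea under several stated assumptions: a toy two-class linear classifier stands in for the real target, the distance to the decision boundary is estimated by bisection along a random direction until the predicted label flips, and the empirical median distance serves as the midpoint of a sigmoid that converts distances into pseudo confidence scores (the paper calibrates this step with the target model's error rate). The names query_label, boundary_distance, and pseudo_confidence are hypothetical; this is an illustration of the technique, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical label-only target: a 2-class linear classifier (assumed
# weights, for illustration only; the real target is a black box).
W = np.array([[1.0, -1.0], [-1.0, 1.0]])

def query_label(x: np.ndarray) -> int:
    """Label-only oracle: returns the argmax class, never a score."""
    return int(np.argmax(W @ x))

def boundary_distance(x: np.ndarray, direction: np.ndarray,
                      hi: float = 10.0, iters: int = 30) -> float:
    """Estimate the distance from x to the decision boundary along a
    unit direction by bisecting on the step size at which the
    predicted label flips, using only label queries."""
    base = query_label(x)
    lo = 0.0
    # If the label never flips within the search radius, cap at hi.
    if query_label(x + hi * direction) == base:
        return hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if query_label(x + mid * direction) == base:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Probe a set of data records and collect boundary distances.
samples = rng.normal(size=(200, 2))
labels = np.array([query_label(x) for x in samples])
dists = np.array([
    boundary_distance(x, d / np.linalg.norm(d))
    for x, d in zip(samples, rng.normal(size=(200, 2)))
])
median = np.median(dists)  # the paper calibrates this via the error rate

def pseudo_confidence(dist: float, label: int, scale: float = 1.0) -> np.ndarray:
    """Map a boundary distance to a 2-class pseudo score vector:
    records far from the boundary get confident scores for their
    predicted class, records near it get uncertain ones."""
    p = 1.0 / (1.0 + np.exp(-(dist - median) / scale))
    vec = np.empty(2)
    vec[label] = p
    vec[1 - label] = 1.0 - p
    return vec

scores = np.stack([pseudo_confidence(d, l) for d, l in zip(dists, labels)])
print("median boundary distance:", median)
print("example pseudo score vector:", scores[0])
# These pseudo score vectors would then substitute for the hidden real
# confidence vectors when training a reconstruction (attack) model,
# e.g. a decoder network mapping score vectors back to inputs.

The bisection step is the design choice that keeps the query cost low: each distance estimate needs only a logarithmic number of label queries in the desired precision, which is what makes a label-only attack practical at all.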