Sparse Feature Attacks in Adversarial Learning

Publication Type:
Journal Article
Citation:
IEEE Transactions on Knowledge and Data Engineering, 2018, 30 (6), pp. 1164-1177
Issue Date:
2018-06-01
File: 08249883.pdf (Accepted Manuscript Version, 1.44 MB, Adobe PDF)
© 2018 IEEE. Adversarial learning is the study of machine learning techniques deployed in non-benign environments. Example applications include classification for spam detection, network intrusion detection, and credit card scoring. As the use of machine learning grows across diverse application domains, the likelihood of adversarial behavior increases as well. When adversarial learning is modelled in a game-theoretic setup, the standard assumption is that the adversary (one player) can change all of the features seen by the classifier (the opposing player) at will, paying a cost proportional to the size of the 'attack'. We refer to this form of adversarial behavior as a dense feature attack. However, the aim of an adversary is not just to subvert a classifier but to transform the data in such a way that the spam remains effective. We demonstrate that an adversary can achieve this objective by carrying out a sparse feature attack, which modifies only a small number of features. We present an algorithm showing how a classifier should be constructed to be robust against sparse adversarial attacks. Our main insight is that sparse feature attacks are best defended by classifiers that use ℓ1 regularizers.
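The following is a minimal sketch, not the paper's algorithm, of the contrast the abstract draws: an adversary who perturbs only a few influential features (a sparse attack, small k) versus one who perturbs all of them (a dense attack), evaluated against an ℓ1-regularized linear classifier. The toy data, the attack rule, and all names (attack, w_true, step) are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy "spam" data: 200 samples, 50 features, only a few truly informative.
n, d = 200, 50
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:5] = 2.0                      # informative features
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

# Defender: l1-regularized classifier (the family the paper recommends).
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(X, y)
w = clf.coef_.ravel()

# One "spam" instance the adversary wants past the filter.
x = X[y == 1][0]

def attack(x, w, k, step=1.0):
    """Perturb only the k features with the largest weight magnitude,
    pushing each against the sign of its weight (sparse when k is small)."""
    x_adv = x.copy()
    idx = np.argsort(-np.abs(w))[:k]
    x_adv[idx] -= step * np.sign(w[idx])
    return x_adv

for k in (3, d):                      # k=3 ~ sparse attack, k=d ~ dense attack
    x_adv = attack(x, w, k)
    cost = np.abs(x_adv - x).sum()    # cost proportional to the size of the attack
    print(f"k={k:2d}  predicted={clf.predict(x_adv[None, :])[0]}  l1 cost={cost:.1f}")

Under these assumptions, the dense attack incurs a much larger ℓ1 cost for the adversary, while the ℓ1-regularized classifier concentrates its weight on few features, which is the intuition behind defending sparse attacks with ℓ1 regularization.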