Motivating the Rules of the Game for Adversarial Example Research
attributed to: Justin Gilmer, Ryan P. Adams, Ian Goodfellow, David Andersen, George E. Dahl

Advances in machine learning have led to broad deployment of systems with
impressive performance on important problems. Nonetheless, these systems can be
induced to make errors on data that are surprisingly similar to examples the
learned system handles correctly. The existence of these errors raises a
variety of questions about out-of-sample generalization and whether bad actors
might use such examples to abuse deployed systems. As a result of these
security concerns, there has been a flurry of recent papers proposing
algorithms to defend against such malicious perturbations of correctly handled
examples. It is unclear how such misclassifications represent a different kind
of security problem than other errors, or even other attacker-produced examples
that have no specific relationship to an uncorrupted input. In this paper, we
argue that adversarial example defense papers have, to date, mostly considered
abstract, toy games that do not relate to any specific security concern.
Furthermore, defense papers have not yet precisely described all the abilities
and limitations of attackers that would be relevant in practical security.
Towards this end, we establish a taxonomy of motivations, constraints, and
abilities for more plausible adversaries. Finally, we provide a series of
recommendations outlining a path forward for future work to more clearly
articulate the threat model and perform more meaningful evaluation.
Vulnerabilities & Strengths