Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems

Ashim Gupta, Amrith Krishna


Abstract
A clean-label (CL) attack is a form of data poisoning in which an adversary modifies only the textual input of the training data, without requiring access to the labeling function. CL attacks are relatively unexplored in NLP compared to label-flipping (LF) attacks, which additionally require access to the labeling function. While CL attacks are more resilient to data sanitization and manual relabeling than LF attacks, they often demand as much as ten times the poisoning budget of LF attacks. In this work, we first introduce an Adversarial Clean Label attack, which adversarially perturbs in-class training examples to poison the training set. We then show that, with this approach, an adversary can significantly reduce the data requirements of a CL attack, to as low as 20% of the data otherwise required. We then systematically benchmark and analyze a number of defense methods for both LF and CL attacks, some previously employed solely for LF attacks in the textual domain and others adapted from computer vision. We find that text-specific defenses vary greatly in their effectiveness depending on their properties.
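To make the distinction in the abstract concrete, the following is a minimal illustrative sketch of how label-flipping and clean-label poisoning differ in what they touch. It is not the authors' implementation: the function names, the append-a-token trigger, and the optional `perturb` hook (a stand-in for an adversarial perturbation of in-class examples) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label)


def insert_trigger(text: str, trigger: str) -> str:
    """Append a trigger token to the text (a deliberately simple, hypothetical trigger)."""
    return f"{text} {trigger}"


def label_flip_poison(data: List[Example], target: int, trigger: str,
                      budget: int, seed: int = 0) -> List[Example]:
    """LF poisoning: add the trigger to out-of-class examples AND relabel them
    as the target class, which requires control over the labels."""
    rng = random.Random(seed)
    candidates = [i for i, (_, y) in enumerate(data) if y != target]
    chosen = set(rng.sample(candidates, min(budget, len(candidates))))
    return [(insert_trigger(x, trigger), target) if i in chosen else (x, y)
            for i, (x, y) in enumerate(data)]


def clean_label_poison(data: List[Example], target: int, trigger: str,
                       budget: int,
                       perturb: Callable[[str], str] = lambda t: t,
                       seed: int = 0) -> List[Example]:
    """CL poisoning: modify only the text of examples already in the target
    class; every label is left untouched. `perturb` is a hypothetical hook
    where an adversarial text perturbation could be plugged in (identity here)."""
    rng = random.Random(seed)
    candidates = [i for i, (_, y) in enumerate(data) if y == target]
    chosen = set(rng.sample(candidates, min(budget, len(candidates))))
    return [(insert_trigger(perturb(x), trigger), y) if i in chosen else (x, y)
            for i, (x, y) in enumerate(data)]
```

In this sketch, `budget` plays the role of the poisoning budget: the number of training examples the adversary is allowed to modify. Under the abstract's figures, a CL attack may need up to ten times the budget of an LF attack, and the adversarial variant is reported to bring the CL requirement down to as low as 20% of that.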
Anthology ID:
2023.repl4nlp-1.1
Volume:
Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Burcu Can, Maximilian Mozes, Samuel Cahyawijaya, Naomi Saphra, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Chen Zhao, Isabelle Augenstein, Anna Rogers, Kyunghyun Cho, Edward Grefenstette, Lena Voita
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
1–12
URL:
https://aclanthology.org/2023.repl4nlp-1.1
DOI:
10.18653/v1/2023.repl4nlp-1.1
Cite (ACL):
Ashim Gupta and Amrith Krishna. 2023. Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems. In Proceedings of the 8th Workshop on Representation Learning for NLP (RepL4NLP 2023), pages 1–12, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Adversarial Clean Label Backdoor Attacks and Defenses on Text Classification Systems (Gupta & Krishna, RepL4NLP 2023)
PDF:
https://aclanthology.org/2023.repl4nlp-1.1.pdf