Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples

Jianhan Xu; Cenyuan Zhang; Xiaoqing Zheng; Linyang Li; Cho-Jui Hsieh; Kai-Wei Chang; Xuan-Jing Huang

doi:10.18653/v1/2022.findings-acl.134

Abstract

Most existing defense methods improve adversarial robustness by training models on a training set augmented with adversarial examples. However, these augmented adversarial examples may not be natural and can distort the training distribution, resulting in inferior clean accuracy and adversarial robustness. In this study, we explore the feasibility of a reweighting mechanism that calibrates the training distribution to obtain robust models. We propose to train text classifiers with a sample reweighting method in which the example weights are learned online to minimize the loss on a validation set that mixes clean examples with their adversarial counterparts. Through extensive experiments, we show that there exists a reweighting mechanism that makes models more robust against adversarial attacks without the need to craft adversarial examples for the entire training set.
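The abstract does not spell out the update rule, so the following is only a minimal sketch of one common way to learn example weights online from a validation loss: a gradient-based meta-reweighting step applied to a toy logistic-regression classifier. It is an assumption for illustration, not the authors' implementation. The function names (`per_example_grads`, `reweighted_step`), the NumPy feature vectors standing in for text, and the learning rate are all hypothetical; in the setting described above, `X_val` would hold clean validation examples together with their adversarial counterparts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(theta, X, y):
    """Gradient of the logistic loss for each example; shape (n, d)."""
    return (sigmoid(X @ theta) - y)[:, None] * X

def reweighted_step(theta, X_train, y_train, X_val, y_val, lr=0.1):
    """One online step (illustrative): derive example weights from the
    validation loss, then update theta with the reweighted training gradient.

    X_val / y_val are assumed to mix clean examples with adversarial ones.
    """
    g_train = per_example_grads(theta, X_train, y_train)         # (n, d)
    g_val = per_example_grads(theta, X_val, y_val).mean(axis=0)  # (d,)

    # For a lookahead update theta - lr * sum_i eps_i * g_i, the derivative of
    # the validation loss w.r.t. eps_i at eps = 0 is -lr * <g_val, g_i>.
    # One descent step on eps from zero, clipped at zero, gives weights
    # proportional to max(0, <g_val, g_i>).
    w = np.maximum(0.0, g_train @ g_val)
    total = w.sum()
    if total > 0:
        w = w / total  # normalize so the weights sum to one

    theta_new = theta - lr * (w[:, None] * g_train).sum(axis=0)
    return theta_new, w

# Toy usage: the second training example disagrees with the validation
# gradient, so it receives zero weight under this rule.
theta0 = np.zeros(2)
X_tr = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_tr = np.array([1.0, 1.0])
X_va = np.array([[1.0, 0.0]])
y_va = np.array([1.0])
theta1, weights = reweighted_step(theta0, X_tr, y_tr, X_va, y_va)
```

Under this rule, only training examples whose gradients point in a direction that also lowers the mixed validation loss contribute to the update, which is the sense in which reweighting can calibrate the training distribution without crafting adversarial examples for every training instance.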

Anthology ID: 2022.findings-acl.134
Volume: Findings of the Association for Computational Linguistics: ACL 2022
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 1694–1707
URL: https://aclanthology.org/2022.findings-acl.134
DOI: 10.18653/v1/2022.findings-acl.134
Cite (ACL): Jianhan Xu, Cenyuan Zhang, Xiaoqing Zheng, Linyang Li, Cho-Jui Hsieh, Kai-Wei Chang, and Xuanjing Huang. 2022. Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1694–1707, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Towards Adversarially Robust Text Classifiers by Learning to Reweight Clean Examples (Xu et al., Findings 2022)
PDF: https://aclanthology.org/2022.findings-acl.134.pdf
Software: 2022.findings-acl.134.software.zip
Data: AG News, SST, SST-2
