Robustness-Aware Word Embedding Improves Certified Robustness to Adversarial Word Substitutions

Yibin Wang, Yichen Yang, Di He, Kun He


Abstract
Natural Language Processing (NLP) models have achieved great success on clean texts, but they are known to be vulnerable to adversarial examples typically crafted by synonym substitutions. In this paper, we aim to address this problem and find that word embeddings are important to the certified robustness of NLP models. Given these findings, we propose the Embedding Interval Bound Constraint (EIBC) triplet loss to train robustness-aware word embeddings for better certified robustness. We optimize the EIBC triplet loss to reduce the distances between synonyms in the embedding space, which is theoretically proven to tighten the verification boundary. Meanwhile, we enlarge the distances among non-synonyms, maintaining the semantic representation of the word embeddings. Our method is conceptually simple and modular. It can be easily combined with IBP training and improves the certified robust accuracy from 76.73% to 84.78% on the IMDB dataset. Experiments demonstrate that our method outperforms various state-of-the-art certified defense baselines and generalizes well to unseen substitutions. The code is available at https://github.com/JHL-HUST/EIBC-IBP/.
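The abstract describes a triplet-style objective that pulls synonym embeddings together while pushing non-synonym embeddings apart. The following is a minimal illustrative sketch of such a loss in PyTorch; it is not the authors' EIBC implementation (the exact constraint and its combination with IBP training are defined in the paper), and the function name, margin, and Euclidean distance are assumptions made for illustration.

import torch
import torch.nn.functional as F

def embedding_triplet_loss(anchor, synonym, non_synonym, margin=1.0):
    """Hypothetical triplet loss over word embeddings.

    anchor, synonym, non_synonym: (batch, dim) embedding tensors.
    """
    # Distance from each word to one of its synonyms (to be reduced).
    pos_dist = F.pairwise_distance(anchor, synonym)
    # Distance from each word to a non-synonym (to be enlarged).
    neg_dist = F.pairwise_distance(anchor, non_synonym)
    # Hinge objective: synonyms should end up at least `margin` closer than non-synonyms.
    return torch.clamp(pos_dist - neg_dist + margin, min=0.0).mean()

In the paper, this embedding-level objective is trained jointly with Interval Bound Propagation (IBP), so the tighter synonym intervals translate into tighter certified bounds for the downstream classifier.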
Anthology ID: 2023.findings-acl.42
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 673–687
URL: https://aclanthology.org/2023.findings-acl.42
DOI: 10.18653/v1/2023.findings-acl.42
Cite (ACL): Yibin Wang, Yichen Yang, Di He, and Kun He. 2023. Robustness-Aware Word Embedding Improves Certified Robustness to Adversarial Word Substitutions. In Findings of the Association for Computational Linguistics: ACL 2023, pages 673–687, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): Robustness-Aware Word Embedding Improves Certified Robustness to Adversarial Word Substitutions (Wang et al., Findings 2023)
PDF: https://aclanthology.org/2023.findings-acl.42.pdf