A Customized Text Sanitization Mechanism with Differential Privacy

Sai Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, Jamie Cui


Abstract
As privacy issues are receiving increasing attention within the Natural Language Processing (NLP) community, numerous methods have been proposed to sanitize texts subject to differential privacy. However, the state-of-the-art text sanitization mechanisms based on a relaxed notion of metric local differential privacy (MLDP) do not apply to non-metric semantic similarity measures and cannot achieve good privacy-utility trade-offs. To address these limitations, we propose a novel Customized Text sanitization (CusText) mechanism based on the original 𝜖-differential privacy (DP) definition, which is compatible with any similarity measure. Moreover, CusText assigns each input token a customized output set to provide more advanced privacy protection at the token level. Extensive experiments on several benchmark datasets show that CusText achieves a better trade-off between privacy and utility than existing mechanisms. The code is available at https://github.com/sai4july/CusText.
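The idea of sampling a replacement from a token's customized output set under 𝜖-DP can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes an exponential-mechanism style sampler, a similarity function whose scores already lie in [0, 1], and output sets that are given in advance; the names `sampling_probabilities`, `sanitize_token`, `toy_similarity`, and the score table are hypothetical and exist only for illustration.

```python
import math
import random

def sampling_probabilities(token, output_set, similarity, epsilon):
    """Return one sampling probability per candidate in output_set.

    Assumption: similarity(token, candidate) lies in [0, 1], so the score
    has sensitivity at most 1. Scores are clamped to enforce this.
    The similarity need not be a metric (it may be asymmetric).
    """
    scores = [min(1.0, max(0.0, similarity(token, c))) for c in output_set]
    # Exponential mechanism: weight proportional to exp(epsilon * score / 2).
    weights = [math.exp(epsilon * s / 2.0) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def sanitize_token(token, output_set, similarity, epsilon, rng):
    """Sample a replacement for token from its (assumed given) output set."""
    probs = sampling_probabilities(token, output_set, similarity, epsilon)
    return rng.choices(output_set, weights=probs, k=1)[0]

# Hypothetical, deliberately asymmetric (non-metric) similarity scores.
TOY_SCORES = {
    ("good", "great"): 0.8,
    ("great", "good"): 0.6,
    ("good", "fine"): 0.5,
    ("fine", "good"): 0.7,
    ("great", "fine"): 0.3,
}

def toy_similarity(a, b):
    # Identical tokens score 1.0; unlisted pairs score 0.0.
    if a == b:
        return 1.0
    return TOY_SCORES.get((a, b), 0.0)

if __name__ == "__main__":
    output_set = ["good", "great", "fine", "bad"]
    rng = random.Random(0)
    sanitized = [sanitize_token(t, output_set, toy_similarity, 2.0, rng)
                 for t in ["good", "fine", "great"]]
    print(sanitized)
```

Under these assumptions, any two input tokens that share the same output set yield output distributions whose probabilities differ by a factor of at most exp(𝜖), which is the 𝜖-DP guarantee restricted to that set. A smaller 𝜖 flattens the distribution (more privacy), while a larger 𝜖 concentrates it on the most similar candidates (more utility).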
Anthology ID:
2023.findings-acl.355
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
5747–5758
URL:
https://aclanthology.org/2023.findings-acl.355
DOI:
10.18653/v1/2023.findings-acl.355
Cite (ACL):
Sai Chen, Fengran Mo, Yanhao Wang, Cen Chen, Jian-Yun Nie, Chengyu Wang, and Jamie Cui. 2023. A Customized Text Sanitization Mechanism with Differential Privacy. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5747–5758, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Customized Text Sanitization Mechanism with Differential Privacy (Chen et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.355.pdf