Prototype-Guided Pseudo Labeling for Semi-Supervised Text Classification

Weiyi Yang, Richong Zhang, Junfan Chen, Lihong Wang, Jaein Kim


Abstract
Semi-supervised text classification (SSTC) aims to classify text using few labeled examples and a large amount of unlabeled data. Recent work approaches this task with pseudo-labeling methods, which assume that the unlabeled and labeled data share the same distribution and assign pseudo-labels to the unlabeled data as additional supervision. However, existing pseudo-labeling methods usually suffer from ambiguous category boundaries during the pseudo-labeling phase, and they select pseudo-labels without considering the unbalanced category distribution of the unlabeled data, which makes it difficult to generate reliable pseudo-labels for every category. We propose a novel semi-supervised framework, ProtoS2, with prototypical cluster separation (PCS) and prototypical-center data selection (CDS) to address these issues. Specifically, PCS uses category prototypes to pull instance representations of the same category together, thereby emphasizing low-density separation for the pseudo-labeled data and alleviating ambiguous boundaries. CDS selects central pseudo-labeled data while taking the category distribution into account, which prevents the model from becoming biased toward dominant categories. Empirical studies and extensive analysis on four benchmarks demonstrate the effectiveness of the proposed model.
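The idea of selecting central pseudo-labeled data per category can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how prototypes are computed, which distance is used, or how many instances are kept per category. The sketch assumes a prototype is the mean embedding of a category's pseudo-labeled instances, uses Euclidean distance, and keeps a fixed quota per category; the names `class_prototypes`, `select_central`, and `per_class` are hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(embeddings, pseudo_labels):
    """Mean embedding per pseudo-label (an assumed stand-in for a learned prototype)."""
    grouped = defaultdict(list)
    for vec, label in zip(embeddings, pseudo_labels):
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def select_central(embeddings, pseudo_labels, per_class):
    """Return indices of up to `per_class` instances closest to each class prototype."""
    prototypes = class_prototypes(embeddings, pseudo_labels)
    by_class = defaultdict(list)
    for idx, (vec, label) in enumerate(zip(embeddings, pseudo_labels)):
        # Euclidean distance to the instance's own class prototype (assumption).
        by_class[label].append((math.dist(vec, prototypes[label]), idx))
    selected = []
    for label in sorted(by_class):
        # The same quota for every class keeps a dominant class from crowding out the rest.
        nearest = sorted(by_class[label])[:per_class]
        selected.extend(idx for _, idx in nearest)
    return sorted(selected)


# Toy data: class 0 is larger and contains one outlier; class 1 has two instances.
embeddings = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0],
    [5.0, 5.0], [5.2, 5.0],
]
pseudo_labels = [0, 0, 0, 0, 1, 1]
chosen = select_central(embeddings, pseudo_labels, per_class=2)
print(chosen)
```

With a quota of two per category, the outlier at index 3 is dropped and both categories contribute equally, even though category 0 has twice as many pseudo-labeled instances.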
Anthology ID:
2023.acl-long.904
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
16369–16382
URL:
https://aclanthology.org/2023.acl-long.904
DOI:
10.18653/v1/2023.acl-long.904
Cite (ACL):
Weiyi Yang, Richong Zhang, Junfan Chen, Lihong Wang, and Jaein Kim. 2023. Prototype-Guided Pseudo Labeling for Semi-Supervised Text Classification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16369–16382, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Prototype-Guided Pseudo Labeling for Semi-Supervised Text Classification (Yang et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.904.pdf
Video:
https://aclanthology.org/2023.acl-long.904.mp4