A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition

Qi Li, Tingyu Xie, Peng Peng, Hongwei Wang, Gaoang Wang


Abstract
Distant supervision reduces the reliance on human annotation in named entity recognition tasks. Class-level imbalance in distant annotation is a realistic and unexplored problem, and the popular self-training method cannot handle class-level imbalanced learning. More importantly, self-training is dominated by the high-performance class when selecting candidates, and the bias in the generated pseudo labels degrades the low-performance class. To address this class-level performance imbalance, we propose a class-rebalancing self-training framework for improving distantly-supervised named entity recognition. In candidate selection, a class-wise flexible threshold is designed to fully explore classes other than the high-performance class. In label generation, a hybrid pseudo label that injects the distant label is adopted to provide direct semantic information for the low-performance class. Experiments on five flat and two nested datasets show that our model achieves state-of-the-art results. We also conduct extensive analyses of the effectiveness of the flexible threshold and the hybrid pseudo label.
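The two ideas named in the abstract, a class-wise flexible threshold and a hybrid pseudo label, can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration and not the authors' method: the threshold for each class is scaled by how often that class is confidently predicted (with a hypothetical floor `min_threshold`), and the hybrid label is a simple convex mix of the one-hot distant label and the model's probabilities with a hypothetical weight `alpha`. All function and parameter names are invented here.

```python
from collections import Counter


def class_wise_thresholds(probs, num_classes, base_threshold=0.9, min_threshold=0.5):
    """Assumed scheme: lower the threshold for classes that are rarely
    predicted with high confidence, so they still contribute candidates."""
    confident = Counter()
    for p in probs:
        top = max(range(num_classes), key=lambda c: p[c])
        if p[top] >= base_threshold:
            confident[top] += 1
    most = max(confident.values(), default=0)
    if most == 0:
        # No confident prediction yet: fall back to the fixed threshold.
        return [base_threshold] * num_classes
    return [
        max(min_threshold, base_threshold * confident[c] / most)
        for c in range(num_classes)
    ]


def select_candidates(probs, thresholds):
    """Keep samples whose top probability clears the threshold of the predicted class."""
    selected = []
    for i, p in enumerate(probs):
        top = max(range(len(p)), key=lambda c: p[c])
        if p[top] >= thresholds[top]:
            selected.append(i)
    return selected


def hybrid_pseudo_label(prob, distant_label, alpha=0.5):
    """Assumed mix: alpha * one-hot(distant label) + (1 - alpha) * model probabilities.
    If no distant label is available (None), the model probabilities are used as-is."""
    if distant_label is None:
        return list(prob)
    return [
        alpha * (1.0 if c == distant_label else 0.0) + (1 - alpha) * p
        for c, p in enumerate(prob)
    ]


# Toy example with three classes; class 0 plays the high-performance class.
probs = [
    [0.95, 0.03, 0.02],
    [0.92, 0.05, 0.03],
    [0.30, 0.60, 0.10],
    [0.10, 0.15, 0.75],
]
thresholds = class_wise_thresholds(probs, num_classes=3)
fixed_selection = select_candidates(probs, [0.9, 0.9, 0.9])
flexible_selection = select_candidates(probs, thresholds)
mixed = hybrid_pseudo_label(probs[2], distant_label=2)
```

In this toy run a fixed threshold of 0.9 keeps only the two class-0 samples, while the per-class thresholds also admit the less confident class-1 and class-2 samples. The mixed label for the third sample shifts probability mass toward the distant label instead of trusting the model's prediction alone.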
Anthology ID:
2023.findings-acl.703
Volume:
Findings of the Association for Computational Linguistics: ACL 2023
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
11054–11068
URL:
https://aclanthology.org/2023.findings-acl.703
DOI:
10.18653/v1/2023.findings-acl.703
Cite (ACL):
Qi Li, Tingyu Xie, Peng Peng, Hongwei Wang, and Gaoang Wang. 2023. A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition. In Findings of the Association for Computational Linguistics: ACL 2023, pages 11054–11068, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
A Class-Rebalancing Self-Training Framework for Distantly-Supervised Named Entity Recognition (Li et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-acl.703.pdf