LLM-enhanced Self-training for Cross-domain Constituency Parsing

Jianling Li, Meishan Zhang, Peiming Guo, Min Zhang, Yue Zhang


Abstract
Self-training has proven to be an effective approach for cross-domain tasks, and in this study, we explore its application to cross-domain constituency parsing. Traditional self-training methods rely on limited and potentially low-quality raw corpora. To overcome this limitation, we propose enhancing self-training with a large language model (LLM) that iteratively generates domain-specific raw corpora. For constituency parsing, we introduce grammar rules that guide the LLM in generating raw corpora and establish criteria for selecting pseudo instances. Our experimental results demonstrate that self-training for constituency parsing, equipped with an LLM, outperforms traditional methods regardless of the LLM’s performance. Moreover, the combination of grammar rules and confidence criteria for pseudo-data selection yields the highest performance in cross-domain constituency parsing.
Anthology ID:
2023.emnlp-main.508
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
8174–8185
URL:
https://aclanthology.org/2023.emnlp-main.508
DOI:
10.18653/v1/2023.emnlp-main.508
Cite (ACL):
Jianling Li, Meishan Zhang, Peiming Guo, Min Zhang, and Yue Zhang. 2023. LLM-enhanced Self-training for Cross-domain Constituency Parsing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8174–8185, Singapore. Association for Computational Linguistics.
Cite (Informal):
LLM-enhanced Self-training for Cross-domain Constituency Parsing (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.508.pdf
Video:
https://aclanthology.org/2023.emnlp-main.508.mp4