GLEN: General-Purpose Event Detection for Thousands of Types

Sha Li, Qiusi Zhan, Kathryn Conger, Martha Palmer, Heng Ji, Jiawei Han


Abstract
Progress in event extraction research has been hindered by the absence of wide-coverage, large-scale datasets. To make event extraction systems more accessible, we build GLEN, a general-purpose event detection dataset that covers 205K event mentions with 3,465 different types, making its ontology more than 20x larger than that of today’s largest event dataset. GLEN is created using the DWD Overlay, which provides a mapping between Wikidata Qnodes and PropBank rolesets. This mapping lets us use the abundant existing PropBank annotation as distant supervision. We also propose a new multi-stage event detection model designed specifically to handle the large ontology size in GLEN. Our model outperforms a range of baselines, including InstructGPT. Finally, we perform an error analysis and show that label noise remains the largest obstacle to improving performance on this new dataset.
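The distant-supervision idea in the abstract can be illustrated with a minimal sketch. It assumes the overlay is available as a simple dictionary from PropBank roleset IDs to candidate event types. The names `OVERLAY`, `Mention`, and `distant_labels` are hypothetical, and the Qnode IDs are placeholders rather than real Wikidata identifiers. This is not the authors' pipeline, only a small example of labeling by mapping.

```python
from dataclasses import dataclass

# Hypothetical overlay: PropBank roleset -> candidate Wikidata Qnode IDs.
# The Qnode IDs below are placeholders, not real Wikidata identifiers.
OVERLAY = {
    "attack.01": ["Q_PLACEHOLDER_A", "Q_PLACEHOLDER_B"],
    "buy.01": ["Q_PLACEHOLDER_C"],
}


@dataclass
class Mention:
    """A predicate mention that already carries a PropBank roleset annotation."""
    sentence_id: int
    trigger: str
    roleset: str


def distant_labels(mentions, overlay):
    """Attach candidate event types to each mention via the roleset mapping.

    Mentions whose roleset has no entry in the overlay are skipped.
    """
    labeled = []
    for mention in mentions:
        candidates = overlay.get(mention.roleset, [])
        if candidates:
            labeled.append((mention, candidates))
    return labeled


mentions = [
    Mention(0, "attacked", "attack.01"),
    Mention(1, "bought", "buy.01"),
    Mention(2, "seemed", "seem.01"),  # not in the toy overlay, so it is dropped
]

result = distant_labels(mentions, OVERLAY)
for mention, candidates in result:
    print(mention.trigger, candidates)
```

In this toy setup a roleset may map to more than one candidate type, so the resulting labels can be ambiguous. That is one plausible way noise could enter distantly supervised labels, although the abstract does not say where the label noise in GLEN comes from.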
Anthology ID:
2023.emnlp-main.170
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
2823–2838
URL:
https://aclanthology.org/2023.emnlp-main.170
DOI:
10.18653/v1/2023.emnlp-main.170
Cite (ACL):
Sha Li, Qiusi Zhan, Kathryn Conger, Martha Palmer, Heng Ji, and Jiawei Han. 2023. GLEN: General-Purpose Event Detection for Thousands of Types. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2823–2838, Singapore. Association for Computational Linguistics.
Cite (Informal):
GLEN: General-Purpose Event Detection for Thousands of Types (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.170.pdf
Video:
https://aclanthology.org/2023.emnlp-main.170.mp4