Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction

Haotian Chen, Bingsheng Chen, Xiangdong Zhou


Abstract
Document-level relation extraction (DocRE) has attracted increasing research interest in recent years. While models achieve consistent performance gains in DocRE, their underlying decision rules remain understudied: do they make the right predictions according to rationales? In this paper, we take the first step toward answering this question and introduce a new perspective on comprehensively evaluating a model. Specifically, we first annotate the rationales that humans rely on in DocRE. We then investigate the models and find that, in contrast to humans, representative state-of-the-art (SOTA) DocRE models exhibit different reasoning processes. Through our proposed RE-specific attacks, we further demonstrate that this significant discrepancy between the decision rules of models and humans severely damages model robustness. We then introduce mean average precision (MAP) to evaluate the understanding and reasoning capabilities of models. Based on extensive experimental results, we appeal to future work to evaluate the understanding ability of models, because improved understanding renders models more trustworthy and robust for deployment in real-world scenarios. We make our annotations and code publicly available.
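The abstract names mean average precision (MAP) as the measure of whether a model's decision rules align with human rationales. The sketch below is a minimal illustration of computing MAP over model-produced rankings of tokens (or sentences) against human-annotated rationale indices; the per-example inputs and the aggregation shown here are assumptions for illustration, not the paper's exact evaluation protocol.

# Minimal MAP sketch (assumed setup, not the paper's exact protocol):
# each example provides a model ranking of token indices and a set of
# human-annotated rationale indices.
from typing import List, Set

def average_precision(ranked: List[int], relevant: Set[int]) -> float:
    """AP of one ranking: sum of precision@k at each rank k that hits a rationale, over |relevant|."""
    if not relevant:
        return 0.0
    hits, precisions = 0, []
    for k, idx in enumerate(ranked, start=1):
        if idx in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant)

def mean_average_precision(rankings: List[List[int]], rationales: List[Set[int]]) -> float:
    """MAP: average AP over all instances that have annotated rationales."""
    aps = [average_precision(r, g) for r, g in zip(rankings, rationales) if g]
    return sum(aps) / len(aps) if aps else 0.0

# Example: the model ranks tokens 3, 7, 1, 5 highest; humans marked {3, 1} as the rationale.
print(mean_average_precision([[3, 7, 1, 5]], [{3, 1}]))  # ~0.833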
Anthology ID:
2023.acl-long.354
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
6418–6435
URL:
https://aclanthology.org/2023.acl-long.354
DOI:
10.18653/v1/2023.acl-long.354
Cite (ACL):
Haotian Chen, Bingsheng Chen, and Xiangdong Zhou. 2023. Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6418–6435, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction (Chen et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-long.354.pdf
Video:
https://aclanthology.org/2023.acl-long.354.mp4