Optimizing Relation Extraction in Medical Texts through Active Learning: A Comparative Analysis of Trade-offs

Siting Liang, Pablo Sánchez, Daniel Sonntag


Abstract
This work explores the effectiveness of Clinical BERT for Relation Extraction (RE) in medical texts within an Active Learning (AL) framework. Our main objective is to optimize RE in medical texts through AL while examining the trade-off between performance and computation time, comparing Clinical BERT with alternative methods such as Random Forest and BiLSTM networks. The comparison also covers feature engineering requirements, performance metrics, and annotation costs, including AL step times and annotation rates. Using AL strategies serves our broader goal of making relation classification models more efficient, particularly given the difficulty of annotating complex medical texts in a Human-in-the-Loop (HITL) setting. The results indicate that, across all three categories of supervised learning methods, uncertainty-based sampling achieves comparable performance with significantly fewer annotated samples, thereby reducing annotation costs for clinical and biomedical corpora. Clinical BERT shows clear performance advantages on both corpora, but at the cost of longer computation times in interactive annotation. In real-world applications, where practical feasibility and timely results are crucial, optimizing this trade-off is essential.
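To make "uncertainty-based sampling" concrete, the following is a minimal sketch of the selection step in a pool-based AL loop. It assumes that each classifier (Clinical BERT, Random Forest, or BiLSTM) can output a probability distribution over relation labels for every unlabeled sentence. The abstract does not state which uncertainty measure the authors use, so two standard ones, entropy and least confidence, are shown; the function names `entropy`, `least_confidence`, and `select_batch` are hypothetical and not taken from the paper.

```python
import math
from typing import List, Sequence


def least_confidence(probs: Sequence[float]) -> float:
    """Uncertainty = 1 - probability of the most likely relation label."""
    return 1.0 - max(probs)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(pool_probs: List[Sequence[float]], k: int,
                 strategy: str = "entropy") -> List[int]:
    """Return indices of the k most uncertain unlabeled examples.

    Hypothetical helper: in a HITL loop these examples would be shown to the
    annotator, added to the training set, and the model retrained before the
    next AL step.
    """
    score = {"entropy": entropy, "least_confidence": least_confidence}[strategy]
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: score(pool_probs[i]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Made-up predicted distributions over three relation labels.
    pool = [
        [0.90, 0.05, 0.05],  # confident prediction
        [0.40, 0.35, 0.25],  # uncertain
        [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
        [0.70, 0.20, 0.10],
    ]
    print(select_batch(pool, k=2))  # → [2, 1]
```

Querying only the highest-uncertainty examples is what lets the annotator label fewer samples per AL step. The cost trade-off discussed above comes from the retraining and re-scoring of the whole pool after each step, which is much heavier for Clinical BERT than for a Random Forest.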
Anthology ID: 2024.uncertainlp-1.3
Volume: Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024)
Month: March
Year: 2024
Address: St Julians, Malta
Editors: Raúl Vázquez, Hande Celikkanat, Dennis Ulmer, Jörg Tiedemann, Swabha Swayamdipta, Wilker Aziz, Barbara Plank, Joris Baan, Marie-Catherine de Marneffe
Venues: UncertaiNLP | WS
Publisher: Association for Computational Linguistics
Pages: 23–34
URL: https://aclanthology.org/2024.uncertainlp-1.3
Cite (ACL): Siting Liang, Pablo Sánchez, and Daniel Sonntag. 2024. Optimizing Relation Extraction in Medical Texts through Active Learning: A Comparative Analysis of Trade-offs. In Proceedings of the 1st Workshop on Uncertainty-Aware NLP (UncertaiNLP 2024), pages 23–34, St Julians, Malta. Association for Computational Linguistics.
Cite (Informal): Optimizing Relation Extraction in Medical Texts through Active Learning: A Comparative Analysis of Trade-offs (Liang et al., UncertaiNLP-WS 2024)
PDF: https://aclanthology.org/2024.uncertainlp-1.3.pdf