Mihaela Bornea


2023

PrimeQA: The Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development
Avi Sil | Jaydeep Sen | Bhavani Iyer | Martin Franz | Kshitij Fadnis | Mihaela Bornea | Sara Rosenthal | Scott McCarley | Rong Zhang | Vishwajeet Kumar | Yulong Li | Md Arafat Sultan | Riyaz Bhat | Juergen Bross | Radu Florian | Salim Roukos
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers. In this paper, we introduce PrimeQA: a one-stop, open-source QA repository with the aim of democratizing QA research and facilitating easy replication of state-of-the-art (SOTA) QA methods. PrimeQA supports core QA functionalities like retrieval and reading comprehension as well as auxiliary capabilities such as question generation. It has been designed as an end-to-end toolkit for various use cases: building front-end applications, replicating SOTA methods on public benchmarks, and expanding pre-existing methods. PrimeQA is available at: https://github.com/primeqa.

2020

A Multilingual Reading Comprehension System for more than 100 Languages
Anthony Ferritto | Sara Rosenthal | Mihaela Bornea | Kazi Hasan | Rishav Chakravarti | Salim Roukos | Radu Florian | Avi Sil
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

This paper presents M-GAAMA, a Multilingual Question Answering architecture and demo system. This is the first multilingual machine reading comprehension (MRC) demo that is able to answer questions in over 100 languages. M-GAAMA answers questions from a given passage in the same or a different language. It incorporates several existing multilingual models that can be used interchangeably in the demo, such as M-BERT and XLM-R. The M-GAAMA demo also improves language accessibility by incorporating the IBM Watson machine translation widget, which lets users see an answer in their desired language. We also show how M-GAAMA can be used in downstream tasks by incorporating it into an END-TO-END-QA system using CFO (Chakravarti et al., 2019). We experiment with our system architecture on the Multi-Lingual Question Answering (MLQA) and the COVID-19 CORD (Wang et al., 2020; Tang et al., 2020) datasets to provide insights into the performance of the system.

2019

Combining Unsupervised Pre-training and Annotator Rationales to Improve Low-shot Text Classification
Oren Melamud | Mihaela Bornea | Ken Barker
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Supervised learning models often perform poorly at low-shot tasks, i.e. tasks for which little labeled data is available for training. One prominent approach for improving low-shot learning is to use unsupervised pre-trained neural models. Another approach is to obtain richer supervision by collecting annotator rationales (explanations supporting label annotations). In this work, we combine these two approaches to improve low-shot text classification with two novel methods: a simple bag-of-words embedding approach, and a more complex context-aware method based on the BERT model. In experiments with two English text classification datasets, we demonstrate substantial performance gains from combining pre-training with rationales. Furthermore, our investigation of a range of train-set sizes reveals that the simple bag-of-words approach is the clear top performer when there are only a few dozen training instances or fewer, while more complex models, such as BERT or CNN, require more training data to shine.

2017

Stacking With Auxiliary Features for Entity Linking in the Medical Domain
Nazneen Fatema Rajani | Mihaela Bornea | Ken Barker
BioNLP 2017

Linking spans of natural language text to concepts in a structured source is an important task for many problems. It allows intelligent systems to leverage rich knowledge available in those sources (such as concept properties and relations) to enhance the semantics of the mentions of these concepts in text. In the medical domain, it is common to link text spans to medical concepts in large, curated knowledge repositories such as the Unified Medical Language System. Different approaches have different strengths: some are precision-oriented, others recall-oriented; some are better at considering context but more prone to hallucination. The variety of techniques suggests that ensembling could outperform the component technologies at this task. In this paper, we describe our process for building a Stacking ensemble using additional, auxiliary features for Entity Linking in the medical domain. We report experiments showing that naive ensembling does not always outperform the component Entity Linking systems, that stacking usually outperforms naive ensembling, and that auxiliary features added to the stacker further improve its performance on three distinct datasets. Our best model produces state-of-the-art results on several medical datasets.

2016

Scoring Disease-Medication Associations using Advanced NLP, Machine Learning, and Multiple Content Sources
Bharath Dandala | Murthy Devarakonda | Mihaela Bornea | Christopher Nielson
Proceedings of the Fifth Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM2016)

Effective knowledge resources are critical for developing successful clinical decision support systems that alleviate the cognitive load on physicians in patient care. In this paper, we describe two new methods for building a knowledge resource of disease-to-medication associations. These methods use fundamentally different content and are based on advanced natural language processing and machine learning techniques. One method uses distributional semantics on large medical text, and the other uses data mining on a large number of patient records. The methods are evaluated using 25,379 unique disease-medication pairs extracted from 100 de-identified longitudinal patient records of a large multi-provider hospital system. We measured recall (R), precision (P), and F scores for positive and negative association prediction, along with coverage and accuracy. While individual methods performed well, a combined stacked classifier achieved the best performance, indicating the limitations and unique value of each resource and method. In predicting positive associations, the stacked combination significantly outperformed the baseline (a distant semi-supervised method on large medical text), achieving F scores of 0.75 versus 0.55 on the pairs seen in the patient records, and 0.69 versus 0.35 on unique pairs.