Daniel Butzke


2023

Is the ranking of PubMed similar articles good enough? An evaluation of text similarity methods for three datasets
Mariana Neves | Ines Schadock | Beryl Eusemann | Gilbert Schönfelder | Bettina Bert | Daniel Butzke
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

The use of seed articles in information retrieval provides many advantages, such as a longer context and more details about the topic being searched for. Given a seed article (i.e., a PMID), PubMed provides a pre-compiled list of similar articles to support the user in finding equivalent papers in the biomedical literature. We aimed to perform a quantitative evaluation of the PubMed Similar Articles based on three existing biomedical text similarity datasets, namely, RELISH, TREC-COVID, and SMAFIRA-c. Further, we carried out a survey and an evaluation of various text similarity methods on these three datasets. Our experiments considered the original title and abstract from PubMed as well as automatically detected sections and manually annotated relevant sentences. We provide an overview of which methods perform better for each dataset and compare them to the ranking in PubMed similar articles. While results varied considerably among the datasets, we were able to obtain a better performance than PubMed for all of them. Datasets and source codes are available at: https://github.com/mariananeves/reranking

2019

Evaluation of Scientific Elements for Text Similarity in Biomedical Publications
Mariana Neves | Daniel Butzke | Barbara Grune
Proceedings of the 6th Workshop on Argument Mining

Rhetorical elements from scientific publications provide a more structured view of the document and allow algorithms to focus on particular parts of the text. We surveyed the literature for previously proposed schemes for rhetorical elements and present an overview of its current state of the art. We also searched for available tools using these schemes and applied four tools for our particular task of ranking biomedical abstracts based on text similarity. Comparison of the tools with two strong baselines shows that the predictions provided by the ArguminSci tool can support our use case of mining alternative methods for animal experiments.

2018

Bf3R at SemEval-2018 Task 7: Evaluating Two Relation Extraction Tools for Finding Semantic Relations in Biomedical Abstracts
Mariana Neves | Daniel Butzke | Gilbert Schönfelder | Barbara Grune
Proceedings of the 12th International Workshop on Semantic Evaluation

Automatic extraction of semantic relations from text can support finding relevant information from scientific publications. We describe our participation in Task 7 of SemEval-2018 for which we experimented with two relation extraction tools - jSRE and TEES - for the extraction and classification of six relation types. The results we obtained with TEES were significantly superior to those with jSRE (33.4% vs. 30.09% and 20.3% vs. 16%). Additionally, we utilized the model trained with TEES for extracting semantic relations from biomedical abstracts, for which we present a preliminary evaluation.