W. John Wilbur

Also published as: W John Wilbur


2021

Measuring the relative importance of full text sections for information retrieval from scientific literature
Lana Yeganova | Won Gyu Kim | Donald Comeau | W John Wilbur | Zhiyong Lu
Proceedings of the 20th Workshop on Biomedical Language Processing

With the growing availability of full-text articles, integrating abstracts and full texts of documents into a unified representation is essential for comprehensive search of scientific literature. However, previous studies have shown that naïvely merging abstracts with full texts of articles does not consistently yield better performance. Balancing the contribution of query terms appearing in the abstract and in full-text sections of differing importance remains a challenge for both traditional bag-of-words IR approaches and neural retrieval methods. In this work we establish the connection between the BM25 score of a query term appearing in a section of a full-text document and the probability of that document being clicked or identified as relevant. The probability is computed using Pool Adjacent Violators (PAV), an isotonic regression algorithm that provides a maximum likelihood estimate based on the observed data. Using this probabilistic transformation of BM25 scores, we show improved performance on the PubMed Click dataset developed and presented in this study, as well as on the 2007 TREC Genomics collection.
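The PAV step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pav_fit`, the input format (a list of scores and a parallel list of 0/1 click labels), and the toy data are all assumptions made for illustration. The sketch pools observations with equal scores, then merges adjacent blocks whenever a lower-scored block has a higher click rate than the block after it, so the resulting estimates are non-decreasing in the score.

```python
from collections import defaultdict

def pav_fit(scores, labels):
    """Pool Adjacent Violators on (score, 0/1 label) observations.

    Returns a list of (min_score, max_score, probability) blocks whose
    probabilities are non-decreasing in the score. Hypothetical helper,
    not taken from the paper.
    """
    # Pool observations that share exactly the same score.
    totals = defaultdict(lambda: [0.0, 0])
    for s, y in zip(scores, labels):
        totals[s][0] += y
        totals[s][1] += 1

    # Each block is [label_sum, count, min_score, max_score].
    blocks = []
    for s in sorted(totals):
        label_sum, count = totals[s]
        blocks.append([label_sum, count, s, s])
        # Merge backwards while the previous block's mean exceeds the
        # current block's mean (a monotonicity violation).
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count, _, hi = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][3] = hi

    return [(lo, hi, label_sum / count) for label_sum, count, lo, hi in blocks]

# Toy data (made up): scores 2 and 3 violate monotonicity and get pooled.
toy_blocks = pav_fit([1, 2, 3, 4, 5], [0, 1, 0, 1, 1])
```

In this toy run the observations at scores 2 and 3 are merged into one block with an estimated probability of 0.5, while the remaining scores keep their own blocks. A library routine such as scikit-learn's `IsotonicRegression` could be used instead of a hand-written loop.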

2018

SingleCite: Towards an improved Single Citation Search in PubMed
Lana Yeganova | Donald C Comeau | Won Kim | W John Wilbur | Zhiyong Lu
Proceedings of the BioNLP 2018 workshop

A search that is targeted at finding a specific document in a database is called a single citation search. Single citation searches are particularly important for scholarly databases, such as PubMed, because users are frequently searching for a specific publication. In this work we describe SingleCite, a single citation matching system designed to facilitate a user's search for a specific document. We report on the progress that has been achieved towards building that functionality.

MeSH-based dataset for measuring the relevance of text retrieval
Won Gyu Kim | Lana Yeganova | Donald Comeau | W John Wilbur | Zhiyong Lu
Proceedings of the BioNLP 2018 workshop

Creating simulated search environments has been of significant interest in information retrieval, in both general and biomedical search domains. Existing collections include a modest number of queries and are constructed by manually evaluating retrieval results. In this work we propose leveraging MeSH term assignments for creating synthetic test beds. We select a suitable subset of MeSH terms as queries, and utilize MeSH term assignments as pseudo-relevance rankings for retrieval evaluation. Using well-studied retrieval functions, we show that their performance on the proposed data is consistent with similar findings in previous work. We further use the proposed retrieval evaluation framework to better understand how to combine heterogeneous sources of textual information.

2016

PubTermVariants: biomedical term variants and their use for PubMed search
Lana Yeganova | Won Kim | Sun Kim | Rezarta Islamaj Doğan | Wanli Liu | Donald C Comeau | Zhiyong Lu | W John Wilbur
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2015

Summarizing Topical Contents from PubMed Documents Using a Thematic Analysis
Sun Kim | Lana Yeganova | W. John Wilbur
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

Extracting Biomedical Events and Modifications Using Subgraph Matching with Noisy Training Data
Andrew MacKinlay | David Martinez | Antonio Jimeno Yepes | Haibin Liu | W. John Wilbur | Karin Verspoor
Proceedings of the BioNLP Shared Task 2013 Workshop

Generalizing an Approximate Subgraph Matching-based System to Extract Events in Molecular Biology and Cancer Genetics
Haibin Liu | Karin Verspoor | Donald C. Comeau | Andrew MacKinlay | W. John Wilbur
Proceedings of the BioNLP Shared Task 2013 Workshop

BioNLP Shared Task 2013: Supporting Resources
Pontus Stenetorp | Wiktoria Golik | Thierry Hamon | Donald C. Comeau | Rezarta Islamaj Doğan | Haibin Liu | W. John Wilbur
Proceedings of the BioNLP Shared Task 2013 Workshop

2012

Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers
Sun Kim | Won Kim | Don Comeau | W. John Wilbur
BioNLP: Proceedings of the 2012 Workshop on Biomedical Natural Language Processing

2011

Automatic extraction of data deposition statements: where do the research results go?
Aurélie Névéol | W. John Wilbur | Zhiyong Lu
Proceedings of BioNLP 2011 Workshop

Text Mining Techniques for Leveraging Positively Labeled Data
Lana Yeganova | Donald C. Comeau | Won Kim | W. John Wilbur
Proceedings of BioNLP 2011 Workshop

2009

Exploring Two Biomedical Text Genres for Disease Recognition
Aurélie Névéol | Won Kim | W. John Wilbur | Zhiyong Lu
Proceedings of the BioNLP 2009 Workshop

2007

Unsupervised Learning of the Morpho-Semantic Relationship in MEDLINE
W. John Wilbur
Biological, translational, and clinical language processing

2006

A Priority Model for Named Entities
Lorraine Tanabe | W. John Wilbur
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2005

MedTag: A Collection of Biomedical Annotations
Lawrence H. Smith | Lorraine Tanabe | Thomas Rindflesch | W. John Wilbur
Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics

2002

Tagging gene and protein names in full text articles
Lorraine Tanabe | W. John Wilbur
Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain