Khoa Duong


2023

pdf bib
Disease Network Constructor: a Pathway Extraction and Visualization
Mohammad Golam Sohrab | Khoa Duong | Goran Topić | Masami Ikeda | Nozomi Nagano | Yayoi Natsume-Kitatani | Masakata Kuroda | Mari Itoh | Hiroya Takamura
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

We present Disease Network Constructor (DNC), a system that extracts and visualizes a disease network, in which nodes are entities such as diseases, proteins, and genes, and edges represent regulation relation. We focused on the disease network derived through regulation events found in scientific articles on idiopathic pulmonary fibrosis (IPF). The front-end web-base user interface of DNC includes two-dimensional (2D) and 3D visualizations of the constructed disease network. The back-end system of DNC includes several natural language processing (NLP) techniques to process biomedical text including BERT-based tokenization on the basis of Bidirectional Encoder Representations from Transformers (BERT), flat and nested named entity recognition (NER), candidate generation and candidate ranking for entity linking (EL) or, relation extraction (RE), and event extraction (EE) tasks. We evaluated the end-to-end EL and end-to-end nested EE systems to determine the DNC’s back-endimplementation performance. To the best of our knowledge, this is the first attempt that addresses neural NER, EL, RE, and EE tasks in an end-to-end manner that constructs a path-way visualization from events, which we name Disease Network Constructor. The demonstration video can be accessed from https://youtu.be/rFhWwAgcXE8. We release an online system for end users and the source code is available at https://github.com/aistairc/PRISM-APIs/.

2020

pdf bib
BENNERD: A Neural Named Entity Linking System for COVID-19
Mohammad Golam Sohrab | Khoa Duong | Makoto Miwa | Goran Topić | Ikeda Masami | Takamura Hiroya
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present a biomedical entity linking (EL) system BENNERD that detects named enti- ties in text and links them to the unified medical language system (UMLS) knowledge base (KB) entries to facilitate the corona virus disease 2019 (COVID-19) research. BEN- NERD mainly covers biomedical domain, es- pecially new entity types (e.g., coronavirus, vi- ral proteins, immune responses) by address- ing CORD-NER dataset. It includes several NLP tools to process biomedical texts includ- ing tokenization, flat and nested entity recog- nition, and candidate generation and rank- ing for EL that have been pre-trained using the CORD-NER corpus. To the best of our knowledge, this is the first attempt that ad- dresses NER and EL on COVID-19-related entities, such as COVID-19 virus, potential vaccines, and spreading mechanism, that may benefit research on COVID-19. We release an online system to enable real-time entity annotation with linking for end users. We also release the manually annotated test set and CORD-NERD dataset for leveraging EL task. The BENNERD system is available at https://aistairc.github.io/BENNERD/.