Kohichi Takai


2021

Named Entity-Factored Transformer for Proper Noun Translation
Kohichi Takai | Gen Hattori | Akio Yoneyama | Keiji Yasuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Natural Language Processing (ICON)

Subword-based neural machine translation (NMT) reduces the number of out-of-vocabulary (OOV) words and maintains translation quality even when input sentences include OOV words. Subword-based NMT decomposes a word into shorter units to solve the OOV problem, but this does not work well for non-compositional proper nouns because of the way the shorter units are constructed from words. Furthermore, translation omissions also occur in proper noun translation. The proposed method applies a Named Entity (NE) feature vector to a Factored Transformer for accurate proper noun translation. It uses two features: the input sentence in subword units and a feature obtained from Named Entity Recognition (NER). The proposed method improves the translation of non-compositional proper nouns, including low-frequency words. In the experiments, the proposed method using the best NE feature vector outperformed the baseline subword-based Transformer model by more than 9.6 points in proper noun accuracy and 2.5 points in BLEU score.

2018

Prediction Models for Risk of Type-2 Diabetes Using Health Claims
Masatoshi Nagata | Kohichi Takai | Keiji Yasuda | Panikos Heracleous | Akio Yoneyama
Proceedings of the BioNLP 2018 workshop

This study focuses on highly accurate prediction of the onset of type-2 diabetes. We investigated whether prediction accuracy can be improved by utilizing lab test data obtained from health checkups and incorporating health claim text data such as medically diagnosed diseases with ICD10 codes and pharmacy information. In a previous study, prediction accuracy increased slightly when independent variables such as diagnosed disease names and prescription medicines were added. Therefore, in the current study we explored more suitable prediction models using state-of-the-art techniques such as XGBoost and long short-term memory (LSTM) based on recurrent neural networks. Text data was vectorized using word2vec, and the prediction models were compared with logistic regression. The results confirmed that the onset of type-2 diabetes can be predicted with a high degree of accuracy when the XGBoost model is used.