Eiichiro Sumita

Also published as: Eiichro Sumita

2023

pdf bib abs
Pivot Translation for Zero-resource Language Pairs Based on a Multilingual Pretrained Model
Kenji Imamura | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

A multilingual translation model enables a single model to handle multiple languages. However, the translation qualities of unlearned language pairs (i.e., zero-shot translation qualities) are still poor. By contrast, pivot translation translates source texts into target ones via a pivot language such as English, thus enabling machine translation without parallel texts between the source and target languages. In this paper, we perform pivot translation using a multilingual model and compare it with direct translation. We improve the translation quality without using parallel texts of direct translation by fine-tuning the model with machine-translated pseudo-translations. We also discuss what type of parallel texts are suitable for effectively improving the translation quality in multilingual pivot translation.

pdf bib abs
Subset Retrieval Nearest Neighbor Machine Translation
Hiroyuki Deguchi | Taro Watanabe | Yusuke Matsui | Masao Utiyama | Hideki Tanaka | Eiichiro Sumita
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

k-nearest-neighbor machine translation (kNN-MT) (Khandelwal et al., 2021) boosts the translation performance of trained neural machine translation (NMT) models by incorporating example-search into the decoding algorithm. However, decoding is seriously time-consuming, i.e., roughly 100 to 1,000 times slower than standard NMT, because neighbor tokens are retrieved from all target tokens of parallel data in each timestep. In this paper, we propose “Subset kNN-MT”, which improves the decoding speed of kNN-MT by two methods: (1) retrieving neighbor target tokens from a subset that is the set of neighbor sentences of the input sentence, not from all sentences, and (2) efficient distance computation technique that is suitable for subset neighbor search using a look-up table. Our proposed method achieved a speed-up of up to 132.2 times and an improvement in BLEU score of up to 1.6 compared with kNN-MT in the WMT’19 De-En translation task and the domain adaptation tasks in De-En and En-Ja.

pdf bib abs
Japanese-to-English Simultaneous Dubbing Prototype
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

Live video streaming has become an important form of communication such as virtual conferences. However, for cross-language communication in live video streaming, reading subtitles degrades the viewing experience. To address this problem, our simultaneous dubbing prototype translates and replaces the original speech of a live video stream in a simultaneous manner. Tests on a collection of 90 public videos show that our system achieves a low average latency of 11.90 seconds for smooth playback. Our method is general and can be extended to other language pairs.

pdf bib abs
YANMTT: Yet Another Neural Machine Translation Toolkit
Raj Dabre | Diptesh Kanojia | Chinmay Sawant | Eiichiro Sumita
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations)

In this paper, we present our open-source neural machine translation (NMT) toolkit called “Yet Another Neural Machine Translation Toolkit” abbreviated as YANMTT - https://github.com/prajdabre/yanmtt, which is built on top of the HuggingFace Transformers library. YANMTT focuses on transfer learning and enables easy pre-training and fine-tuning of sequence-to-sequence models at scale. It can be used for training parameter-heavy models with minimal parameter sharing and efficient, lightweight models via heavy parameter sharing. Additionally, it supports parameter-efficient fine-tuning (PEFT) through adapters and prompts. Our toolkit also comes with a user interface that can be used to demonstrate these models and visualize various parts of the model. Apart from these core features, our toolkit also provides other advanced functionalities such as but not limited to document/multi-source NMT, simultaneous NMT, mixtures-of-experts, model compression and continual learning.

2022

pdf bib
A Japanese Corpus of Many Specialized Domains for Word Segmentation and Part-of-Speech Tagging
Shohei Higashiyama | Masao Ideuchi | Masao Utiyama | Yoshiaki Oida | Eiichiro Sumita
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems

pdf bib abs
FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT
Abhisek Chakrabarty | Raj Dabre | Chenchen Ding | Hideki Tanaka | Masao Utiyama | Eiichiro Sumita
Proceedings of the 29th International Conference on Computational Linguistics

In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART). These automatically extracted features are incorporated via approaches such as concatenation and relevance mechanisms, among which the latter is known to be better than the former. When used for low-resource NMT as a downstream task, we show that these feature based models give large improvements in bilingual settings and modest ones in multilingual settings over their counterparts that do not use features.

pdf bib abs
A Multimodal Simultaneous Interpretation Prototype: Who Said What
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)

“Who said what” is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation systems merely present “what was said” in the form of subtitles. Because the translations unavoidably have delays and errors, users often find it difficult to trace the subtitles back to speakers. To address this problem, we propose a multimodal SI system that presents users “who said what”. Our system takes audio-visual approaches to recognize the speaker of each sentence, and then annotates its translation with the textual tag and face icon of the speaker, so that users can quickly understand the scenario. Furthermore, our system is capable of interpreting video streams in real-time on a single desktop equipped with two Quadro RTX 4000 GPUs owing to an efficient sentence-based architecture.

pdf bib abs
Restricted or Not: A General Training Framework for Neural Machine Translation
Zuchao Li | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Restricted machine translation incorporates human prior knowledge into translation. It restricts the flexibility of the translation to satisfy the demands of translation in specific scenarios. Existing work typically imposes constraints on beam search decoding. Although this can satisfy the requirements overall, it usually requires a larger beam size and far longer decoding time than unrestricted translation, which limits the concurrent processing ability of the translation model in deployment, and thus its practicality. In this paper, we propose a general training framework that allows a model to simultaneously support both unrestricted and restricted translation by adopting an additional auxiliary training process without constraining the decoding process. This maintains the benefits of restricted translation but greatly reduces the extra time overhead of constrained decoding, thus improving its practicality. The effectiveness of our proposed training framework is demonstrated by experiments on both original (WAT21 En↔Ja) and simulated (WMT14 En→De and En→Fr) restricted translation benchmarks.

Deep learning has demonstrated performance advantages in a wide range of natural language processing tasks, including neural machine translation (NMT). Transformer NMT models are typically strengthened by deeper encoder layers, but deepening their decoder layers usually results in failure. In this paper, we first identify the cause of the failure of the deep decoder in the Transformer model. Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT. Specifically, with respect to model structure, we propose a cross-attention drop mechanism to allow the decoder layers to perform their own different roles, to reduce the difficulty of deep-decoder learning. For model training, we propose a collapse reducing training approach to improve the stability and effectiveness of deep-decoder training. We experimentally evaluated our proposed Transformer NMT model structure modification and novel training methods on several popular machine translation benchmarks. The results showed that deepening the NMT model by increasing the number of decoder layers successfully prevented the deepened decoder from degrading to an unconditional language model. In contrast to prior work on deepening an NMT model on the encoder, our method can deepen the model on both the encoder and decoder at the same time, resulting in a deeper model and improved performance.

pdf bib abs
Synchronous Refinement for Neural Machine Translation
Kehai Chen | Masao Utiyama | Eiichiro Sumita | Rui Wang | Min Zhang
Findings of the Association for Computational Linguistics: ACL 2022

Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner. However, the auto-regressive decoder faces a deep-rooted one-pass issue whereby each generated word is considered as one element of the final output regardless of whether it is correct or not. These generated wrong words further constitute the target historical context to affect the generation of subsequent target words. This paper proposes a novel synchronous refinement method to revise potential errors in the generated words by considering part of the target future context. Particularly, the proposed approach allows the auto-regressive decoder to refine the previously generated target words and generate the next target word synchronously. The experimental results on three widely-used machine translation tasks demonstrated the effectiveness of the proposed approach.

2021

pdf bib abs
Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios
Haipeng Sun | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks. However, in real-world scenarios, massive monolingual corpora do not exist for some extremely low-resource languages such as Estonian, and UNMT systems usually perform poorly when there is not adequate training corpus for one language. In this paper, we first define and analyze the unbalanced training data scenario for UNMT. Based on this scenario, we propose UNMT self-training mechanisms to train a robust UNMT system and improve its performance in this case. Experimental results on several language pairs show that the proposed methods substantially outperform conventional UNMT systems.

pdf bib abs
User-Generated Text Corpus for Evaluating Japanese Morphological Analysis and Lexical Normalization
Shohei Higashiyama | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Morphological analysis (MA) and lexical normalization (LN) are both important tasks for Japanese user-generated text (UGT). To evaluate and compare different MA/LN systems, we have constructed a publicly available Japanese UGT corpus. Our corpus comprises 929 sentences annotated with morphological and normalization information, along with category information we classified for frequent UGT-specific phenomena. Experiments on the corpus demonstrated the low performance of existing MA/LN methods for non-general words and non-standard forms, indicating that the corpus would be a challenging benchmark for further research on UGT.

Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information. This work further investigates the cross-lingual transfer ability of these models for constituency parsing and focuses on multi-source transfer. Addressing structure and label set diversity problems, we propose the integration of typological features into the parsing model and treebank normalization. We trained the model on eight languages with diverse structures and use transfer parsing for an additional six low-resource languages. The experimental results show that the treebank normalization is essential for cross-lingual transfer performance and the typological features introduce further improvement. As a result, our approach improves the baseline F1 of multi-source transfer by 5 on average.

pdf bib abs
A Text Editing Approach to Joint Japanese Word Segmentation, POS Tagging, and Lexical Normalization
Shohei Higashiyama | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Lexical normalization, in addition to word segmentation and part-of-speech tagging, is a fundamental task for Japanese user-generated text processing. In this paper, we propose a text editing model to solve the three task jointly and methods of pseudo-labeled data generation to overcome the problem of data deficiency. Our experiments showed that the proposed model achieved better normalization performance when trained on more diverse pseudo-labeled data.

pdf bib abs
NICT’s Neural Machine Translation Systems for the WAT21 Restricted Translation Task
Zuchao Li | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

This paper describes our system (Team ID: nictrb) for participating in the WAT’21 restricted machine translation task. In our submitted system, we designed a new training approach for restricted machine translation. By sampling from the translation target, we can solve the problem that ordinary training data does not have a restricted vocabulary. With the further help of constrained decoding in the inference phase, we achieved better results than the baseline, confirming the effectiveness of our solution. In addition, we also tried the vanilla and sparse Transformer as the backbone network of the model, as well as model ensembling, which further improved the final translation performance.

pdf bib abs
NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs
Kenji Imamura | Eiichiro Sumita
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021). A feature of our system is that we used a pretrained multilingual BART (Bidirectional and Auto-Regressive Transformer; mBART) model. Because publicly available models do not support some languages in the NICT-SAP task, we added these languages to the mBART model and then trained it using monolingual corpora extracted from Wikipedia. We fine-tuned the expanded mBART model using the parallel corpora specified by the NICT-SAP task. The BLEU scores greatly improved in comparison with those of systems without the pretrained model, including the additional languages.

pdf bib abs
Unsupervised Neural Machine Translation with Universal Grammar
Zuchao Li | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Machine translation usually relies on parallel corpora to provide parallel signals for training. The advent of unsupervised machine translation has brought machine translation away from this reliance, though performance still lags behind traditional supervised machine translation. In unsupervised machine translation, the model seeks symmetric language similarities as a source of weak parallel signal to achieve translation. Chomsky’s Universal Grammar theory postulates that grammar is an innate form of knowledge to humans and is governed by universal principles and constraints. Therefore, in this paper, we seek to leverage such shared grammar clues to provide more explicit language parallel signals to enhance the training of unsupervised machine translation models. Through experiments on multiple typical language pairs, we demonstrate the effectiveness of our proposed approaches.

pdf bib abs
Smoothing Dialogue States for Open Conversational Machine Reading
Zhuosheng Zhang | Siru Ouyang | Hai Zhao | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Conversational machine reading (CMR) requires machines to communicate with humans through multi-turn interactions between two salient dialogue states of decision making and question generation processes. In open CMR settings, as the more realistic scenario, the retrieved background knowledge would be noisy, which results in severe challenges in the information transmission. Existing studies commonly train independent or pipeline systems for the two subtasks. However, those methods are trivial by using hard-label decisions to activate question generation, which eventually hinders the model performance. In this work, we propose an effective gating strategy by smoothing the two dialogue states in only one decoder and bridge decision making and question generation to provide a richer dialogue state reference. Experiments on the OR-ShARC dataset show the effectiveness of our method, which achieves new state-of-the-art results.

pdf bib abs
MiSS: An Assistant for Multi-Style Simultaneous Translation
Zuchao Li | Kevin Parnow | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper, we present MiSS, an assistant for multi-style simultaneous translation. Our proposed translation system has five key features: highly accurate translation, simultaneous translation, translation for multiple text styles, back-translation for translation quality evaluation, and grammatical error correction. With this system, we aim to provide a complete translation experience for machine translation users. Our design goals are high translation accuracy, real-time translation, flexibility, and measurable translation quality. Compared with the free commercial translation systems commonly used, our translation assistance system regards the machine translation application as a more complete and fully-featured tool for users. By incorporating additional features and giving the user better control over their experience, we improve translation efficiency and performance. Additionally, our assistant system combines machine translation, grammatical error correction, and interactive edits, and uses a crowdsourcing mode to collect more data for further training to improve both the machine translation and grammatical error correction models. A short video demonstrating our system is available at https://www.youtube.com/watch?v=ZGCo7KtRKd8.

pdf bib abs
MiSS@WMT21: Contrastive Learning-reinforced Domain Adaptation in Neural Machine Translation
Zuchao Li | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the Sixth Conference on Machine Translation

In this paper, we describe our MiSS system that participated in the WMT21 news translation task. We mainly participated in the evaluation of the three translation directions of English-Chinese and Japanese-English translation tasks. In the systems submitted, we primarily considered wider networks, deeper networks, relative positional encoding, and dynamic convolutional networks in terms of model structure, while in terms of training, we investigated contrastive learning-reinforced domain adaptation, self-supervised training, and optimization objective switching training methods. According to the final evaluation results, a deeper, wider, and stronger network can improve translation performance in general, yet our data domain adaption method can improve performance even more. In addition, we found that switching to the use of our proposed objective during the finetune phase using relatively small domain-related data can effectively improve the stability of the model’s convergence and achieve better optimal performance.

2020

The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.

pdf bib abs
Content Word Aware Neural Machine Translation
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Neural machine translation (NMT) encodes the source sentence in a universal way to generate the target sentence word-by-word. However, NMT does not consider the importance of word in the sentence meaning, for example, some words (i.e., content words) express more important meaning than others (i.e., function words). To address this limitation, we first utilize word frequency information to distinguish between content and function words in a sentence, and then design a content word-aware NMT to improve translation performance. Empirical results on the WMT14 English-to-German, WMT14 English-to-French, and WMT17 Chinese-to-English translation tasks show that the proposed methods can significantly improve the performance of Transformer-based NMT.

pdf bib abs
A Three-Parameter Rank-Frequency Relation in Natural Languages
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present that, the rank-frequency relation in textual data follows f ∝ r^-𝛼(r+𝛾)^-𝛽, where f is the token frequency and r is the rank by frequency, with (𝛼, 𝛽, 𝛾) as parameters. The formulation is derived based on the empirical observation that d² (x+y)/dx² is a typical impulse function, where (x,y)=(log r, log f). The formulation is the power law when 𝛽=0 and the Zipf–Mandelbrot law when 𝛼=0. We illustrate that 𝛼 is related to the analytic features of syntax and 𝛽+𝛾 to those of morphology in natural languages from an investigation of multilingual corpora.

pdf bib abs
Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
Haipeng Sun | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs. However, it can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time. That is, research on multilingual UNMT has been limited. In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder, making use of multilingual data to improve UNMT for all language pairs. On the basis of the empirical findings, we propose two knowledge distillation methods to further enhance multilingual UNMT performance. Our experiments on a dataset with English translated to and from twelve other languages (including three language families and six language branches) show remarkable results, surpassing strong unsupervised individual baselines while achieving promising performance between non-English language pairs in zero-shot translation scenarios and alleviating poor performance in low-resource language pairs.

pdf bib abs
Pre-training via Leveraging Assisting Languages for Neural Machine Translation
Haiyue Song | Raj Dabre | Zhuoyuan Mao | Fei Cheng | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks. However, large monolingual corpora might not always be available for the languages of interest (LOI). Thus, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI. We utilize script mapping (Chinese to Japanese) to increase the similarity (number of cognates) between the monolingual corpora of helping languages and LOI. An empirical case study of low-resource Japanese-English neural machine translation (NMT) reveals that leveraging large Chinese and French monolingual corpora can help overcome the shortage of Japanese and English monolingual corpora, respectively, for S2S pre-training. Using only Chinese and French monolingual corpora, we were able to improve Japanese-English translation quality by up to 8.5 BLEU in low-resource scenarios.

pdf bib abs
Reference Language based Unsupervised Neural Machine Translation
Zuchao Li | Hai Zhao | Rui Wang | Masao Utiyama | Eiichiro Sumita
Findings of the Association for Computational Linguistics: EMNLP 2020

Exploiting a common language as an auxiliary for better translation has a long tradition in machine translation and lets supervised learning-based machine translation enjoy the enhancement delivered by the well-used pivot language in the absence of a source language to target language parallel corpus. The rise of unsupervised neural machine translation (UNMT) almost completely relieves the parallel corpus curse, though UNMT is still subject to unsatisfactory performance due to the vagueness of the clues available for its core back-translation training. Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source, but this corpus still indicates a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism. Experimental results show that our methods improve the quality of UNMT over that of a strong baseline that uses only one auxiliary language, demonstrating the usefulness of the proposed reference language-based UNMT and establishing a good start for the community.

Transliteration is generally a phonetically based transcription across different writing systems. It is a crucial task for various downstream natural language processing applications. For the Myanmar (Burmese) language, robust automatic transliteration for borrowed English words is a challenging task because of the complex Myanmar writing system and the lack of data. In this study, we constructed a Myanmar-English named entity dictionary containing more than eighty thousand transliteration instances. The data have been released under a CC BY-NC-SA license. We evaluated the automatic transliteration performance using statistical and neural network-based approaches based on the prepared data. The neural network model outperformed the statistical model significantly in terms of the BLEU score on the character level. Different units used in the Myanmar script for processing were also compared and discussed.

Unsupervised neural machine translation (UNMT) has recently attracted great interest in the machine translation community. The main advantage of the UNMT lies in its easy collection of required large training text sentences while with only a slightly worse performance than supervised neural machine translation which requires expensive annotated translation pairs on some translation tasks. In most studies, the UMNT is trained with clean data without considering its robustness to the noisy data. However, in real-world scenarios, there usually exists noise in the collected input sentences which degrades the performance of the translation system since the UNMT is sensitive to the small perturbations of the input sentences. In this paper, we first time explicitly take the noisy data into consideration to improve the robustness of the UNMT based systems. First of all, we clearly defined two types of noises in training sentences, i.e., word noise and word order noise, and empirically investigate its effect in the UNMT, then we propose adversarial training methods with denoising process in the UNMT. Experimental results on several language pairs show that our proposed methods substantially improved the robustness of the conventional UNMT systems in noisy scenarios.

pdf bib abs
Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
Abhisek Chakrabarty | Raj Dabre | Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

In this study, linguistic knowledge at different levels are incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data. Integrating manually designed or automatically extracted features into the NMT framework is known to be beneficial. However, this study emphasizes that the relevance of the features is crucial to the performance. Specifically, we propose two methods, 1) self relevance and 2) word-based relevance, to improve the representation of features for NMT. Experiments are conducted on translation tasks from English to eight Asian languages, with no more than twenty thousand sentences for training. The proposed methods improve translation quality for all tasks by up to 3.09 BLEU points. Discussions with visualization provide the explainability of the proposed methods where we show that the relevance methods provide weights to features thereby enhancing their impact on low-resource machine translation.

pdf bib abs
Bilingual Subword Segmentation for Neural Machine Translation
Hiroyuki Deguchi | Masao Utiyama | Akihiro Tamura | Takashi Ninomiya | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

This paper proposed a new subword segmentation method for neural machine translation, “Bilingual Subword Segmentation,” which tokenizes sentences to minimize the difference between the number of subword units in a sentence and that of its translation. While existing subword segmentation methods tokenize a sentence without considering its translation, the proposed method tokenizes a sentence by using subword units induced from bilingual sentences; this method could be more favorable to machine translation. Evaluations on WAT Asian Scientific Paper Excerpt Corpus (ASPEC) English-to-Japanese and Japanese-to-English translation tasks and WMT14 English-to-German and German-to-English translation tasks show that our bilingual subword segmentation improves the performance of Transformer neural machine translation (up to +0.81 BLEU).

pdf bib abs
Intermediate Self-supervised Learning for Machine Translation Quality Estimation
Raphael Rubino | Eiichiro Sumita
Proceedings of the 28th International Conference on Computational Linguistics

Pre-training sentence encoders is effective in many natural language processing tasks including machine translation (MT) quality estimation (QE), due partly to the scarcity of annotated QE data required for supervised learning. In this paper, we investigate the use of an intermediate self-supervised learning task for sentence encoder aiming at improving QE performances at the sentence and word levels. Our approach is motivated by a problem inherent to QE: mistakes in translation caused by wrongly inserted and deleted tokens. We modify the translation language model (TLM) training objective of the cross-lingual language model (XLM) to orientate the pre-trained model towards the target task. The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation. Experiments on English-to-German and English-to-Russian translation directions show that intermediate learning improves over domain adaptated models. Additionally, our method reaches results in par with state-of-the-art QE models without requiring the combination of several approaches and outperforms similar methods based on pre-trained sentence encoders.

pdf bib abs
SJTU-NICT’s Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task
Zuchao Li | Hai Zhao | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita
Proceedings of the Fifth Conference on Machine Translation

In this paper, we introduced our joint team SJTU-NICT ‘s participation in the WMT 2020 machine translation shared task. In this shared task, we participated in four translation directions of three language pairs: English-Chinese, English-Polish on supervised machine translation track, German-Upper Sorbian on low-resource and unsupervised machine translation tracks. Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques: document-enhanced NMT, XLM pre-trained language model enhanced NMT, bidirectional translation as a pre-training, reference language based UNMT, data-dependent gaussian prior objective, and BT-BLEU collaborative filtering self-training. We also used the TF-IDF algorithm to filter the training set to obtain a domain more similar set with the test set for finetuning. In our submissions, the primary systems won the first place on English to Chinese, Polish to English, and German to Upper Sorbian translation directions.

pdf bib abs
Transformer-based Double-token Bidirectional Autoregressive Decoding in Neural Machine Translation
Kenji Imamura | Eiichiro Sumita
Proceedings of the 7th Workshop on Asian Translation

This paper presents a simple method that extends a standard Transformer-based autoregressive decoder, to speed up decoding. The proposed method generates a token from the head and tail of a sentence (two tokens in total) in each step. By simultaneously generating multiple tokens that rarely depend on each other, the decoding speed is increased while the degradation in translation quality is minimized. In our experiments, the proposed method increased the translation speed by around 113%-155% in comparison with a standard autoregressive decoder, while degrading the BLEU scores by no more than 1.03. It was faster than an iterative non-autoregressive decoder in many conditions.

2019

In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions. We focused on leveraging multilingual transfer learning and back-translation for the extremely low-resource language pairs: Kazakh↔English and Gujarati↔English translation. For the Chinese↔English translation, we used the provided parallel data augmented with a large quantity of back-translated monolingual data to train state-of-the-art NMT systems. We then employed techniques that have been proven to be most effective, such as back-translation, fine-tuning, and model ensembling, to generate the primary submissions of Chinese↔English. For English→Finnish, our submission from WMT18 remains a strong baseline despite the increase in parallel corpora for this year’s task.

pdf bib abs
NICT’s Unsupervised Neural and Statistical Machine Translation Systems for the WMT19 News Translation Task
Benjamin Marie | Haipeng Sun | Rui Wang | Kehai Chen | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

This paper presents the NICT’s participation in the WMT19 unsupervised news translation task. We participated in the unsupervised translation direction: German-Czech. Our primary submission to the task is the result of a simple combination of our unsupervised neural and statistical machine translation systems. Our system is ranked first for the German-to-Czech translation task, using only the data provided by the organizers (“constraint’”), according to both BLEU-cased and human evaluation. We also performed contrastive experiments with other language pairs, namely, English-Gujarati and English-Kazakh, to better assess the effectiveness of unsupervised machine translation in for distant language pairs and in truly low-resource conditions.

pdf bib abs
NICT’s Supervised Neural Machine Translation Systems for the WMT19 Translation Robustness Task
Raj Dabre | Eiichiro Sumita
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper we describe our neural machine translation (NMT) systems for Japanese↔English translation which we submitted to the translation robustness task. We focused on leveraging transfer learning via fine tuning to improve translation quality. We used a fairly well established domain adaptation technique called Mixed Fine Tuning (MFT) (Chu et. al., 2017) to improve translation quality for Japanese↔English. We also trained bi-directional NMT models instead of uni-directional ones as the former are known to be quite robust, especially in low-resource scenarios. However, given the noisy nature of the in-domain training data, the improvements we obtained are rather modest.

pdf bib
Online Sentence Segmentation for Simultaneous Interpretation using Multi-Shifted Recurrent Neural Network
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XVII: Research Track

pdf bib
Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation
Junya Ono | Masao Utiyama | Eiichiro Sumita
Proceedings of the 8th Workshop on Patent and Scientific Literature Translation

pdf bib abs
Recurrent Positional Embedding for Neural Machine Translation
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

In the Transformer network architecture, positional embeddings are used to encode order dependencies into the input representation. However, this input representation only involves static order dependencies based on discrete numerical information, that is, are independent of word content. To address this issue, this work proposes a recurrent positional embedding approach based on word vector. In this approach, these recurrent positional embeddings are learned by a recurrent neural network, encoding word content-based order dependencies into the input representation. They are then integrated into the existing multi-head self-attention model as independent heads or part of each head. The experimental results revealed that the proposed approach improved translation performance over that of the state-of-the-art Transformer baseline in WMT’14 English-to-German and NIST Chinese-to-English translation tasks.

pdf bib abs
MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription. By using this method, arbitrary Burmese strings can be accurately inputted with 26 lowercase Latin letters. Meanwhile, the 26 uppercase Latin letters are designed as shortcuts of lowercase letter sequences. The frequency of Burmese characters is considered in MY-AKKHARA to realize an efficient keystroke distribution on a QWERTY keyboard. Given that the Unicode standard has not been extensively used in digitization of Burmese, we hope that MY-AKKHARA can contribute to the widespread use of Unicode in Myanmar and can provide a platform for smart input methods for Burmese in the future. An implementation of MY-AKKHARA running in Windows is released at http://www2.nict.go.jp/astrec-att/member/ding/my-akkhara.html

This paper presents the NICT’s supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks. For all the translation directions, we built state-of-the-art supervised neural (NMT) and statistical (SMT) machine translation systems, using monolingual data cleaned and normalized. Our combination of NMT and SMT performed among the best systems for the four translation directions. We also investigated the feasibility of unsupervised machine translation for low-resource and distant language pairs and confirmed observations of previous work showing that unsupervised MT is still largely unable to deal with them.

pdf bib abs
NICT’s participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT
Raj Dabre | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation. Our team,“NICT-5”, focused on multilingual domain adaptation and back-translation for Russian–Japanese translation and on simple fine-tuning for English–Tamil translation . We noted that multi-stage fine tuning is essential in leveraging the power of multilingualism for an extremely low-resource language like Russian–Japanese. Furthermore, we can improve the performance of such a low-resource language pair by exploiting a small but in-domain monolingual corpus via back-translation. We managed to obtain second rank in both tasks for all translation directions.

This paper presents the NICT’s participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically Myanmar (Burmese) - English task in both translation directions. We built neural machine translation (NMT) systems for these tasks. Our NMT systems were trained with language model pretraining. Back-translation technology is adopted to NMT. Our NMT systems rank the third in English-to-Myanmar and the second in Myanmar-to-English according to BLEU score.

pdf bib abs
Long Warm-up and Self-Training: Training Strategies of NICT-2 NMT System at WAT-2019
Kenji Imamura | Eiichiro Sumita
Proceedings of the 6th Workshop on Asian Translation

This paper describes the NICT-2 neural machine translation system at the 6th Workshop on Asian Translation. This system employs the standard Transformer model but features the following two characteristics. One is the long warm-up strategy, which performs a longer warm-up of the learning rate at the start of the training than conventional approaches. Another is that the system introduces self-training approaches based on multiple back-translations generated by sampling. We participated in three tasks—ASPEC.en-ja, ASPEC.ja-en, and TDDC.ja-en—using this system.

pdf bib abs
Recycling a Pre-trained BERT Encoder for Neural Machine Translation
Kenji Imamura | Eiichiro Sumita
Proceedings of the 3rd Workshop on Neural Generation and Translation

In this paper, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is applied to Transformer-based neural machine translation (NMT). In contrast to monolingual tasks, the number of unlearned model parameters in an NMT decoder is as huge as the number of learned parameters in the BERT model. To train all the models appropriately, we employ two-stage optimization, which first trains only the unlearned parameters by freezing the BERT model, and then fine-tunes all the sub-models. In our experiments, stable two-stage optimization was achieved, in contrast the BLEU scores of direct fine-tuning were extremely low. Consequently, the BLEU scores of the proposed method were better than those of the Transformer base model and the same model without pre-training. Additionally, we confirmed that NMT with the BERT encoder is more effective in low-resource settings.

pdf bib abs
Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation
Haipeng Sun | Rui Wang | Kehai Chen | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Unsupervised bilingual word embedding (UBWE), together with other technologies such as back-translation and denoising, has helped unsupervised neural machine translation (UNMT) achieve remarkable results in several language pairs. In previous methods, UBWE is first trained using non-parallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT. That is, the training of UBWE and UNMT are separate. In this paper, we first empirically investigate the relationship between UBWE and UNMT. The empirical findings show that the performance of UNMT is significantly affected by the performance of UBWE. Thus, we propose two methods that train UNMT with UBWE agreement. Empirical results on several language pairs show that the proposed methods significantly outperform conventional UNMT.

pdf bib abs
Neural Machine Translation with Reordering Embeddings
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

The reordering model plays an important role in phrase-based statistical machine translation. However, there are few works that exploit the reordering information in neural machine translation. In this paper, we propose a reordering mechanism to learn the reordering embedding of a word based on its contextual information. These learned reordering embeddings are stacked together with self-attention networks to learn sentence representation for machine translation. The reordering mechanism can be easily integrated into both the encoder and the decoder in the Transformer translation system. Experimental results on WMT’14 English-to-German, NIST Chinese-to-English, and WAT Japanese-to-English translation tasks demonstrate that the proposed methods can significantly improve the performance of the Transformer.

The training objective of neural machine translation (NMT) is to minimize the loss between the words in the translated sentences and those in the references. In NMT, there is a natural correspondence between the source sentence and the target sentence. However, this relationship has only been represented using the entire neural network and the training objective is computed in word-level. In this paper, we propose a sentence-level agreement module to directly minimize the difference between the representation of source and target sentence. The proposed agreement module can be integrated into NMT as an additional training objective function and can also be used to enhance the representation of the source sentences. Empirical results on the NIST Chinese-to-English and WMT English-to-German tasks show the proposed agreement module can significantly improve the NMT performance.

pdf bib abs
Improving Neural Machine Translation with Neural Syntactic Distance
Chunpeng Ma | Akihiro Tamura | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The explicit use of syntactic information has been proved useful for neural machine translation (NMT). However, previous methods resort to either tree-structured neural networks or long linearized sequences, both of which are inefficient. Neural syntactic distance (NSD) enables us to represent a constituent tree using a sequence whose length is identical to the number of words in the sentence. NSD has been used for constituent parsing, but not in machine translation. We propose five strategies to improve NMT with NSD. Experiments show that it is not trivial to improve NMT with NSD; however, the proposed strategies are shown to improve translation performance of the baseline model (+2.1 (En–Ja), +1.3 (Ja–En), +1.2 (En–Ch), and +1.0 (Ch–En) BLEU).

pdf bib abs
Incorporating Word Attention into Character-Based Word Segmentation
Shohei Higashiyama | Masao Utiyama | Eiichiro Sumita | Masao Ideuchi | Yoshiaki Oida | Yohei Sakamoto | Isaac Okada
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Neural network models have been actively applied to word segmentation, especially Chinese, because of the ability to minimize the effort in feature engineering. Typical segmentation models are categorized as character-based, for conducting exact inference, or word-based, for utilizing word-level information. We propose a character-based model utilizing word information to leverage the advantages of both types of models. Our model learns the importance of multiple candidate words for a character on the basis of an attention mechanism, and makes use of it for segmentation decisions. The experimental results show that our model achieves better performance than the state-of-the-art models on both Japanese and Chinese benchmark datasets.

pdf bib abs
SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing
Zuchao Li | Hai Zhao | Zhuosheng Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

This paper describes our SJTU-NICT’s system for participating in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference for Computational Language Learning (CoNLL). Our system uses a graph-based approach to model a variety of semantic graph parsing tasks. Our main contributions in the submitted system are summarized as follows: 1. Our model is fully end-to-end and is capable of being trained only on the given training set which does not rely on any other extra training source including the companion data provided by the organizer; 2. We extend our graph pruning algorithm to a variety of semantic graphs, solving the problem of excessive semantic graph search space; 3. We introduce multi-task learning for multiple objectives within the same framework. The evaluation results show that our system achieved second place in the overall F₁ score and achieved the best F₁ score on the DM framework.

2018

pdf bib abs
Enhancement of Encoder and Attention Using Target Monolingual Corpora in Neural Machine Translation
Kenji Imamura | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

A large-scale parallel corpus is required to train encoder-decoder neural machine translation. The method of using synthetic parallel texts, in which target monolingual corpora are automatically translated into source sentences, is effective in improving the decoder, but is unreliable for enhancing the encoder. In this paper, we propose a method that enhances the encoder and attention using target monolingual corpora by generating multiple source sentences via sampling. By using multiple source sentences, diversity close to that of humans is achieved. Our experimental results show that the translation quality is improved by increasing the number of synthetic source sentences for each given target sentence, and quality close to that using a manually created parallel corpus was achieved.

pdf bib abs
NICT Self-Training Approach to Neural Machine Translation at NMT-2018
Kenji Imamura | Eiichiro Sumita
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This paper describes the NICT neural machine translation system submitted at the NMT-2018 shared task. A characteristic of our approach is the introduction of self-training. Since our self-training does not change the model structure, it does not influence the efficiency of translation, such as the translation speed. The experimental results showed that the translation quality improved not only in the sequence-to-sequence (seq-to-seq) models but also in the transformer models.

pdf bib abs
NICT’s Neural and Statistical Machine Translation Systems for the WMT18 News Translation Task
Benjamin Marie | Rui Wang | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper presents the NICT’s participation to the WMT18 shared news translation task. We participated in the eight translation directions of four language pairs: Estonian-English, Finnish-English, Turkish-English and Chinese-English. For each translation direction, we prepared state-of-the-art statistical (SMT) and neural (NMT) machine translation systems. Our NMT systems were trained with the transformer architecture using the provided parallel data enlarged with a large quantity of back-translated monolingual data that we generated with a new incremental training framework. Our primary submissions to the task are the result of a simple combination of our SMT and NMT systems. Our systems are ranked first for the Estonian-English and Finnish-English language pairs (constraint) according to BLEU-cased.

pdf bib abs
NICT’s Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task
Rui Wang | Benjamin Marie | Masao Utiyama | Eiichiro Sumita
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper presents the NICT’s participation in the WMT18 shared parallel corpus filtering task. The organizers provided 1 billion words German-English corpus crawled from the web as part of the Paracrawl project. This corpus is too noisy to build an acceptable neural machine translation (NMT) system. Using the clean data of the WMT18 shared news translation task, we designed several features and trained a classifier to score each sentence pairs in the noisy data. Finally, we sampled 100 million and 10 million words and built corresponding NMT systems. Empirical results show that our NMT systems trained on sampled data achieve promising performance.

pdf bib abs
Exploring Recombination for Efficient Decoding of Neural Machine Translation
Zhisong Zhang | Rui Wang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations. This means that partial hypotheses with different prefixes will be regarded differently no matter how similar they are. However, this might be inefficient since some partial hypotheses can contain only local differences that will not influence future predictions. In this work, we introduce recombination in NMT decoding based on the concept of the “equivalence” of partial hypotheses. Heuristically, we use a simple n-gram suffix based equivalence function and adapt it into beam search decoding. Through experiments on large-scale Chinese-to-English and English-to-Germen translation tasks, we show that the proposed method can obtain similar translation quality with a smaller beam size, making NMT decoding more efficient.

pdf bib abs
CytonMT: an Efficient Neural Machine Translation Open-source Toolkit Implemented in C++
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

This paper presents an open-source neural machine translation toolkit named CytonMT. The toolkit is built from scratch only using C++ and NVIDIA’s GPU-accelerated libraries. The toolkit features training efficiency, code simplicity and translation quality. Benchmarks show that cytonMT accelerates the training speed by 64.5% to 110.8% on neural networks of various sizes, and achieves competitive translation quality.

pdf bib abs
Guiding Neural Machine Translation with Retrieved Translation Pieces
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

One of the difficulties of neural machine translation (NMT) is the recall and appropriate translation of low-frequency words or phrases. In this paper, we propose a simple, fast, and effective method for recalling previously seen translation examples and incorporating them into the NMT decoding process. Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar with the input sentence, and then collect n-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call “translation pieces”. We compute pseudo-probabilities for each retrieved sentence based on similarities between the input sentence and the retrieved source sentences, and use these to weight the retrieved translation pieces. Finally, an existing NMT model is used to translate the input sentence, with an additional bonus given to outputs that contain the collected translation pieces. We show our method improves NMT translation results up to 6 BLEU points on three narrow domain translation tasks where repetitiveness of the target sentences is particularly salient. It also causes little increase in the translation time, and compares favorably to another alternative retrieval-based method with respect to accuracy, speed, and simplicity of implementation.

pdf bib
Multilingual Parallel Corpus for Global Communication Plan
Kenji Imamura | Eiichiro Sumita
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
NICT’s Participation in WAT 2018: Approaches Using Multilingualism and Recurrently Stacked Layers
Raj Dabre | Anoop Kunchukuttan | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib
English-Myanmar NMT and SMT with Pre-ordering: NICT’s Machine Translation Systems at WAT-2018
Rui Wang | Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib
Combination of Statistical and Neural Machine Translation for Myanmar-English
Benjamin Marie | Atsushi Fujita | Eiichiro Sumita
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib abs
Forest-Based Neural Machine Translation
Chunpeng Ma | Akihiro Tamura | Masao Utiyama | Tiejun Zhao | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tree-based neural machine translation (NMT) approaches, although achieved impressive performance, suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors. For statistical machine translation (SMT), forest-based methods have been proven to be effective for solving this problem, while for NMT this kind of approach has not been attempted. This paper proposes a forest-based NMT method that translates a linearized packed forest under a simple sequence-to-sequence framework (i.e., a forest-to-sequence NMT model). The BLEU score of the proposed method is higher than that of the sequence-to-sequence NMT, tree-based NMT, and forest-based SMT systems.

pdf bib abs
Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation
Rui Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Traditional Neural machine translation (NMT) involves a fixed training procedure where each sentence is sampled once during each epoch. In reality, some sentences are well-learned during the initial few epochs; however, using this approach, the well-learned sentences would continue to be trained along with those sentences that were not well learned for 10-30 epochs, which results in a wastage of time. Here, we propose an efficient method to dynamically sample the sentences in order to accelerate the NMT training. In this approach, a weight is assigned to each sentence based on the measured difference between the training costs of two iterations. Further, in each epoch, a certain percentage of sentences are dynamically sampled according to their weights. Empirical results based on the NIST Chinese-to-English and the WMT English-to-German tasks show that the proposed method can significantly accelerate the NMT training and improve the NMT performance.

pdf bib abs
Simplified Abugidas
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics. We investigate the feasibility of recovering the original text written in an abugida after omitting subordinate diacritics and merging consonant letters with similar phonetic values. This is crucial for developing more efficient input methods by reducing the complexity in abugidas. Four abugidas in the southern Brahmic family, i.e., Thai, Burmese, Khmer, and Lao, were studied using a newswire 20,000-sentence dataset. We compared the recovery performance of a support vector machine and an LSTM-based recurrent neural network, finding that the abugida graphemes could be recovered with 94% - 97% accuracy at the top-1 level and 98% - 99% at the top-4 level, even after omitting most diacritics (10 - 30 types) and merging the remaining 30 - 50 characters into 21 graphemes.

2017

pdf bib abs
Context-Aware Smoothing for Neural Machine Translation
Kehai Chen | Rui Wang | Masao Utiyama | Eiichiro Sumita | Tiejun Zhao
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

In Neural Machine Translation (NMT), each word is represented as a low-dimension, real-value vector for encoding its syntax and semantic information. This means that even if the word is in a different sentence context, it is represented as the fixed vector to learn source representation. Moreover, a large number of Out-Of-Vocabulary (OOV) words, which have different syntax and semantic information, are represented as the same vector representation of “unk”. To alleviate this problem, we propose a novel context-aware smoothing method to dynamically learn a sentence-specific vector for each word (including OOV words) depending on its local context words in a sentence. The learned context-aware representation is integrated into the NMT to improve the translation performance. Empirical results on NIST Chinese-to-English translation task show that the proposed approach achieves 1.78 BLEU improvements on average over a strong attentional NMT, and outperforms some existing systems.

pdf bib abs
Improving Neural Machine Translation through Phrase-based Forced Decoding
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency. We propose a method to combine the advantages of traditional SMT and NMT by exploiting an existing phrase-based SMT model to compute the phrase-based decoding cost for an NMT output and then using the phrase-based decoding cost to rerank the n-best NMT outputs. The main challenge in implementing this approach is that NMT outputs may not be in the search space of the standard phrase-based decoding algorithm, because the search space of phrase-based SMT is limited by the phrase-based translation rule table. We propose a soft forced decoding algorithm, which can always successfully find a decoding path for any NMT output. We show that using the forced decoding cost to rerank the NMT outputs can successfully improve translation quality on four different language pairs.

pdf bib abs
Key-value Attention Mechanism for Neural Machine Translation
Hideya Mino | Masao Utiyama | Eiichiro Sumita | Takenobu Tokunaga
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

In this paper, we propose a neural machine translation (NMT) with a key-value attention mechanism on the source-side encoder. The key-value attention mechanism separates the source-side content vector into two types of memory known as the key and the value. The key is used for calculating the attention distribution, and the value is used for encoding the context representation. Experiments on three different tasks indicate that our model outperforms an NMT model with a conventional attention mechanism. Furthermore, we perform experiments with a conventional NMT framework, in which a part of the initial value of a weight matrix is set to zero so that the matrix is as the same initial-state as the key-value attention mechanism. As a result, we obtain comparable results with the key-value attention mechanism without changing the network structure.

pdf bib abs
Instance Weighting for Neural Machine Translation Domain Adaptation
Rui Wang | Masao Utiyama | Lemao Liu | Kehai Chen | Eiichiro Sumita
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Instance weighting has been widely applied to phrase-based machine translation domain adaptation. However, it is challenging to be applied to Neural Machine Translation (NMT) directly, because NMT is not a linear model. In this paper, two instance weighting technologies, i.e., sentence weighting and domain weighting with a dynamic weight learning strategy, are proposed for NMT domain adaptation. Empirical results on the IWSLT English-German/French tasks show that the proposed methods can substantially improve NMT performance by up to 2.7-6.7 BLEU points, outperforming the existing baselines by up to 1.6-3.6 BLEU points.

Source dependency information has been successfully introduced into statistical machine translation. However, there are only a few preliminary attempts for Neural Machine Translation (NMT), such as concatenating representations of source word and its dependency label together. In this paper, we propose a novel NMT with source dependency representation to improve translation performance of NMT, especially long sentences. Empirical results on NIST Chinese-to-English translation task show that our method achieves 1.6 BLEU improvements on average over a strong NMT system.

pdf bib
Empirical Study of Dropout Scheme for Neural Machine Translation
Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XVI: Research Track

pdf bib
A Target Attention Model for Neural Machine Translation
Hideya Mino | Andrew Finch | Eiichiro Sumita
Proceedings of Machine Translation Summit XVI: Research Track

pdf bib abs
Sentence Embedding for Neural Machine Translation Domain Adaptation
Rui Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance. Recently Neural Machine Translation (NMT) has become prominent in the field. However, most of the existing domain adaptation methods only focus on phrase-based machine translation. In this paper, we exploit the NMT’s internal embedding of the source sentence and use the sentence embedding similarity to select the sentences which are close to in-domain data. The empirical adaptation results on the IWSLT English-French and NIST Chinese-English tasks show that the proposed methods can substantially improve NMT performance by 2.4-9.0 BLEU points, outperforming the existing state-of-the-art baseline by 2.3-4.5 BLEU points.

pdf bib
NICT-NAIST System for WMT17 Multimodal Translation Task
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

pdf bib abs
Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing
Atsushi Fujita | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

Aiming at facilitating the research on quality estimation (QE) and automatic post-editing (APE) of machine translation (MT) outputs, especially for those among Asian languages, we have created new datasets for Japanese to English, Chinese, and Korean translations. As the source text, actual utterances in Japanese were extracted from the log data of our speech translation service. MT outputs were then given by phrase-based statistical MT systems. Finally, human evaluators were employed to grade the quality of MT outputs and to post-edit them. This paper describes the characteristics of the created datasets and reports on our benchmarking experiments on word-level QE, sentence-level QE, and APE conducted using the created datasets.

pdf bib abs
Ensemble and Reranking: Using Multiple Models in the NICT-2 Neural Machine Translation System at WAT2017
Kenji Imamura | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

In this paper, we describe the NICT-2 neural machine translation system evaluated at WAT2017. This system uses multiple models as an ensemble and combines models with opposite decoding directions by reranking (called bi-directional reranking). In our experimental results on small data sets, the translation quality improved when the number of models was increased to 32 in total and did not saturate. In the experiments on large data sets, improvements of 1.59-3.32 BLEU points were achieved when six-model ensembles were combined by the bi-directional reranking.

pdf bib abs
A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
Yusuke Oda | Katsuhito Sudoh | Satoshi Nakamura | Masao Utiyama | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the details about the NAIST-NICT machine translation system for WAT2017 English-Japanese Scientific Paper Translation Task. The system consists of a language-independent tokenizer and an attentional encoder-decoder style neural machine translation model. According to the official results, our system achieves higher translation accuracy than any systems submitted previous campaigns despite simple model architecture.

2016

pdf bib
Target-Bidirectional Neural Models for Machine Transliteration
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Sixth Named Entity Workshop

pdf bib abs
Global Pre-ordering for Improving Sublanguage Translation
Masaru Fuji | Masao Utiyama | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

When translating formal documents, capturing the sentence structure specific to the sublanguage is extremely necessary to obtain high-quality translations. This paper proposes a novel global reordering method with particular focus on long-distance reordering for capturing the global sentence structure of a sublanguage. The proposed method learns global reordering models from a non-annotated parallel corpus and works in conjunction with conventional syntactic reordering. Experimental results on the patent abstract sublanguage show substantial gains of more than 25 points in the RIBES metric and comparable BLEU scores both for Japanese-to-English and English-to-Japanese translations.

pdf bib abs
NICT-2 Translation System for WAT2016: Applying Domain Adaptation to Phrase-based Statistical Machine Translation
Kenji Imamura | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper describes the NICT-2 translation system for the 3rd Workshop on Asian Translation. The proposed system employs a domain adaptation method based on feature augmentation. We regarded the Japan Patent Office Corpus as a mixture of four domain corpora and improved the translation quality of each domain. In addition, we incorporated language models constructed from Google n-grams as external knowledge. Our domain adaptation method can naturally incorporate such external knowledge that contributes to translation quality.

pdf bib abs
An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine. The key problem is how to segment the stream in an online manner into units suitable for translation. The segmentation process proceeds by calculating a confidence score for each word that indicates the soundness of placing a sentence boundary after it, and then heuristics are employed to determine the position of the boundaries. Multiple variants of the confidence scoring method and segmentation heuristics were studied. Experimental results show that the best performing strategy is not only efficient in terms of average latency per word, but also achieved end-to-end translation quality close to an offline baseline, and close to oracle segmentation.

pdf bib abs
Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation on raw parallel data from Asian Language Treebank. The cross-lingual similarity is investigated and demonstrated on metrics of correspondence and order of tokens, based on several standard statistical machine translation techniques. The similarity shown in this study suggests a possibility on harmonious annotation and processing of the language pairs in future development.

pdf bib
Unsupervised Word Alignment by Agreement Under ITG Constraint
Hidetaka Kamigaito | Akihiro Tamura | Hiroya Takamura | Manabu Okumura | Eiichiro Sumita
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

pdf bib abs
Neural Machine Translation with Supervised Attention
Lemao Liu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating a alignment between a target word and source words. Unfortunately, it has been proved to be worse than conventional alignment models in alignment accuracy. In this paper, we analyze and explain this issue from the point view of reordering, and propose a supervised attention which is learned with guidance from conventional alignment models. Experiments on two Chinese-to-English translation tasks show that the supervised attention mechanism yields better alignments leading to substantial gains over the standard attention based NMT.

pdf bib abs
Connecting Phrase based Statistical Machine Translation Adaptation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Although more additional corpora are now available for Statistical Machine Translation (SMT), only the ones which belong to the same or similar domains of the original corpus can indeed enhance SMT performance directly. A series of SMT adaptation methods have been proposed to select these similar-domain data, and most of them focus on sentence selection. In comparison, phrase is a smaller and more fine grained unit for data selection, therefore we propose a straightforward and efficient connecting phrase based adaptation method, which is applied to both bilingual phrase pair and monolingual n-gram adaptation. The proposed method is evaluated on IWSLT/NIST data sets, and the results show that phrase based SMT performances are significantly improved (up to +1.6 in comparison with phrase based SMT baseline system and +0.9 in comparison with existing methods).

pdf bib abs
A Prototype Automatic Simultaneous Interpretation System
Xiaolin Wang | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public. This paper presents a fully automatic simultaneous interpretation system to address this problem. Though the development is still at an early stage, the system is capable of keeping up with the fastest of the TED speakers while at the same time delivering high-quality translations. We believe that the system will become an effective tool for facilitating cross-lingual communication in the future.

pdf bib abs
MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Rei Miyata | Anthony Hartley | Kyo Kageura | Cécile Paris | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.

pdf bib abs
Multi-domain Adaptation for Statistical Machine Translation Based on Feature Augmentation
Kenji Imamura | Eiichiro Sumita
Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track

Domain adaptation is a major challenge when applying machine translation to practical tasks. In this paper, we present domain adaptation methods for machine translation that assume multiple domains. The proposed methods combine two model types: a corpus-concatenated model covering multiple domains and single-domain models that are accurate but sparse in specific domains. We combine the advantages of both models using feature augmentation for domain adaptation in machine learning. Our experimental results show that the BLEU scores of the proposed method clearly surpass those of single-domain models for low-resource domains. For high-resource domains, the scores of the proposed method were superior to those of both single-domain and corpusconcatenated models. Even in domains having a million bilingual sentences, the translation quality was at least preserved and even improved in some domains. These results demonstrate that state-of-the-art domain adaptation can be realized with appropriate settings, even when using standard log-linear models.

pdf bib
Agreement on Target-bidirectional Neural Machine Translation
Lemao Liu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Interlocking Phrases in Phrase-based Statistical Machine Translation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Bilingual Segmented Topic Model
Akihiro Tamura | Eiichiro Sumita
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
A Continuous Space Rule Selection Model for Syntax-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib abs
Introducing the Asian Language Treebank (ALT)
Ye Kyaw Thu | Win Pa Pa | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Laos, Malay, Myanmar, Philippine, Thai and Vietnamese. The original resource for this project was English articles that were randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20000-sentence corpus of Myanmar that has been manually translated from an English corpus has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.

In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-size parallel corpus of scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists of a Japanese-English scientific paper abstract corpus of approximately 3 million parallel sentences (ASPEC-JE) and a Chinese-Japanese scientific paper excerpt corpus of approximately 0.68 million parallel sentences (ASPEC-JC). ASPEC is used as the official dataset for the machine translation evaluation workshop WAT (Workshop on Asian Translation).

2015

pdf bib
Patent claim translation based on sublanguage-specific sentence structure
Masaru Fuji | Atsushi Fujita | Masao Utiyama | Eiichiro Sumita | Yuji Matsumoto
Proceedings of Machine Translation Summit XV: Papers

pdf bib
Learning bilingual phrase representations with recurrent neural networks
Hideya Mino | Andrew Finch | Eiichiro Sumita
Proceedings of Machine Translation Summit XV: Papers

pdf bib
Improving fast_align by Reordering
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Hierarchical Phrase-based Stream Decoding
Andrew Finch | Xiaolin Wang | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Hierarchical Back-off Modeling of Hiero Grammar based on Non-parametric Bayesian Model
Hidetaka Kamigaito | Taro Watanabe | Hiroya Takamura | Manabu Okumura | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Leave-one-out Word Alignment without Garbage Collector Effects
Xiaolin Wang | Masao Utiyama | Andrew Finch | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Binarized Neural Network Joint Model for Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Graham Neubig | Satoshi Nakamura
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Transition-based Neural Constituent Parsing
Taro Watanabe | Eiichiro Sumita
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
Learning Word Reorderings for Hierarchical Phrase-based Statistical Machine Translation
Jingyi Zhang | Masao Utiyama | Eiichro Sumita | Hai Zhao
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

pdf bib
Neural Network Transduction Models in Transliteration Generation
Andrew Finch | Lemao Liu | Xiaolin Wang | Eiichiro Sumita
Proceedings of the Fifth Named Entity Workshop

pdf bib
NICT at WAT 2015
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

pdf bib
Risk-aware distribution of SMT outputs for translation of documents targeting many anonymous readers
Yo Ehara | Masao Utiyama | Eiichiro Sumita
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers

pdf bib
A Large-scale Study of Statistical Machine Translation Methods for Khmer Language
Ye Kyaw Thu | Vichet Chea | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

pdf bib
Integrating Dictionaries into an Unsupervised Model for Myanmar Word Segmentation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita | Yoshinori Sagisaka
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

pdf bib
Proceedings of the 1st Workshop on Asian Translation (WAT2014)
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Overview of the 1st Workshop on Asian Translation
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Sadao Kurohashi | Eiichiro Sumita
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Word Order Does NOT Differ Significantly Between Chinese and Japanese
Chenchen Ding | Masao Utiyama | Eiichiro Sumita | Mikio Yamamoto
Proceedings of the 1st Workshop on Asian Translation (WAT2014)

pdf bib
Recurrent Neural Networks for Word Alignment Model
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Dependency-based Pre-ordering for Chinese-English Machine Translation
Jingsheng Cai | Masao Utiyama | Eiichiro Sumita | Yujie Zhang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib abs
Document-level re-ranking with soft lexical and semantic features for statistical machine translation
Chenchen Ding | Masao Utiyama | Eiichiro Sumita
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

We introduce two document-level features to polish baseline sentence-level translations generated by a state-of-the-art statistical machine translation (SMT) system. One feature uses the word-embedding technique to model the relation between a sentence and its context on the target side; the other feature is a crisp document-level token-type ratio of target-side translations for source-side words to model the lexical consistency in translation. The weights of introduced features are tuned to optimize the sentence- and document-level metrics simultaneously on the basis of Pareto optimality. Experimental results on two different schemes with different corpora illustrate that the proposed approach can efficiently and stably integrate document-level information into a sentence-level SMT system. The best improvements were approximately 0.5 BLEU on test sets with statistical significance.

pdf bib
Syntax-Augmented Machine Translation using Syntax-Label Clustering
Hideya Mino | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Learning Hierarchical Translation Spans
Jingyi Zhang | Masao Utiyama | Eiichiro Sumita | Hai Zhao
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Neural Network Based Bilingual Language Model Growing for Statistical Machine Translation
Rui Wang | Hai Zhao | Bao-Liang Lu | Masao Utiyama | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Refining Word Segmentation Using a Manually Aligned Corpus for Statistical Machine Translation
Xiaolin Wang | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib abs
The NICT translation system for IWSLT 2014
Xiaolin Wang | Andrew Finch | Masao Utiyama | Taro Watanabe | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes NICT’s participation in the IWSLT 2014 evaluation campaign for the TED Chinese-English translation shared-task. Our approach used a combination of phrase-based and hierarchical statistical machine translation (SMT) systems. Our focus was in several areas, specifically system combination, word alignment, and various language modeling techniques including the use of neural network joint models. Our experiments on the test set from the 2013 shared task, showed that an improvement in BLEU score can be gained in translation performance through all of these techniques, with the largest improvements coming from using large data sizes to train the language model.

pdf bib abs
Empircal dependency-based head finalization for statistical Chinese-, English-, and French-to-Myanmar (Burmese) machine translation
Chenchen Ding | Ye Kyaw Thu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

We conduct dependency-based head finalization for statistical machine translation (SMT) for Myanmar (Burmese). Although Myanmar is an understudied language, linguistically it is a head-final language with similar syntax to Japanese and Korean. So, applying the efficient techniques of Japanese and Korean processing to Myanmar is a natural idea. Our approach is a combination of two approaches. The first is a head-driven phrase structure grammar (HPSG) based head finalization for English-to-Japanese translation, the second is dependency-based pre-ordering originally designed for English-to-Korean translation. We experiment on Chinese-, English-, and French-to-Myanmar translation, using a statistical pre-ordering approach as a comparison method. Experimental results show the dependency-based head finalization was able to consistently improve a baseline SMT system, for different source languages and different segmentation schemes for the Myanmar language.

pdf bib abs
An exploration of segmentation strategies in stream decoding
Andrew Finch | Xiaolin Wang | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

In this paper we explore segmentation strategies for the stream decoder a method for decoding from a continuous stream of input tokens, rather than the traditional method of decoding from sentence segmented text. The behavior of the decoder is analyzed and modifications to the decoding algorithm are proposed to improve its performance. The experimental results show our proposed decoding strategies to be effective, and add support to the original findings that this approach is capable of approaching the performance of the underlying phrase-based machine translation decoder, at useful levels of latency. Our experiments evaluated the stream decoder on a broader set of language pairs than in previous work. We found most European language pairs were similar in character, and report results on English-Chinese and English-German pairs which are of interest due to the reordering required.

2013

pdf bib
Tuning SMT with a Large Number of Features via Online Feature Grouping
Lemao Liu | Tiejun Zhao | Taro Watanabe | Eiichiro Sumita
Proceedings of the Sixth International Joint Conference on Natural Language Processing

pdf bib
Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
Rui Wang | Masao Utiyama | Isao Goto | Eiichro Sumita | Hai Zhao | Bao-Liang Lu
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

pdf bib
Inducing Romanization Systems
Keiko Taguchi | Andrew Finch | Seiichi Yamamoto | Eiichiro Sumita
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
Distortion Model Considering Rich Context for Statistical Machine Translation
Isao Goto | Masao Utiyama | Eiichiro Sumita | Akihiro Tamura | Sadao Kurohashi
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Additive Neural Networks for Statistical Machine Translation
Lemao Liu | Taro Watanabe | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Hierarchical Phrase Table Combination for Machine Translation
Conghui Zhu | Taro Watanabe | Eiichiro Sumita | Tiejun Zhao
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita | Hiroya Takamura | Manabu Okumura
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Crowd-based MT Evaluation for non-English Target Languages
Michael Paul | Eiichiro Sumita | Luisa Bentivogli | Marcello Federico
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

pdf bib abs
Minimum Bayes-risk decoding extended with similar examples: NAIST-NCT at IWSLT 2012
Hiroaki Shimizu | Masao Utiyama | Eiichiro Sumita | Satoshi Nakamura
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes our methods used in the NAIST-NICT submission to the International Workshop on Spoken Language Translation (IWSLT) 2012 evaluation campaign. In particular, we propose two extensions to minimum bayes-risk decoding which reduces a expected loss.

pdf bib
The NICT translation system for IWSLT 2012
Andrew Finch | Ohnmar Htun | Eiichiro Sumita
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
Rescoring a Phrase-based Machine Transliteration System with Recurrent Neural Network Language Models
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

pdf bib
Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
Akihiro Tamura | Taro Watanabe | Eiichiro Sumita
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

pdf bib
Post-ordering by Parsing for Japanese-English Statistical Machine Translation
Isao Goto | Masao Utiyama | Eiichiro Sumita
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

pdf bib abs
The NICT translation system for IWSLT 2011
Andrew Finch | Chooi-Ling Goh | Graham Neubig | Eiichiro Sumita
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes NICT’s participation in the IWSLT 2011 evaluation campaign for the TED speech translation ChineseEnglish shared-task. Our approach was based on a phrasebased statistical machine translation system that was augmented in two ways. Firstly we introduced rule-based re-ordering constraints on the decoding. This consisted of a set of rules that were used to segment the input utterances into segments that could be decoded almost independently. This idea here being that constraining the decoding process in this manner would greatly reduce the search space of the decoder, and cut out many possibilities for error while at the same time allowing for a correct output to be generated. The rules we used exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation/spacing positions were used as segment boundaries, and the set of used positions were determined by a set of linguistically-based heuristics. Secondly we used two heterogeneous methods to build the translation model, and lexical reordering model for our systems. The first method employed the popular method of using GIZA++ for alignment in combination with phraseextraction heuristics. The second method used a recentlydeveloped Bayesian alignment technique that is able to perform both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact whilst at the same time maintaining a high level of translation quality. We evaluated both of these methods of translation model construction in isolation, and our results show their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature into the log-linear model to indicate those phrases that were in the intersection of the two translation models. The addition of this feature was also able to provide a small improvement in performance.

pdf bib abs
Annotating data selection for improving machine translation
Keiji Yasuda | Hideo Okuma | Masao Utiyama | Eiichiro Sumita
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In order to efficiently improve machine translation systems, we propose a method which selects data to be annotated (manually translated) from speech-to-speech translation field data. For the selection experiments, we used data from field experiments conducted during the 2009 fiscal year in five areas of Japan. For the selection experiments, we used data sets from two areas: one data set giving the lowest baseline speech translation performance for its test set, and another data set giving the highest. In the experiments, we compare two methods for selecting data to be manually translated from the field data. Both of them use source side language models for data selection, but in different manners. According to the experimental results, either or both of the methods show larger improvements compared to a random data selection.

pdf bib
Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Michael Paul | Andrew Finch | Paul R. Dixon | Eiichiro Sumita
Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties

pdf bib
Integrating Models Derived from non-Parametric Bayesian Co-segmentation into a Statistical Machine Transliteration System
Andrew Finch | Paul Dixon | Eiichiro Sumita
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf bib
Using Features from a Bilingual Alignment Model in Transliteration Mining
Takaaki Fukunishi | Andrew Finch | Seiichi Yamamoto | Eiichiro Sumita
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf bib
An Unsupervised Model for Joint Phrase Alignment and Extraction
Graham Neubig | Taro Watanabe | Eiichiro Sumita | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Machine Translation System Combination by Confusion Forest
Taro Watanabe | Eiichiro Sumita
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Reordering Constraint Based on Document-Level Context
Takashi Onishi | Masao Utiyama | Eiichiro Sumita
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Searching Translation Memories for Paraphrases
Masao Utiyama | Graham Neubig | Takashi Onishi | Eiichiro Sumita
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
A Comparison of Unsupervised Bilingual Term Extraction Methods Using Phrase-Tables
Masamichi Ideue | Kazuhide Yamamoto | Masao Utiyama | Eiichiro Sumita
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
A Comparison Study of Parsers for Patent Machine Translation
Isao Goto | Masao Utiyama | Takashi Onishi | Eiichiro Sumita
Proceedings of Machine Translation Summit XIII: Papers

pdf bib
Rule-based Reordering Constraints for Phrase-based SMT
Chooi-Ling Goh | Takashi Onishi | Eiichiro Sumita
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

pdf bib
Translation Quality Indicators for Pivot-based Statistical MT
Michael Paul | Eiichiro Sumita
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

pdf bib
Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
Transliteration Using a Phrase-Based Statistical Machine Translation System to Re-Score the Output of a Joint Multigram Model
Andrew Finch | Eiichiro Sumita
Proceedings of the 2010 Named Entities Workshop

pdf bib
Helping Volunteer Translators, Fostering Language Resources
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

pdf bib
Syntactic Constraints on Phrase Extraction for Phrase-Based Machine Translation
Hailong Cao | Andrew Finch | Eiichiro Sumita
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

pdf bib
Paraphrase Lattice for Statistical Machine Translation
Takashi Onishi | Masao Utiyama | Eiichiro Sumita
Proceedings of the ACL 2010 Conference Short Papers

pdf bib
Filtering Syntactic Constraints for Statistical Machine Translation
Hailong Cao | Eiichiro Sumita
Proceedings of the ACL 2010 Conference Short Papers

pdf bib abs
The NICT translation system for IWSLT 2010
Chooi-Ling Goh | Taro Watanabe | Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes NICT’s participation in the IWSLT 2010 evaluation campaign for the DIALOG translation (Chinese-English) and the BTEC (French-English) translation shared-tasks. For the DIALOG translation, the main challenge to this task is applying context information during translation. Context information can be used to decide on word choice and also to replace missing information during translation. We applied discriminative reranking using contextual information as additional features. In order to provide more choices for re-ranking, we generated n-best lists from multiple phrase-based statistical machine translation systems that varied in the type of Chinese word segmentation schemes used. We also built a model that merged the phrase tables generated by the different segmentation schemes. Furthermore, we used a lattice-based system combination model to combine the output from different systems. A combination of all of these systems was used to produce the n-best lists for re-ranking. For the BTEC task, a general approach that used latticebased system combination of two systems, a standard phrasebased system and a hierarchical phrase-based system, was taken. We also tried to process some unknown words by replacing them with the same words but different inflections that are known to the system.

pdf bib
A Bayesian model of bilingual segmentation for transliteration
Andrew Finch | Eiichiro Sumita
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

pdf bib abs
Community-based Construction of Draft and Final Translation Corpus Through a Translation Hosting Site Minna no Hon’yaku (MNH)
Takeshi Abekawa | Masao Utiyama | Eiichiro Sumita | Kyo Kageura
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we report a way of constructing a translation corpus that contains not only source and target texts, but draft and final versions of target texts, through the translation hosting site Minna no Hon'yaku (MNH). We made MNH publicly available on April 2009. Since then, more than 1,000 users have registered and over 3,500 documents have been translated, as of February 2010, from English to Japanese and from Japanese to English. MNH provides an integrated translation-aid environment, QRedit, which enables translators to look up high-quality dictionaries and Wikipedia as well as to search Google seamlessly. As MNH keeps translation logs, a corpus consisting of source texts, draft translations in several versions, and final translations is constructed naturally through MNH. As of 7 February, 764 documents with multiple translation versions are accumulated, of which 110 are edited by more than one translators. This corpus can be used for self-learning by inexperienced translators on MNH, and potentially for improving machine translation.

2009

pdf bib
On the Importance of Pivot Language Selection for Statistical Machine Translation
Michael Paul | Hirofumi Yamamoto | Eiichiro Sumita | Satoshi Nakamura
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

pdf bib
Mining Parallel Texts from Mixed-Language Web Pages
Masao Utiyama | Daisuke Kawahara | Keiji Yasuda | Eiichiro Sumita
Proceedings of Machine Translation Summit XII: Papers

pdf bib
Development of a Japanese-English Software Manual Parallel Corpus
Tatsuya Ishisaka | Masao Utiyama | Eiichiro Sumita | Kazuhide Yamamoto
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Hosting Volunteer Translators
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of Machine Translation Summit XII: Posters

pdf bib
NICT@WMT09: Model Adaptation and Transliteration for Spanish-English SMT
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation
Kei Hashimoto | Hirohumi Yamamoto | Hideo Okuma | Eiichiro Sumita | Keiichi Tokuda
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

pdf bib
Transliteration by Bidirectional Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib abs
Two methods for stabilizing MERT
Masao Utiyama | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the NICT SMT system used in the International Workshop on Spoken Language Translation (IWSLT) 2009 evaluation campaign. We participated in the Challenge Task. Our system was based on a fairly common phrase-based machine translation system. We used two methods for stabilizing MERT.

This demo shows the network-based speech-to-speech translation system. The system was designed to perform realtime, location-free, multi-party translation between speakers of different languages. The spoken language modules: automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS), are connected through Web servers that can be accessed via client applications worldwide. In this demo, we will show the multiparty speech-to-speech translation of Japanese, Chinese, Indonesian, Vietnamese, and English, provided by the NICT server. These speech-to-speech modules have been developed by NICT as a part of A-STAR (Asian Speech Translation Advanced Research) consortium project1.

pdf bib
Minna no Hon’yaku: a website for hosting, archiving, and promoting translations
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of Translating and the Computer 31

pdf bib
Bidirectional Phrase-based Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

This paper describes the National Institute of Information and Communications Technology/Advanced Telecommunications Research Institute International (NICT/ATR) statistical machine translation (SMT) system used for the IWSLT 2008 evaluation campaign. We participated in the Chinese–English (Challenge Task), English–Chinese (Challenge Task), Chinese–English (BTEC Task), Chinese–Spanish (BTEC Task), and Chinese–English–Spanish (PIVOT Task) translation tasks. In the English–Chinese translation Challenge Task, we focused on exploring various factors for the English–Chinese translation because the research on the translation of English–Chinese is scarce compared to the opposite direction. In the Chinese–English translation Challenge Task, we employed a novel clustering method, where training sentences similar to the development data in terms of the word error rate formed a cluster. In the pivot translation task, we integrated two strategies for pivot translation by linear interpolation.

pdf bib
Dynamic Model Interpolation for Statistical Machine Translation
Andrew Finch | Eiichiro Sumita
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
Ruiqiang Zhang | Keiji Yasuda | Eiichiro Sumita
Proceedings of the Third Workshop on Statistical Machine Translation

pdf bib
Imposing Constraints from the Source Tree on ITG Constraints for SMT
Hirofumi Yamamoto | Hideo Okuma | Eiichiro Sumita
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

pdf bib
Chinese Unknown Word Translation by Subword Re-segmentation
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib
Method of Selecting Training Data to Build a Compact and Efficient Translation Model
Keiji Yasuda | Ruiqiang Zhang | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib
Achilles: NiCT/ATR Chinese Morphological Analyzer for the Fourth Sighan Bakeoff
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf bib
Phrase-based Machine Transliteration
Andrew Finch | Eiichiro Sumita
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)

2007

pdf bib
Multilingual Spoken Language Corpus Development for Communication Research
Toshiyuki Takezawa | Genichiro Kikui | Masahide Mizushima | Eiichiro Sumita
International Journal of Computational Linguistics & Chinese Language Processing, Volume 12, Number 3, September 2007: Special Issue on Invited Papers from ISCSLP 2006

pdf bib
Reducing human assessment of machine translation quality to binary classifiers
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf bib
Bilingual Cluster Based Models for Statistical Machine Translation
Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Introducing translation dictionary into phrase-based SMT
Hideo Okuma | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Method of selecting training sets to build compact and efficient language model
Keiji Yasuda | Hirofumi Yamamoto | Eiichiro Sumita
Proceedings of the Workshop on Using corpora for natural language generation

This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2007 evaluation campaign. We participated in three of the four language pair translation tasks (CE, JE, and IE). We used a phrase-based SMT system using log-linear feature models for all tracks. This year we decoded from the ASR n-best lists in the JE track and found a gain in performance. We also applied some new techniques to facilitate the use of out-of-domain external resources by model combination and also by utilizing a huge corpus of n-grams provided by Google Inc.. Using these resources gave mixed results that depended on the technique also the language pair however, in some cases we achieved consistently positive results. The results from model-interpolation in particular were very promising.

pdf bib
NICT-ATR Speech-to-Speech Translation System
Eiichiro Sumita | Tohru Shimizu | Satoshi Nakamura
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation
Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf bib
Using Lexical Dependency and Ontological Knowledge to Improve a Detailed Syntactic and Semantic Tagger of English
Andrew Finch | Ezra Black | Young-Sook Hwang | Eiichiro Sumita
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Subword-Based Tagging for Confidence-Dependent Chinese Word Segmentation
Ruiqiang Zhang | Genichiro Kikui | Eiichiro Sumita
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

pdf bib
Development of client-server speech translation system on a multi-lingual speech communication platform
Tohru Shimizu | Yutaka Ashikari | Eiichiro Sumita | Hideki Kashioka | Satoshi Nakamura
Proceedings of the Third International Workshop on Spoken Language Translation: Papers

pdf bib
Exploiting Variant Corpora for Machine Translation
Michael Paul | Eiichiro Sumita
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Using the Web to Disambiguate Acronyms
Eiichiro Sumita | Fumiaki Sugaya
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Word Pronunciation Disambiguation using the Web
Eiichiro Sumita | Fumiaki Sugaya
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Subword-based Tagging by Conditional Random Fields for Chinese Word Segmentation
Ruiqiang Zhang | Genichiro Kikui | Eiichiro Sumita
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

pdf bib
Acquiring Synonyms from Monolingual Comparable Texts
Mitsuo Shimohata | Eiichiro Sumita
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Using Machine Translation Evaluation Techniques to Determine Sentence-level Semantic Equivalence
Andrew Finch | Young-Sook Hwang | Eiichiro Sumita
Proceedings of the Third International Workshop on Paraphrasing (IWP2005)

pdf bib abs
Practical Approach to Syntax-based Statistical Machine Translation
Kenji Imamura | Hideo Okuma | Eiichiro Sumita
Proceedings of Machine Translation Summit X: Papers

This paper presents a practical approach to statistical machine translation (SMT) based on syntactic transfer. Conventionally, phrase-based SMT generates an output sentence by combining phrase (multiword sequence) translation and phrase reordering without syntax. On the other hand, SMT based on tree-to-tree mapping, which involves syntactic information, is theoretical, so its features remain unclear from the viewpoint of a practical system. The SMT proposed in this paper translates phrases with hierarchical reordering based on the bilingual parse tree. In our experiments, the best translation was obtained when both phrases and syntactic information were used for the translation process.

pdf bib
Graph-based Retrieval for Example-based Machine Translation Using Edit-distance
Takao Doi | Hirofumi Yamamoto | Eiichiro Sumita
Workshop on example-based machine translation

pdf bib abs
A Machine Learning Approach to Hypotheses Selection of Greedy Decoding for SMT
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
Workshop on example-based machine translation

This paper proposes a method for integrating example-based and rule-based machine translation systems with statistical methods. It extends a greedy decoder for statistical machine translation (SMT), which searches for an optimal translation by using SMT models starting from a decoder seed, i.e., the source language input paired with an initial translation hypothesis. In order to reduce local optima problems inherent in the search, the outputs generated by multiple translation engines, such as rule-based (RBMT) and example-based (EBMT) systems, are utilized as the initial translation hypotheses. This method outperforms conventional greedy decoding approaches using initial translation hypotheses based on translation examples retrieved from a parallel text corpus. However, the decoding of multiple initial translation hypotheses is computationally expensive. This paper proposes a method to select a single initial translation hypothesis before decoding based on a machine learning approach that judges the appropriateness of multiple initial translation hypotheses and selects the most confident one for decoding. Our approach is evaluated for the translation of dialogues in the travel domain, and the results show that it drastically reduces computational costs without a loss in translation quality.

pdf bib
Measuring Non-native Speakers’ Proficiency of English by Using a Test with Automatically-Generated Fill-in-the-Blank Questions
Eiichiro Sumita | Fumiaki Sugaya | Seiichi Yamamoto
Proceedings of the Second Workshop on Building Educational Applications Using NLP

2004

pdf bib
Example-based Machine Translation Based on Syntactic Transfer with Statistical Models
Kenji Imamura | Hideo Okuma | Taro Watanabe | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Splitting Input Sentence for Machine Translation Using Language Model with Sentence Similarity
Takao Doi | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Reordering Constraints for Phrase-Based Statistical Machine Translation
Richard Zens | Hermann Ney | Taro Watanabe | Eiichiro Sumita
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Using a Mixture of N-Best Lists from Multiple MT Systems in Rank-Sum-Based Confidence Measure for MT Outputs
Yasuhiro Akiba | Eiichiro Sumita | Hiromi Nakaiwa | Seiichi Yamamoto | Hiroshi G. Okuno
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Building a Paraphrase Corpus for Speech Translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Incremental Methods to Select Test Sentences for Evaluating Translation Ability
Yasuhiro Akiba | Eiichiro Sumita | Hiromi Nakaiwa | Seiichi Yamamoto | Hiroshi G. Okuno
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
How Does Automatic Machine Translation Evaluation Correlate with Human Scoring as the Number of Reference Translations Increases?
Andrew Finch | Yasuhiro Akiba | Eiichiro Sumita
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib
Automatic Measuring of English Language Proficiency using MT Evaluation Technology
Keiji Yasuda | Fumiaki Sugaya | Eiichiro Sumita | Toshiyuki Takezawa | Genichiro Kikui | Seiichi Yamamoto
Proceedings of the Workshop on eLearning for Computational Linguistics and Computational Linguistics for eLearning

pdf bib
Example-based Rescoring of Statistical Machine Translation Output
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Method for retrieving a similar sentence and its application to machine translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

2003

pdf bib
Retrieving Meaning-equivalent Sentences for Example-based Rough Translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

pdf bib
Input Sentence Splitting and Translating
Takao Doi | Eiichiro Sumita
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

pdf bib
Automatic Construction of Machine Translation Knowledge Using Translation Literalness
Kenji Imamura | Eiichiro Sumita | Yuji Matsumoto
10th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Chunk-Based Statistical Translation
Taro Watanabe | Eiichiro Sumita | Hiroshi G. Okuno
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Feedback Cleaning of Machine Translation Rules Using Automatic Evaluation
Kenji Imamura | Eiichiro Sumita | Yuji Matsumoto
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

pdf bib
Adaptation Using Out-of-Domain Corpus within EBMT
Takao Doi | Eiichiro Sumita | Hirofumi Yamamoto
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
Automatic Expansion of Equivalent Sentence Set Based on Syntactic Substitution
Kenji Imamura | Yasuhiro Akiba | Eiichiro Sumita
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib abs
Experimental comparison of MT evaluation methods: RED vs.BLEU
Yasuhiro Akiba | Eiichiro Sumita | Hiromi Nakaiwa | Seiichi Yamamoto | Hiroshi G. Okuno
Proceedings of Machine Translation Summit IX: Papers

This paper experimentally compares two automatic evaluators, RED and BLEU, to determine how close the evaluation results of each automatic evaluator are to average evaluation results by human evaluators, following the ATR standard of MT evaluation. This paper gives several cautionary remarks intended to prevent MT developers from drawing misleading conclusions when using the automatic evaluators. In addition, this paper reports a way of using the automatic evaluators so that their results agree with those of human evaluators.

pdf bib abs
Example-based rough translation for speech-to-speech translation
Mitsuo Shimohata | Eiichiro Sumita | Yuji Matsumoto
Proceedings of Machine Translation Summit IX: Papers

Example-based machine translation (EBMT) is a promising translation method for speech-to-speech translation (S2ST) because of its robustness. However, it has two problems in that the performance degrades when input sentences are long and when the style of the input sentences and that of the example corpus are different. This paper proposes example-based rough translation to overcome these two problems. The rough translation method relies on “meaning-equivalent sentences,” which share the main meaning with an input sentence despite missing some unimportant information. This method facilitates retrieval of meaning-equivalent sentences for long input sentences. The retrieval of meaning-equivalent sentences is based on content words, modality, and tense. This method also provides robustness against the style differences between the input sentence and the example corpus.

pdf bib abs
Example-based decoding for statistical machine translation
Taro Watanabe | Eiichiro Sumita
Proceedings of Machine Translation Summit IX: Papers

This paper presents a decoder for statistical machine translation that can take advantage of the example-based machine translation framework. The decoder presented here is based on the greedy approach to the decoding problem, but the search is initiated from a similar translation extracted from a bilingual corpus. The experiments on multilingual translations showed that the proposed method was far superior to a word-by-word generation beam search algorithm.

2002

pdf bib
Statistical machine translation based on hierarchical phrase alignment
Taro Watanabe | Kenji Imamura | Eiichiro Sumita
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

bib
Example-based machine translation
Eiichiro Sumita | Kenji Imamura
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Tutorials

pdf bib
Automatic paraphrasing based on parallel corpus for normalization
Mitsuo Shimohata | Eiichiro Sumita
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Statistical Machine Translation on Paraphrased Corpora
Taro Watanabe | Mitsuo Shimohata | Eiichiro Sumita
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Toward a Broad-coverage Bilingual Corpus for Speech Translation of Travel Conversations in the Real World
Toshiyuki Takezawa | Eiichiro Sumita | Fumiaki Sugaya | Hirofumi Yamamoto | Seiichi Yamamoto
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Corpus-based Generation of Numeral Classifier using Phrase Alignment
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Bidirectional Decoding for Statistical Machine Translation
Taro Watanabe | Eiichiro Sumita
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Using Language and Translation Models to Select the Best among Outputs from Multiple MT Systems
Yasuhiro Akiba | Taro Watanabe | Eiichiro Sumita
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib
Corpus-Centered Computation
Eiichiro Sumita
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf bib
Identifying Synonymous Expressions from a Bilingual Corpus for Example-Based Machine Translation
Mitsuo Shimohata | Eiichiro Sumita
COLING-02: Machine Translation in Asia

2001

pdf bib
Example-based machine translation using DP-matching between work sequences
Eiichiro Sumita
Proceedings of the ACL 2001 Workshop on Data-Driven Methods in Machine Translation

pdf bib
Integration of Referential Scope Limitations into Japanese Pronoun Resolution
Michael Paul | Eiichiro Sumita
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

pdf bib abs
Using multiple edit distances to automatically rank machine translation output
Yasuhiro Akiba | Kenji Imamura | Eiichiro Sumita
Proceedings of Machine Translation Summit VIII

This paper addresses the challenging problem of automatically evaluating output from machine translation (MT) systems in order to support the developers of these systems. Conventional approaches to the problem include methods that automatically assign a rank such as A, B, C, or D to MT output according to a single edit distance between this output and a correct translation example. The single edit distance can be differently designed, but changing its design makes assigning a certain rank more accurate, but another rank less accurate. This inhibits improving accuracy of rank assignment. To overcome this obstacle, this paper proposes an automatic ranking method that, by using multiple edit distances, encodes machine-translated sentences with a rank assigned by humans into multi-dimensional vectors from which a classifier of ranks is learned in the form of a decision tree (DT). The proposed method assigns a rank to MT output through the learned DT. The proposed method is evaluated using transcribed texts of real conversations in the travel arrangement domain. Experimental results show that the proposed method is more accurate than the single-edit-distance-based ranking methods, in both closed and open tests. Moreover, the proposed method could estimate MT quality within 3% error in some cases.

2000

pdf bib
Lexical Transfer Using a Vector-Space Model
Eiichiro Sumita
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf bib
Translation using Information on Dialogue Participants
Setsuo Yamada | Eiichiro Sumita | Hideki Kashioka
Sixth Applied Natural Language Processing Conference

1999

pdf bib
Corpus-Based Anaphora Resolution Towards Antecedent Preference
Michael Paul | Kazuhide Yamamoto | Eiichiro Sumita
Coreference and Its Applications

ATR has built a multi-language speech translation system called ATR-MATRIX. It consists of a spoken-language translation subsystem, which is the focus of this paper, together with a highly accurate speech recognition subsystem and a high-definition speech synthesis subsystem. This paper gives a road map of solutions to the problems inherent in spoken-language translation. Spoken-language translation systems need to tackle difficult problems such as ungrammaticality. contextual phenomena, speech recognition errors, and the high-speeds required for real-time use. We have made great strides towards solving these problems in recent years. Our approach mainly uses an example-based translation model called TDMT. We have added the use of extra-linguistic information, a decision tree learning mechanism, and methods dealing with recognition errors.