Tamer Alkhouli


2020

Start-Before-End and End-to-End: Neural Speech Translation by AppTek and RWTH Aachen University
Parnia Bahar | Patrick Wilken | Tamer Alkhouli | Andreas Guta | Pavel Golik | Evgeny Matusov | Christian Herold
Proceedings of the 17th International Conference on Spoken Language Translation

AppTek and RWTH Aachen University team up to participate in the offline and simultaneous speech translation tracks of IWSLT 2020. For the offline task, we create both cascaded and end-to-end speech translation systems, paying attention to careful data selection and weighting. In the cascaded approach, we combine high-quality hybrid automatic speech recognition (ASR) with Transformer-based neural machine translation (NMT). Our end-to-end direct speech translation systems benefit from pretraining of adapted encoder and decoder components, as well as from synthetic data and fine-tuning, and are thus able to compete with cascaded systems in terms of MT quality. For simultaneous translation, we utilize a novel architecture that makes dynamic decisions, learned from parallel data, on whether to continue reading input or to generate output words. Experiments with speech and text input show that even at low latency this architecture leads to superior translation results.

Neural Simultaneous Speech Translation Using Alignment-Based Chunking
Patrick Wilken | Tamer Alkhouli | Evgeny Matusov | Pavel Golik
Proceedings of the 17th International Conference on Spoken Language Translation

In simultaneous machine translation, the objective is to determine when to produce a partial translation given a continuous stream of source words, with a trade-off between latency and quality. We propose a neural machine translation (NMT) model that dynamically decides whether to continue reading input or to generate output words. The model is composed of two main components: one that dynamically decides when to end a source chunk, and another that translates the consumed chunk. We train the components jointly and in a manner consistent with the inference conditions. To generate chunked training data, we propose a method that utilizes word alignment while also preserving enough context. We compare models with bidirectional and unidirectional encoders of different depths, both on real speech and text input. Our results on the IWSLT 2020 English-to-German task outperform a wait-k baseline by 2.6 to 3.7% BLEU absolute.
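
For orientation, the wait-k baseline named in this abstract follows a fixed schedule: read k source words first, then alternate between writing one target word and reading one source word. The following is a minimal sketch of that schedule only, not the paper's chunking model; the function name `wait_k_schedule` and the "READ"/"WRITE" labels are hypothetical, and the target length is assumed to be known in advance purely for illustration.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Return the READ/WRITE action sequence of a wait-k policy.

    Assumption (illustrative only): the target length is given up front.
    Before writing target word t (0-indexed), the policy has read
    min(k + t, src_len) source words.
    """
    actions = []
    read = 0
    for t in range(tgt_len):
        needed = min(k + t, src_len)
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    # Four source words, three target words, k = 2.
    print(wait_k_schedule(4, 3, 2))
```

A larger k lowers the risk of translating without enough context but raises latency, which is the trade-off the abstract describes.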

2018

On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation
Tamer Alkhouli | Gabriel Bretschner | Hermann Ney
Proceedings of the Third Conference on Machine Translation: Research Papers

This work investigates the alignment problem in state-of-the-art multi-head attention models based on the transformer architecture. We demonstrate that alignment extraction in transformer models can be improved by adding an alignment head to the multi-head source-to-target attention component. This head is used to compute sharper attention weights. We describe how to use the alignment head to achieve competitive performance. To study the effect of adding the alignment head, we simulate a dictionary-guided translation task, where the user wants to guide translation using pre-defined dictionary entries. Using the proposed approach, we achieve up to 3.8% BLEU improvement when using the dictionary, in comparison to 2.4% BLEU in the baseline case. We also propose alignment pruning to speed up decoding in alignment-based neural machine translation (ANMT), which speeds up translation by a factor of 1.8 without loss in translation performance. We carry out experiments on the shared WMT 2016 English→Romanian news task and the BOLT Chinese→English discussion forum task.

Neural Hidden Markov Model for Machine Translation
Weiyue Wang | Derui Zhu | Tamer Alkhouli | Zixuan Gan | Hermann Ney
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Attention-based neural machine translation (NMT) models selectively focus on specific source positions to produce a translation, which brings significant improvements over pure encoder-decoder sequence-to-sequence models. This work investigates NMT while replacing the attention component. We study a neural hidden Markov model (HMM) consisting of neural network-based alignment and lexicon models, which are trained jointly using the forward-backward algorithm. We show that the attention component can be effectively replaced by the neural network alignment model and that the neural HMM approach achieves performance comparable to state-of-the-art attention-based models on the WMT 2017 German↔English and Chinese→English translation tasks.
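
As a rough illustration of the forward-backward algorithm mentioned in this abstract, the sketch below computes alignment posteriors for an HMM whose hidden states are source positions. It is not the paper's implementation: the probabilities are plain lookup tables rather than neural network outputs, and the names `forward_backward`, `init`, `trans` and `emit` are hypothetical. No scaling is applied, so it is only suitable for short sequences.

```python
def forward_backward(init, trans, emit):
    """Posterior alignment probabilities for a small HMM.

    Assumed inputs (illustrative tables, not neural models):
      init[i]       p(a_0 = i), the first target word aligns to source i
      trans[p][i]   p(a_j = i | a_{j-1} = p), alignment (jump) model
      emit[j][i]    p(e_j | a_j = i), lexicon model for target word j
    Returns (gamma, likelihood) with gamma[j][i] = p(a_j = i | e).
    """
    J, I = len(emit), len(init)
    alpha = [[0.0] * I for _ in range(J)]
    beta = [[0.0] * I for _ in range(J)]

    # Forward pass: probability of the target prefix ending in state i.
    for i in range(I):
        alpha[0][i] = init[i] * emit[0][i]
    for j in range(1, J):
        for i in range(I):
            alpha[j][i] = emit[j][i] * sum(
                alpha[j - 1][p] * trans[p][i] for p in range(I)
            )

    # Backward pass: probability of the target suffix given state i.
    for i in range(I):
        beta[J - 1][i] = 1.0
    for j in range(J - 2, -1, -1):
        for i in range(I):
            beta[j][i] = sum(
                trans[i][n] * emit[j + 1][n] * beta[j + 1][n] for n in range(I)
            )

    likelihood = sum(alpha[J - 1])
    gamma = [
        [alpha[j][i] * beta[j][i] / likelihood for i in range(I)]
        for j in range(J)
    ]
    return gamma, likelihood


if __name__ == "__main__":
    gamma, likelihood = forward_backward(
        init=[0.6, 0.4],
        trans=[[0.7, 0.3], [0.4, 0.6]],
        emit=[[0.5, 0.1], [0.2, 0.9]],
    )
    print(likelihood)
    print(gamma)
```

In joint training of this kind, posteriors such as `gamma` typically serve as soft targets for updating the alignment and lexicon models.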

RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition
Albert Zeyer | Tamer Alkhouli | Hermann Ney
Proceedings of ACL 2018, System Demonstrations

We compare the training and decoding speed of RETURNN for attention models for translation, which is fast thanks to fast CUDA LSTM kernels and a fast pure TensorFlow beam search decoder. We show that a layer-wise pretraining scheme for recurrent attention models gives an absolute improvement of over 1% BLEU and allows training deeper recurrent encoder networks. Promising preliminary results on maximum expected BLEU training are presented. We are able to train state-of-the-art models for translation and end-to-end models for speech recognition and show results on WMT 2017 and Switchboard. The flexibility of RETURNN allows a fast research feedback loop to experiment with alternative architectures, and its generality allows it to be used in a wide range of applications.

2017

Biasing Attention-Based Recurrent Neural Networks Using External Alignment Information
Tamer Alkhouli | Hermann Ney
Proceedings of the Second Conference on Machine Translation

The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017
Jan-Thorsten Peter | Andreas Guta | Tamer Alkhouli | Parnia Bahar | Jan Rosendahl | Nick Rossenbach | Miguel Graça | Hermann Ney
Proceedings of the Second Conference on Machine Translation

Hybrid Neural Network Alignment and Lexicon Model in Direct HMM for Statistical Machine Translation
Weiyue Wang | Tamer Alkhouli | Derui Zhu | Hermann Ney
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Recently, neural machine translation systems have shown promising performance and surpassed phrase-based systems on most translation tasks. Returning to conventional machine translation concepts while utilizing effective neural models is vital for understanding the leap accomplished by neural machine translation over phrase-based methods. This work proposes a direct HMM with neural network-based lexicon and alignment models, which are trained jointly using the Baum-Welch algorithm. The direct HMM is applied to rerank the n-best list created by a state-of-the-art phrase-based translation system, and it provides improvements of up to 1.0% BLEU on two different translation tasks.

2016

Alignment-Based Neural Machine Translation
Tamer Alkhouli | Gabriel Bretschner | Jan-Thorsten Peter | Mohammed Hethnawi | Andreas Guta | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | Aleš Tamchyna | Ondřej Bojar | Barry Haddow | Rico Sennrich | Frédéric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

The RWTH Aachen University English-Romanian Machine Translation System for WMT 2016
Jan-Thorsten Peter | Tamer Alkhouli | Andreas Guta | Hermann Ney
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

2015

A Comparison between Count and Neural Network Models Based on Joint Translation and Reordering Sequences
Andreas Guta | Tamer Alkhouli | Jan-Thorsten Peter | Joern Wuebker | Hermann Ney
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Investigations on Phrase-based Decoding with Recurrent Neural Network Language and Translation Models
Tamer Alkhouli | Felix Rietig | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Vector Space Models for Phrase-based Machine Translation
Tamer Alkhouli | Andreas Guta | Hermann Ney
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Translation Modeling with Bidirectional Recurrent Neural Networks
Martin Sundermeyer | Tamer Alkhouli | Joern Wuebker | Hermann Ney
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

The RWTH Aachen machine translation systems for IWSLT 2013
Joern Wuebker | Stephan Peitz | Tamer Alkhouli | Jan-Thorsten Peter | Minwei Feng | Markus Freitag | Hermann Ney
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2013. We participated in the English→French, English↔German, Arabic→English, Chinese→English and Slovenian↔English MT tracks and the English→French and English→German SLT tracks. We apply phrase-based and hierarchical SMT decoders, which are augmented by state-of-the-art extensions. The novel techniques we experimentally evaluate include discriminative phrase training, a continuous space language model, a hierarchical reordering model, a word class language model, domain adaptation via data selection, and system combination of standard and reverse order models. By applying these methods, we show considerable improvements over the respective baseline systems.