Minwei Feng

2023

The transformer model is known to be computationally demanding, and prohibitively costly for long sequences, as the self-attention module uses a quadratic time and space complexity with respect to sequence length. Many researchers have focused on designing new forms of self-attention or introducing new parameters to overcome this limitation, however a large portion of them prohibits the model to inherit weights from large pretrained models. In this work, the transformer’s inefficiency has been taken care of from another perspective. We propose Fourier Transformer, a simple yet effective approach by progressively removing redundancies in hidden sequence using the ready-made Fast Fourier Transform (FFT) operator to perform Discrete Cosine Transformation (DCT). Fourier Transformer is able to significantly reduce computational costs while retain the ability to inherit from various large pretrained models. Experiments show that our model achieves state-of-the-art performances among all transformer-based models on the long-range modeling benchmark LRA with significant improvement in both speed and space. For generative seq-to-seq tasks including CNN/DailyMail and ELI5, by inheriting the BART weights our model outperforms the standard BART and other efficient models. Our code will be publicly available at https://github.com/LUMIA-Group/FourierTransformer

2015

pdf bib
Efficient Hyper-parameter Optimization for NLP Applications
Lidan Wang | Minwei Feng | Bowen Zhou | Bing Xiang | Sridhar Mahadevan
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Local System Voting Feature for Machine Translation System Combination
Markus Freitag | Jan-Thorsten Peter | Stephan Peitz | Minwei Feng | Hermann Ney
Proceedings of the Tenth Workshop on Statistical Machine Translation

2013

This work describes the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign International Workshop on Spoken Language Translation (IWSLT) 2013. We participated in the English→French, English↔German, Arabic→English, Chinese→English and Slovenian↔English MT tracks and the English→French and English→German SLT tracks. We apply phrase-based and hierarchical SMT decoders, which are augmented by state-of-the-art extensions. The novel techniques we experimentally evaluate include discriminative phrase training, a continuous space language model, a hierarchical reordering model, a word class language model, domain adaptation via data selection and system combination of standard and reverse order models. By application of these methods we can show considerable improvements over the respective baseline systems.

pdf bib
Reverse Word Order Model
Markus Freitag | Minwei Feng | Matthias Huck | Stephan Peitz | Hermann Ney
Proceedings of Machine Translation Summit XIV: Papers

pdf bib
Advancements in Reordering Models for Statistical Machine Translation
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Semantic Cohesion Model for Phrase-Based SMT
Minwei Feng | Weiwei Sun | Hermann Ney
Proceedings of COLING 2012

In this paper, the automatic speech recognition (ASR) and statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2012 are presented. We participated in the ASR (English), MT (English-French, Arabic-English, Chinese-English, German-English) and SLT (English-French) tracks. For the MT track both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated in the MT and SLT tracks, including domain adaptation via data selection, translation model interpolation, phrase training for hierarchical and phrase-based systems, additional reordering model, word class language model, various Arabic and Chinese segmentation methods, postprocessing of speech recognition output with an SMT system, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

pdf bib abs
Sequence labeling-based reordering model for phrase-based SMT
Minwei Feng | Jan-Thorsten Peter | Hermann Ney
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers

For current statistical machine translation system, reordering is still a major problem for language pairs like Chinese-English, where the source and target language have significant word order differences. In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. For the given source sentence, we assign each source token a label which contains the reordering information for that token. We also design an unaligned word tag so that the unaligned word phenomenon is automatically implanted in the proposed model. Our reordering model is conditioned on the whole source sentence. Hence it is able to catch the long dependency in the source sentence. Although the learning on large scale task requests notably amounts of computational resources, the decoder makes use of the tagging information as soft constraints. Therefore, the training procedure of our model is computationally expensive for large task while in the test phase (during translation) our model is very efficient. We carried out experiments on five Chinese-English NIST tasks trained with BOLT data. Results show that our model improves the baseline system by 1.32 BLEU 1.53 TER on average.

pdf bib
A Tagging-style Reordering Model for Phrase-based SMT
Minwei Feng | Hermann Ney
Proceedings of the Workshop on Reordering for Statistical Machine Translation

2011

In this paper the statistical machine translation (SMT) systems of RWTH Aachen University developed for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2011 is presented. We participated in the MT (English-French, Arabic-English, ChineseEnglish) and SLT (English-French) tracks. Both hierarchical and phrase-based SMT decoders are applied. A number of different techniques are evaluated, including domain adaptation via monolingual and bilingual data selection, phrase training, different lexical smoothing methods, additional reordering models for the hierarchical system, various Arabic and Chinese segmentation methods, punctuation prediction for speech recognition output, and system combination. By application of these methods we can show considerable improvements over the respective baseline systems.

2010

pdf bib abs
A Source-side Decoding Sequence Model for Statistical Machine Translation
Minwei Feng | Arne Mauser | Hermann Ney
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

We propose a source-side decoding sequence language model for phrase-based statistical machine translation. This model is a reordering model in the sense that it helps the decoder find the correct decoding sequence. The model uses word-aligned bilingual training data. We show improved translation quality of up to 1.34% BLEU and 0.54% TER using this model compared to three other widely used reordering models.