ThuyLinh Nguyen

Also published as: Thuy Linh Nguyen


2013

pdf bib
Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation
ThuyLinh Nguyen | Stephan Vogel
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf bib
Nonparametric Model for Inupiaq Word Segmentation
Thuy Linh Nguyen | Stephan Vogel
Proceedings of COLING 2012: Demonstration Papers

2010

pdf bib
Nonparametric Word Segmentation for Machine Translation
ThuyLinh Nguyen | Stephan Vogel | Noah A. Smith
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2008

pdf bib
Context-based Arabic Morphological Analysis for Machine Translation
ThuyLinh Nguyen | Stephan Vogel
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

pdf bib
Diacritization as a Machine Translation and as a Sequence Labeling Problem
Tim Schlippe | ThuyLinh Nguyen | Stephan Vogel
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Student Research Workshop

In this paper we describe and compare two techniques for the automatic diacritization of Arabic text: First, we treat diacritization as a monotone machine translation problem, proposing and evaluating several translation and language models, including word and character-based models separately and combined as well as a model which uses statistical machine translation (SMT) to post-edit a rule-based diacritization system. Then we explore a more traditional view of diacritization as a sequence labeling problem, and propose a solution using conditional random fields (Lafferty et al., 2001). All these techniques are compared through word error rate and diacritization error rate both in terms of full diacritization and ignoring vowel endings. The empirical experiments showed that the machine translation approaches perform better than the sequence labeling approaches concerning the error rates.

2007

pdf bib
The CMU TransTac 2007 eyes-free two-way speech-to-speech translation system
Nguyen Bach | Matthais Eck | Paisarn Charoenpornsawat | Thilo Köhler | Sebastian Stüker | ThuyLinh Nguyen | Roger Hsiao | Alex Waibel | Stephan Vogel | Tanja Schultz | Alan W. Black
Proceedings of the Fourth International Workshop on Spoken Language Translation

The paper describes our portable two-way speech-to-speech translation system using a completely eyes-free/hands-free user interface. This system translates between the language pair English and Iraqi Arabic as well as between English and Farsi, and was built within the framework of the DARPA TransTac program. The Farsi language support was developed within a 90-day period, testing our ability to rapidly support new languages. The paper gives an overview of the system’s components along with the individual component objective measures and a discussion of issues relevant for the overall usage of the system. We found that usability, flexibility, and robustness serve as severe constraints on system architecture and design.

pdf bib
The CMU-UKA statistical machine translation systems for IWSLT 2007
Ian Lane | Andreas Zollmann | Thuy Linh Nguyen | Nguyen Bach | Ashish Venugopal | Stephan Vogel | Kay Rottmann | Ying Zhang | Alex Waibel
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language-pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework but for each language-pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic-knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT and for the Arabic→English task we focused on incorporating morphological-decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches which proved effective for the language pairs they were evaluated on.