Michael Paul

Also published as: Michael J. Paul


2021

User Factor Adaptation for User Embedding via Multitask Learning
Xiaolei Huang | Michael J. Paul | Franck Dernoncourt | Robin Burke | Mark Dredze
Proceedings of the Second Workshop on Domain Adaptation for NLP

Language varies across users and their fields of interest in social media data: words authored by a user across his/her interests may have different meanings (e.g., cool) or sentiments (e.g., fast). However, most existing methods for training user embeddings ignore the variations across user interests, such as product and movie categories (e.g., drama vs. action). In this study, we treat user interests as domains and empirically examine how user language can vary across the user factor in three English social media datasets. We then propose a user embedding model to account for the language variability of user interests via a multitask learning framework. The model learns user language and its variations without human supervision. While existing work has mainly evaluated user embeddings by extrinsic tasks, we propose an intrinsic evaluation via clustering and also evaluate user embeddings by an extrinsic task, text classification. The experiments on the three English-language social media datasets show that our proposed approach can generally outperform baselines via user factor adaptation.

2020

Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Xiaolei Huang | Linzi Xing | Franck Dernoncourt | Michael J. Paul
Proceedings of the Twelfth Language Resources and Evaluation Conference

Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection with four inferred author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels with a crowdsourcing platform, Figure Eight. To examine factors that can cause biases, we conduct an empirical analysis of demographic predictability on the English corpus. We measure the performance of four popular document classifiers and evaluate the fairness and bias of the baseline classifiers on the author-level demographic attributes.

Why Overfitting Isn’t Always Bad: Retrofitting Cross-Lingual Word Embeddings to Dictionaries
Mozhi Zhang | Yoshinari Fujinuma | Michael J. Paul | Jordan Boyd-Graber
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Cross-lingual word embeddings (CLWE) are often evaluated on bilingual lexicon induction (BLI). Recent CLWE methods use linear projections, which underfit the training dictionary, to generalize on BLI. However, underfitting can hinder generalization to other downstream tasks that rely on words from the training dictionary. We address this limitation by retrofitting CLWE to the training dictionary, which pulls training translation pairs closer in the embedding space and overfits the training dictionary. This simple post-processing step often improves accuracy on two downstream tasks, despite lowering BLI test accuracy. We also retrofit to both the training dictionary and a synthetic dictionary induced from CLWE, which sometimes generalizes even better on downstream tasks. Our results confirm the importance of fully exploiting the training dictionary in downstream tasks and explain why BLI is a flawed CLWE evaluation.

An Empirical Study on Crosslingual Transfer in Probabilistic Topic Models
Shudong Hao | Michael J. Paul
Computational Linguistics, Volume 46, Issue 1 - March 2020

Probabilistic topic modeling is a common first step in crosslingual tasks to enable knowledge transfer and extract multilingual features. Although many multilingual topic models have been developed, their assumptions about the training corpus are quite varied, and it is not clear how well the different models can be utilized under various training conditions. In this article, the knowledge transfer mechanisms behind different multilingual topic models are systematically studied, and through a broad set of experiments with four models on ten languages, we provide empirical insights that can inform the selection and future development of multilingual topic models.

2019

Neural Temporality Adaptation for Document Classification: Diachronic Word Embeddings and Domain Adaptation Models
Xiaolei Huang | Michael J. Paul
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Language usage can change across periods of time, but document classification models are usually trained and tested on corpora spanning multiple years without considering temporal variations. This paper describes two complementary ways to adapt classifiers to shifts across time. First, we show that diachronic word embeddings, which were originally developed to study language change, can also improve document classification, and we show a simple method for constructing this type of embedding. Second, we propose a time-driven neural classification model inspired by methods for domain adaptation. Experiments on six corpora show how these methods can make classifiers more robust over time.

A Resource-Free Evaluation Metric for Cross-Lingual Word Embeddings Based on Graph Modularity
Yoshinari Fujinuma | Jordan Boyd-Graber | Michael J. Paul
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Cross-lingual word embeddings encode the meaning of words from different languages into a shared low-dimensional space. An important requirement for many downstream tasks is that word similarity should be independent of language—i.e., word vectors within one language should not be more similar to each other than to words in another language. We measure this characteristic using modularity, a network measurement that measures the strength of clusters in a graph. Modularity has a moderate to strong correlation with three downstream tasks, even though modularity is based only on the structure of embeddings and does not require any external resources. We show through experiments that modularity can serve as an intrinsic validation metric to improve unsupervised cross-lingual word embeddings, particularly on distant language pairs in low-resource settings.

Neural User Factor Adaptation for Text Classification: Learning to Generalize Across Author Demographics
Xiaolei Huang | Michael J. Paul
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Language use varies across different demographic factors, such as gender, age, and geographic location. However, most existing document classification methods ignore demographic variability. In this study, we examine empirically how text data can vary across four demographic factors: gender, age, country, and region. We propose a multitask neural model to account for demographic variations via adversarial training. In experiments on four English-language social media datasets, we find that classification performance improves when adapting for user factors.

Analyzing Bayesian Crosslingual Transfer in Topic Models
Shudong Hao | Michael J. Paul
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

We introduce a theoretical analysis of crosslingual transfer in probabilistic topic models. By formulating posterior inference through Gibbs sampling as a process of language transfer, we propose a new measure that quantifies the loss of knowledge across languages during this process. This measure enables us to derive a PAC-Bayesian bound that elucidates the factors affecting model quality, both during training and in downstream applications. We provide experimental validation of the analysis on a diverse set of five languages, and discuss best practices for data collection and model design based on our analysis.

Evaluating Topic Quality with Posterior Variability
Linzi Xing | Michael J. Paul | Giuseppe Carenini
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Probabilistic topic models such as latent Dirichlet allocation (LDA) are popularly used with Bayesian inference methods such as Gibbs sampling to learn posterior distributions over topic model parameters. We derive a novel measure of LDA topic quality using the variability of the posterior distributions. Compared to several existing baselines for automatic topic evaluation, the proposed metric achieves state-of-the-art correlations with human judgments of topic quality in experiments on three corpora. We additionally demonstrate that topic quality estimation can be further improved using a supervised estimator that combines multiple metrics.

Overview of the Fourth Social Media Mining for Health (SMM4H) Shared Tasks at ACL 2019
Davy Weissenbacher | Abeed Sarker | Arjun Magge | Ashlynn Daughton | Karen O’Connor | Michael J. Paul | Graciela Gonzalez-Hernandez
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

The number of users of social media continues to grow, with nearly half of adults worldwide and two-thirds of all American adults using social networking. Advances in automated data processing, machine learning and NLP present the possibility of utilizing this massive data source for biomedical and public health applications, if researchers address the methodological challenges unique to this medium. We present the Social Media Mining for Health Shared Tasks co-located with ACL in Florence in 2019, which address these challenges for health monitoring and surveillance, utilizing state-of-the-art techniques for processing noisy, real-world, and substantially creative language expressions from social media users. For the fourth execution of this challenge, we proposed four different tasks. Task 1 asked participants to distinguish tweets reporting an adverse drug reaction (ADR) from those that do not. Task 2, a follow-up to Task 1, asked participants to identify the span of text in tweets reporting ADRs. Task 3 is an end-to-end task where the goal was to first detect tweets mentioning an ADR and then map the extracted colloquial mentions of ADRs in the tweets to their corresponding standard concept IDs in the MedDRA vocabulary. Finally, Task 4 asked participants to classify whether a tweet contains a personal mention of one’s health, a more general discussion of the health issue, or is an unrelated mention. A total of 34 teams from around the world registered and 19 teams from 12 countries submitted a system run. We summarize here the corpora for this challenge, which are freely available at https://competitions.codalab.org/competitions/22521, and present an overview of the methods and the results of the competing systems.

2018

Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task
Graciela Gonzalez-Hernandez | Davy Weissenbacher | Abeed Sarker | Michael Paul
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Overview of the Third Social Media Mining for Health (SMM4H) Shared Tasks at EMNLP 2018
Davy Weissenbacher | Abeed Sarker | Michael J. Paul | Graciela Gonzalez-Hernandez
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The goals of the SMM4H shared tasks are to release annotated social media-based, health-related datasets to the research community, and to compare the performance of natural language processing and machine learning systems on tasks involving these datasets. The third execution of the SMM4H shared tasks, co-hosted with EMNLP-2018, comprised four subtasks. These subtasks involve annotated user posts from Twitter (tweets) and focus on the (i) automatic classification of tweets mentioning a drug name, (ii) automatic classification of tweets containing reports of first-person medication intake, (iii) automatic classification of tweets presenting self-reports of adverse drug reaction (ADR) detection, and (iv) automatic classification of vaccine behavior mentions in tweets. A total of 14 teams participated and 78 system runs were submitted (23 for task 1, 20 for task 2, 18 for task 3, 17 for task 4).

Learning Multilingual Topics from Incomparable Corpora
Shudong Hao | Michael J. Paul
Proceedings of the 27th International Conference on Computational Linguistics

Multilingual topic models enable crosslingual tasks by extracting consistent topics from multilingual corpora. Most models require parallel or comparable training corpora, which limits their ability to generalize. In this paper, we first demystify the knowledge transfer mechanism behind multilingual topic models by defining an alternative but equivalent formulation. Based on this analysis, we then relax the assumption of training data required by most existing models, creating a model that only requires a dictionary for training. Experiments show that our new method effectively learns coherent multilingual topics from partially and fully incomparable corpora with limited amounts of dictionary resources.

Lessons from the Bible on Modern Topics: Low-Resource Multilingual Topic Model Evaluation
Shudong Hao | Jordan Boyd-Graber | Michael J. Paul
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Multilingual topic models enable document analysis across languages through coherent multilingual summaries of the data. However, there is no standard and effective metric to evaluate the quality of multilingual topics. We introduce a new intrinsic evaluation of multilingual topic models that correlates well with human judgments of multilingual topic coherence as well as performance in downstream applications. Importantly, we also study evaluation for low-resource languages. Because standard metrics fail to accurately measure topic quality when robust external resources are unavailable, we propose an adaptation model that improves the accuracy and reliability of these metrics in low-resource settings.

Examining Temporality in Document Classification
Xiaolei Huang | Michael J. Paul
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Many corpora span broad periods of time. Language processing models trained during one time period may not work well in future time periods, and the best model may depend on specific times of year (e.g., people might describe hotels differently in reviews during the winter versus the summer). This study investigates how document classifiers trained on documents from certain time intervals perform on documents from other time intervals, considering both seasonal intervals (intervals that repeat across years, e.g., winter) and non-seasonal intervals (e.g., specific years). We show experimentally that classification performance varies over time, and that performance can be improved by using a standard domain adaptation approach to adjust for changes in time.

2017

Incorporating Metadata into Content-Based User Embeddings
Linzi Xing | Michael J. Paul
Proceedings of the 3rd Workshop on Noisy User-generated Text

Low-dimensional vector representations of social media users can benefit applications like recommendation systems and user attribute inference. Recent work has shown that user embeddings can be improved by combining different types of information, such as text and network data. We propose a data augmentation method that allows novel feature types to be used within off-the-shelf embedding models. Experimenting with the task of friend recommendation on a dataset of 5,019 Twitter users, we show that our approach can lead to substantial performance gains with the simple addition of network and geographic features.

Feature Selection as Causal Inference: Experiments with Text Classification
Michael J. Paul
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)

This paper proposes a matching technique for learning causal associations between word features and class labels in document classification. The goal is to identify more meaningful and generalizable features than with only correlational approaches. Experiments with sentiment classification show that the proposed method identifies interpretable word associations with sentiment and improves classification performance in a majority of cases. The proposed feature selection method is particularly effective when applied to out-of-domain data.

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Lucia Specia | Matt Post | Michael Paul
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

2016

Identifying and Categorizing Disaster-Related Tweets
Kevin Stowe | Michael J. Paul | Martha Palmer | Leysia Palen | Kenneth Anderson
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media

Selecting Syntactic, Non-redundant Segments in Active Learning for Machine Translation
Akiva Miura | Graham Neubig | Michael Paul | Satoshi Nakamura
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Sprite: Generalizing Topic Models with Structured Priors
Michael J. Paul | Mark Dredze
Transactions of the Association for Computational Linguistics, Volume 3

We introduce Sprite, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing Sprite to be tailored to particular settings. We demonstrate this flexibility by constructing a Sprite-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We show that the model learns intuitive topics, outperforming several other topic models at predictive tasks.

2013

Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models
Michael J. Paul | Mark Dredze
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Separating Fact from Fear: Tracking Flu Infections on Twitter
Alex Lamb | Michael J. Paul | Mark Dredze
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Implicitly Intersecting Weighted Automata using Dual Decomposition
Michael J. Paul | Jason Eisner
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised Part-of-Speech Tagging in Noisy and Esoteric Domains With a Syntactic-Semantic Bayesian HMM
William M. Darling | Michael J. Paul | Fei Song
Proceedings of the Workshop on Semantic Analysis in Social Media

The IWSLT 2011 Evaluation Campaign on Automatic Talk Translation
Marcello Federico | Sebastian Stüker | Luisa Bentivogli | Michael Paul | Mauro Cettolo | Teresa Herrmann | Jan Niehues | Giovanni Moretti
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We report here on the eighth evaluation campaign organized in 2011 by the IWSLT workshop series. That IWSLT 2011 evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike in previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 evaluation campaign, and describes the data supplied, the evaluation infrastructure made available to participants, and the subjective evaluation carried out.

Crowd-based MT Evaluation for non-English Target Languages
Michael Paul | Eiichiro Sumita | Luisa Bentivogli | Marcello Federico
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

Mixed Membership Markov Models for Unsupervised Conversation Modeling
Michael J. Paul
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Translation Quality Indicators for Pivot-based Statistical MT
Michael Paul | Eiichiro Sumita
Proceedings of 5th International Joint Conference on Natural Language Processing

Getting Expert Quality from the Crowd for Machine Translation Evaluation
Luisa Bentivogli | Marcello Federico | Giovanni Moretti | Michael Paul
Proceedings of Machine Translation Summit XIII: Papers

Overview of the IWSLT 2011 evaluation campaign
Marcello Federico | Luisa Bentivogli | Michael Paul | Sebastian Stüker
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

We report here on the eighth Evaluation Campaign organized by the IWSLT workshop. This year, the IWSLT evaluation focused on the automatic translation of public talks and included tracks for speech recognition, speech translation, text translation, and system combination. Unlike previous years, all data supplied for the evaluation has been publicly released on the workshop website, and is at the disposal of researchers interested in working on our benchmarks and in comparing their results with those published at the workshop. This paper provides an overview of the IWSLT 2011 Evaluation Campaign, which includes: descriptions of the supplied data and evaluation specifications of each track, the list of participants specifying their submitted runs, a detailed description of the subjective evaluation carried out, the main findings of each exercise drawn from the results and the system descriptions prepared by the participants, and, finally, several detailed tables reporting all the evaluation results.

Dialect Translation: Integrating Bayesian Co-segmentation Models with Pivot-based SMT
Michael Paul | Andrew Finch | Paul R. Dixon | Eiichiro Sumita
Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties

2010

Integration of Multiple Bilingually-Learned Segmentation Schemes into Statistical Machine Translation
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Overview of the IWSLT 2010 evaluation campaign
Michael Paul | Marcello Federico | Sebastian Stüker
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper gives an overview of the evaluation campaign results of the 7th International Workshop on Spoken Language Translation (IWSLT 2010). This year, we focused on three spoken language tasks: (1) public speeches on a variety of topics (TALK) from English to French, (2) spoken dialog in travel situations (DIALOG) between Chinese and English, and (3) traveling expressions (BTEC) from Arabic, Turkish, and French to English. In total, 28 teams (including 7 first-time participants) took part in the shared tasks, submitting 60 primary and 112 contrastive runs. Automatic and subjective evaluations of the primary runs were carried out in order to investigate the impact of different communication modalities, spoken language styles and semantic context on automatic speech recognition (ASR) and machine translation (MT) system performances.

The NICT translation system for IWSLT 2010
Chooi-Ling Goh | Taro Watanabe | Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes NICT’s participation in the IWSLT 2010 evaluation campaign for the DIALOG translation (Chinese-English) and the BTEC (French-English) translation shared tasks. For the DIALOG translation task, the main challenge is applying context information during translation. Context information can be used to decide on word choice and also to replace missing information during translation. We applied discriminative reranking using contextual information as additional features. In order to provide more choices for reranking, we generated n-best lists from multiple phrase-based statistical machine translation systems that varied in the type of Chinese word segmentation schemes used. We also built a model that merged the phrase tables generated by the different segmentation schemes. Furthermore, we used a lattice-based system combination model to combine the output from different systems. A combination of all of these systems was used to produce the n-best lists for reranking. For the BTEC task, we took a general approach that used lattice-based system combination of two systems, a standard phrase-based system and a hierarchical phrase-based system. We also tried to process some unknown words by replacing them with differently inflected forms of the same words that are known to the system.

Summarizing Contrastive Viewpoints in Opinionated Text
Michael Paul | ChengXiang Zhai | Roxana Girju
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

2009

Cross-Cultural Analysis of Blogs and Forums with Mixed-Collection Topic Models
Michael Paul | Roxana Girju
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

NICT@WMT09: Model Adaptation and Transliteration for Spanish-English SMT
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the Fourth Workshop on Statistical Machine Translation

Mining the Web for Reciprocal Relationships
Michael Paul | Roxana Girju | Chen Li
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

Topic Modeling of Research Fields: An Interdisciplinary Perspective
Michael Paul | Roxana Girju
Proceedings of the International Conference RANLP-2009

On the Importance of Pivot Language Selection for Statistical Machine Translation
Michael Paul | Hirofumi Yamamoto | Eiichiro Sumita | Satoshi Nakamura
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Overview of the IWSLT 2009 evaluation campaign
Michael Paul
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper gives an overview of the evaluation campaign results of the International Workshop on Spoken Language Translation (IWSLT) 2009. In this workshop, we focused on the translation of task-oriented human dialogs in travel situations. The speech data was recorded through human interpreters, where native speakers of different languages were asked to complete certain travel-related tasks like hotel reservations using their mother tongue. The translation of the freely-uttered conversation was carried out by human interpreters. The obtained speech data was annotated with dialog and speaker information. The translation directions were English into Chinese and vice versa for the Challenge Task, and Arabic, Chinese, and Turkish, which is a new addition, into English for the standard BTEC Task. In total, 18 research groups participated in this year’s event. Automatic and subjective evaluations were carried out in order to investigate the impact of task-oriented human dialogs on automatic speech recognition (ASR) and machine translation (MT) system performance, as well as the robustness of state-of-the-art MT systems for speech-to-speech translation in a dialog scenario.

Network-based speech-to-speech translation
Chiori Hori | Sakriani Sakti | Michael Paul | Noriyuki Kimura | Yutaka Ashikari | Ryosuke Isotani | Eiichiro Sumita | Satoshi Nakamura
Proceedings of the 6th International Workshop on Spoken Language Translation: Papers

This demo shows a network-based speech-to-speech translation system. The system was designed to perform real-time, location-free, multi-party translation between speakers of different languages. The spoken language modules, namely automatic speech recognition (ASR), machine translation (MT), and text-to-speech synthesis (TTS), are connected through Web servers that can be accessed via client applications worldwide. In this demo, we will show the multi-party speech-to-speech translation of Japanese, Chinese, Indonesian, Vietnamese, and English, provided by the NICT server. These speech-to-speech modules have been developed by NICT as a part of the A-STAR (Asian Speech Translation Advanced Research) consortium project.

2008

Overview of the IWSLT 2008 evaluation campaign
Michael Paul
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper gives an overview of the evaluation campaign results of the International Workshop on Spoken Language Translation (IWSLT) 2008. In this workshop, we focused on the translation of spontaneous speech recorded in a real situation and the feasibility of pivot-language-based translation approaches. The translation directions were English into Chinese and vice versa for the Challenge Task, Chinese into English and English into Spanish for the Pivot Task, and Arabic, Chinese, and Spanish into English for the standard BTEC Task. In total, 19 research groups building 58 MT engines participated in this year’s event. Automatic and subjective evaluations were carried out in order to investigate the impact of spontaneity aspects of field data experiments on automatic speech recognition (ASR) and machine translation (MT) system performance as well as the robustness of state-of-the-art MT systems towards speech-to-speech translation in real environments.

The NICT/ATR speech translation system for IWSLT 2008
Masao Utiyama | Andrew Finch | Hideo Okuma | Michael Paul | Hailong Cao | Hirofumi Yamamoto | Keiji Yasuda | Eiichiro Sumita
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the National Institute of Information and Communications Technology/Advanced Telecommunications Research Institute International (NICT/ATR) statistical machine translation (SMT) system used for the IWSLT 2008 evaluation campaign. We participated in the Chinese–English (Challenge Task), English–Chinese (Challenge Task), Chinese–English (BTEC Task), Chinese–Spanish (BTEC Task), and Chinese–English–Spanish (PIVOT Task) translation tasks. In the English–Chinese translation Challenge Task, we focused on exploring various factors for the English–Chinese translation because the research on the translation of English–Chinese is scarce compared to the opposite direction. In the Chinese–English translation Challenge Task, we employed a novel clustering method, where training sentences similar to the development data in terms of the word error rate formed a cluster. In the pivot translation task, we integrated two strategies for pivot translation by linear interpolation.

pdf bib
Improving statistical machine translation by paraphrasing the training data.
Francis Bond | Eric Nichols | Darren Scott Appling | Michael Paul
Proceedings of the 5th International Workshop on Spoken Language Translation: Papers

Large amounts of training data are essential for training statistical machine translation systems. In this paper we show how training data can be expanded by paraphrasing one side. The new data is made by parsing then generating using a precise HPSG-based grammar, which gives sentences with the same meaning, but minor variations in lexical choice and word order. In experiments with Japanese and English, we showed consistent gains on the Tanaka Corpus with less consistent improvement on the IWSLT 2005 evaluation data.

pdf bib
Multilingual Mobile-Phone Translation Services for World Travelers
Michael Paul | Hideo Okuma | Hirofumi Yamamoto | Eiichiro Sumita | Shigeki Matsuda | Tohru Shimizu | Satoshi Nakamura
Coling 2008: Companion volume: Demonstrations

2007

pdf bib
Reducing human assessment of machine translation quality to binary classifiers
Michael Paul | Andrew Finch | Eiichiro Sumita
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

pdf bib
The NICT/ATR speech translation system for IWSLT 2007
Andrew Finch | Etienne Denoual | Hideo Okuma | Michael Paul | Hirofumi Yamamoto | Keiji Yasuda | Ruiqiang Zhang | Eiichiro Sumita
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the NiCT-ATR statistical machine translation (SMT) system used for the IWSLT 2007 evaluation campaign. We participated in three of the four language pair translation tasks (CE, JE, and IE). We used a phrase-based SMT system using log-linear feature models for all tracks. This year we decoded from the ASR n-best lists in the JE track and found a gain in performance. We also applied some new techniques to facilitate the use of out-of-domain external resources by model combination and also by utilizing a huge corpus of n-grams provided by Google Inc. Using these resources gave mixed results that depended on the technique and the language pair; however, in some cases we achieved consistently positive results. The results from model-interpolation in particular were very promising.

2006

pdf bib
Exploiting Variant Corpora for Machine Translation
Michael Paul | Eiichiro Sumita
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

pdf bib
Overview of the IWSLT06 evaluation campaign
Michael Paul
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
The NiCT-ATR statistical machine translation system for IWSLT 2006
Ruiqiang Zhang | Hirofumi Yamamoto | Michael Paul | Hideo Okuma | Keiji Yasuda | Yves Lepage | Etienne Denoual | Daichi Mochihashi | Andrew Finch | Eiichiro Sumita
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

2005

pdf bib
Nobody is perfect: ATR’s hybrid approach to spoken language translation
Michael Paul | Takao Doi | Youngsook Hwang | Kenji Imamura | Hideo Okuma | Eiichiro Sumita
Proceedings of the Second International Workshop on Spoken Language Translation

pdf bib
A Machine Learning Approach to Hypotheses Selection of Greedy Decoding for SMT
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
Workshop on example-based machine translation

This paper proposes a method for integrating example-based and rule-based machine translation systems with statistical methods. It extends a greedy decoder for statistical machine translation (SMT), which searches for an optimal translation by using SMT models starting from a decoder seed, i.e., the source language input paired with an initial translation hypothesis. In order to reduce local optima problems inherent in the search, the outputs generated by multiple translation engines, such as rule-based (RBMT) and example-based (EBMT) systems, are utilized as the initial translation hypotheses. This method outperforms conventional greedy decoding approaches using initial translation hypotheses based on translation examples retrieved from a parallel text corpus. However, the decoding of multiple initial translation hypotheses is computationally expensive. This paper proposes a method to select a single initial translation hypothesis before decoding based on a machine learning approach that judges the appropriateness of multiple initial translation hypotheses and selects the most confident one for decoding. Our approach is evaluated for the translation of dialogues in the travel domain, and the results show that it drastically reduces computational costs without a loss in translation quality.

2004

pdf bib
Example-based Rescoring of Statistical Machine Translation Output
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
Proceedings of HLT-NAACL 2004: Short Papers

pdf bib
Overview of the IWSLT evaluation campaign
Yasuhiro Akiba | Marcello Federico | Noriko Kando | Hiromi Nakaiwa | Michael Paul | Jun’ichi Tsujii
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

pdf bib
EBMT, SMT, hybrid and more: ATR spoken language translation system
Eiichiro Sumita | Yasuhiro Akiba | Takao Doi | Andrew Finch | Kenji Imamura | Hideo Okuma | Michael Paul | Mitsuo Shimohata | Taro Watanabe
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

2003

pdf bib
A corpus-centered approach to spoken language translation
Eiichiro Sumita | Yasuhiro Akiba | Takao Doi | Andrew Finch | Kenji Imamura | Michael Paul | Mitsuo Shimohata | Taro Watanabe
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

pdf bib
Corpus-based Generation of Numeral Classifier using Phrase Alignment
Michael Paul | Eiichiro Sumita | Seiichi Yamamoto
COLING 2002: The 19th International Conference on Computational Linguistics

2001

pdf bib
Translation knowledge recycling for related languages
Michael Paul
Proceedings of Machine Translation Summit VIII

An increasing interest in multi-lingual translation systems demands a reconsideration of the development costs of machine translation engines for language pairs. This paper proposes an approach that reuses the existing translation knowledge resources of high-quality translation engines for translation into different, but related languages. The lexical information of the target representation is utilized to generate the corresponding translation in the related language by using a transfer dictionary for the mapping of words and a set of heuristic rules for the mapping of structural information. Experiments using a Japanese-English translation engine for the generation of German translations show a minor decrease of up to 5% in the acceptability of the German output compared with the English translation of unseen Japanese input.

pdf bib
Integration of Referential Scope Limitations into Japanese Pronoun Resolution
Michael Paul | Eiichiro Sumita
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue

1999

pdf bib
Solutions to problems inherent in spoken-language translation: the ATR-MATRIX approach
Eiichiro Sumita | Setsuo Yamada | Kazuhide Yamamoto | Michael Paul | Hideki Kashioka | Kai Ishikawa | Satoshi Shirai
Proceedings of Machine Translation Summit VII

ATR has built a multi-language speech translation system called ATR-MATRIX. It consists of a spoken-language translation subsystem, which is the focus of this paper, together with a highly accurate speech recognition subsystem and a high-definition speech synthesis subsystem. This paper gives a road map of solutions to the problems inherent in spoken-language translation. Spoken-language translation systems need to tackle difficult problems such as ungrammaticality, contextual phenomena, speech recognition errors, and the high speeds required for real-time use. We have made great strides towards solving these problems in recent years. Our approach mainly uses an example-based translation model called TDMT. We have added the use of extra-linguistic information, a decision tree learning mechanism, and methods dealing with recognition errors.

pdf bib
Corpus-Based Anaphora Resolution Towards Antecedent Preference
Michael Paul | Kazuhide Yamamoto | Eiichiro Sumita
Coreference and Its Applications