Alexandre Allauzen


2024

pdf bib
LOCOST: State-Space Models for Long Document Abstractive Summarization
Florian Le Bronnec | Song Duong | Mathieu Ravaut | Alexandre Allauzen | Nancy Chen | Vincent Guigue | Alberto Lumbreras | Laure Soulier | Patrick Gallinari
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

State-space models are a low-complexity alternative to transformers for encoding long sequences and capturing long-term dependencies. We propose LOCOST: an encoder-decoder architecture based on state-space models for conditional text generation with long context inputs. With a computational complexity of O(L log L), this architecture can handle significantly longer sequences than state-of-the-art models that are based on sparse attention patterns. We evaluate our model on a series of long document abstractive summarization tasks. The model reaches a performance level that is 93-96% comparable to the top-performing sparse transformers of the same size while saving up to 50% memory during training and up to 87% during inference. Additionally, LOCOST effectively handles input texts exceeding 600K tokens at inference time, setting new state-of-the-art results on full-book summarization and opening new perspectives for long input processing.
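The O(L log L) complexity comes from the fact that the long convolution at the heart of state-space layers can be computed with FFTs instead of a quadratic attention matrix. A minimal numpy sketch of that idea (not the authors' code; `long_conv_fft` and its toy dimensions are illustrative):

```python
import numpy as np

def long_conv_fft(x, k):
    """Causal long convolution y[t] = sum_{s<=t} k[s] * x[t-s],
    computed in O(L log L) with FFTs (zero-padded so the circular
    convolution matches the linear one)."""
    L = x.shape[0]
    n = 2 * L  # zero-padding length
    y = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
    return y[:L]

def long_conv_direct(x, k):
    """Direct O(L^2) reference implementation, for comparison."""
    L = x.shape[0]
    return np.array([sum(k[s] * x[t - s] for s in range(t + 1))
                     for t in range(L)])
```

Both functions compute the same sequence mixing; only the FFT version scales to the very long inputs mentioned in the abstract.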

2023

pdf bib
Intégration de connaissances structurées par synthÚse de texte spécialisé
Guilhem Piat | Ellington Kirby | Julien Tourille | Nasredine Semmar | Alexandre Allauzen | Hassane Essafi
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs

Transformer-based language models struggle to accommodate modifications aimed at integrating structured, non-textual data formats such as knowledge graphs. The cases where this integration succeeds generally require either solving the named-entity disambiguation problem beforehand, or adding a large amount of training text, usually annotated. These constraints make exploiting structured knowledge as a data source difficult and sometimes even counter-productive. We seek to adapt a language model to the biomedical domain by training it on synthetic text generated from a knowledge graph, so as to exploit this information through a modality the language model already masters.

2021

pdf bib
Transport Optimal pour le Changement Sémantique à partir de Plongements Contextualisés (Optimal Transport for Semantic Change Detection using Contextualised Embeddings )
Syrielle Montariol | Alexandre Allauzen
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Plusieurs mĂ©thodes de dĂ©tection des changements sĂ©mantiques utilisant des plongements lexicaux contextualisĂ©s sont apparues rĂ©cemment. Elles permettent une analyse fine du changement d’usage des mots, en agrĂ©geant les plongements contextualisĂ©s en clusters qui reflĂštent les diffĂ©rents usages d’un mot. Nous proposons une nouvelle mĂ©thode basĂ©e sur le transport optimal. Nous l’évaluons sur plusieurs corpus annotĂ©s, montrant un gain de prĂ©cision par rapport aux autres mĂ©thodes utilisant des plongements contextualisĂ©s, et l’illustrons sur un corpus d’articles de journaux.

pdf bib
Measure and Evaluation of Semantic Divergence across Two Languages
Syrielle Montariol | Alexandre Allauzen
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Languages are dynamic systems: word usage may change over time, reflecting various societal factors. However, not all languages evolve identically: the impact of an event, or the influence of a trend or school of thought, can differ between communities. In this paper, we propose to track these divergences by comparing the evolution of a word and its translation across two languages. We investigate several methods of building time-varying and bilingual word embeddings, using contextualised and non-contextualised embeddings. We propose a set of scenarios to characterize semantic divergence across two languages, along with a setup to differentiate them in a bilingual corpus. We evaluate the different methods by generating a corpus of synthetic semantic change across two languages, English and French, before applying them to newspaper corpora to detect bilingual semantic divergence and provide qualitative insight for the task. We conclude that BERT embeddings coupled with a clustering step lead to the best performance on synthetic corpora; however, the performance of CBOW embeddings is very competitive and more adapted to an exploratory analysis on a large corpus.

2020

pdf bib
Variations in Word Usage for the Financial Domain
Syrielle Montariol | Alexandre Allauzen | Asanobu Kitamoto
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

pdf bib
FlauBERT: Unsupervised Language Model Pre-training for French
Hang Le | Loïc Vial | Jibril Frej | Vincent Segonne | Maximin Coavoux | Benjamin Lecouteux | Alexandre Allauzen | Benoit Crabbé | Laurent Besacier | Didier Schwab
Proceedings of the Twelfth Language Resources and Evaluation Conference

Language models have become a key step to achieve state-of-the art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely demonstrated for English using contextualized representations (Dai and Le, 2015; Peters et al., 2018; Howard and Ruder, 2018; Radford et al., 2018; Devlin et al., 2019; Yang et al., 2019b). In this paper, we introduce and share FlauBERT, a model learned on a very large and heterogeneous French corpus. Models of different sizes are trained using the new CNRS (French National Centre for Scientific Research) Jean Zay supercomputer. We apply our French language models to diverse NLP tasks (text classification, paraphrasing, natural language inference, parsing, word sense disambiguation) and show that most of the time they outperform other pre-training approaches. Different versions of FlauBERT as well as a unified evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research community for further reproducible experiments in French NLP.

pdf bib
FlauBERT : des modÚles de langue contextualisés pré-entraßnés pour le français (FlauBERT : Unsupervised Language Model Pre-training for French)
Hang Le | Loïc Vial | Jibril Frej | Vincent Segonne | Maximin Coavoux | Benjamin Lecouteux | Alexandre Allauzen | Benoßt Crabbé | Laurent Besacier | Didier Schwab
Actes de la 6e confĂ©rence conjointe JournĂ©es d'Études sur la Parole (JEP, 33e Ă©dition), Traitement Automatique des Langues Naturelles (TALN, 27e Ă©dition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e Ă©dition). Volume 2 : Traitement Automatique des Langues Naturelles

Les modĂšles de langue prĂ©-entraĂźnĂ©s sont dĂ©sormais indispensables pour obtenir des rĂ©sultats Ă  l’état-de-l’art dans de nombreuses tĂąches du TALN. Tirant avantage de l’énorme quantitĂ© de textes bruts disponibles, ils permettent d’extraire des reprĂ©sentations continues des mots, contextualisĂ©es au niveau de la phrase. L’efficacitĂ© de ces reprĂ©sentations pour rĂ©soudre plusieurs tĂąches de TALN a Ă©tĂ© dĂ©montrĂ©e rĂ©cemment pour l’anglais. Dans cet article, nous prĂ©sentons et partageons FlauBERT, un ensemble de modĂšles appris sur un corpus français hĂ©tĂ©rogĂšne et de taille importante. Des modĂšles de complexitĂ© diffĂ©rente sont entraĂźnĂ©s Ă  l’aide du nouveau supercalculateur Jean Zay du CNRS. Nous Ă©valuons nos modĂšles de langue sur diverses tĂąches en français (classification de textes, paraphrase, infĂ©rence en langage naturel, analyse syntaxique, dĂ©sambiguĂŻsation automatique) et montrons qu’ils surpassent souvent les autres approches sur le rĂ©fĂ©rentiel d’évaluation FLUE Ă©galement prĂ©sentĂ© ici.

pdf bib
Étude des variations sĂ©mantiques Ă  travers plusieurs dimensions (Studying semantic variations through several dimensions )
Syrielle Montariol | Alexandre Allauzen
Actes de la 6e confĂ©rence conjointe JournĂ©es d'Études sur la Parole (JEP, 33e Ă©dition), Traitement Automatique des Langues Naturelles (TALN, 27e Ă©dition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e Ă©dition). Volume 2 : Traitement Automatique des Langues Naturelles

Within a language, word usage varies along two axes: diachronic (the temporal dimension) and synchronic (variation across authors, communities, geographical areas, etc.). In this work, we propose a method for detecting and interpreting variations in word usage across these different dimensions. To do so, we exploit the capabilities of a new line of contextualised word embeddings, in particular the BERT model. We experiment on a corpus of financial reports from French companies, to capture the issues and concerns specific to certain periods, actors and business sectors.

2019

pdf bib
Empirical Study of Diachronic Word Embeddings for Scarce Data
Syrielle Montariol | Alexandre Allauzen
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

Word meaning change can be inferred from drifts of time-varying word embeddings. However, temporal data may be too sparse to build robust word embeddings and to discriminate significant drifts from noise. In this paper, we compare three models to learn diachronic word embeddings on scarce data: incremental updating of a Skip-Gram from Kim et al. (2014), dynamic filtering from Bamler & Mandt (2017), and dynamic Bernoulli embeddings from Rudolph & Blei (2018). In particular, we study the performance of different initialisation schemes and emphasise what characteristics of each model are more suitable to data scarcity, relying on the distribution of detected drifts. Finally, we regularise the loss of these models to better adapt to scarce data.

pdf bib
Apprentissage de plongements de mots dynamiques avec régularisation de la dérive (Learning dynamic word embeddings with drift regularisation)
Syrielle Montariol | Alexandre Allauzen
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume I : Articles longs

The usage, meaning and connotation of words can change over time. Diachronic word embeddings make it possible to model these changes in an unsupervised way. In this article we study the impact of several loss functions on the learning of dynamic embeddings, comparing the behaviour of variants of the Dynamic Bernoulli Embeddings model. The dynamic embeddings are estimated on two corpora covering the same two decades, the New York Times Annotated Corpus in English and a selection of articles from the newspaper Le Monde in French, which allows us to set up a bilingual analysis of the evolution of word usage.

pdf bib
Exploring sentence informativeness
Syrielle Montariol | Aina GarĂ­ Soler | Alexandre Allauzen
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

This study is a preliminary exploration of the concept of informativeness –how much information a sentence gives about a word it contains– and its potential benefits to building quality word representations from scarce data. We propose several sentence-level classifiers to predict informativeness, and we perform a manual annotation on a set of sentences. We conclude that these two measures correspond to different notions of informativeness. However, our experiments show that using the classifiers’ predictions to train word embeddings has an impact on embedding quality.

pdf bib
Word Usage Similarity Estimation with Sentence Representations and Automatic Substitutes
Aina GarĂ­ Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Usage similarity estimation addresses the semantic proximity of word instances in different contexts. We apply contextualized (ELMo and BERT) word and sentence embeddings to this task, and propose supervised models that leverage these representations for prediction. Our models are further assisted by lexical substitute annotations automatically assigned to word instances by context2vec, a neural model that relies on a bidirectional LSTM. We perform an extensive comparison of existing word and sentence representations on benchmark datasets addressing both graded and binary similarity. The best performing models outperform previous methods in both settings.

pdf bib
LIMSI-MULTISEM at the IJCAI SemDeep-5 WiC Challenge: Context Representations for Word Usage Similarity Estimation
Aina GarĂ­ Soler | Marianna Apidianaki | Alexandre Allauzen
Proceedings of the 5th Workshop on Semantic Deep Learning (SemDeep-5)

2018

bib
Traitement Automatique des Langues, Volume 59, Numéro 2 : Apprentissage profond pour le traitement automatique des langues [Deep Learning for natural language processing]
Alexandre Allauzen | Hinrich SchĂŒtze
Traitement Automatique des Langues, Volume 59, Numéro 2 : Apprentissage profond pour le traitement automatique des langues [Deep Learning for natural language processing]

pdf bib
Apprentissage profond pour le traitement automatique des langues [Deep Learning for Natural Language Processing]
Alexandre Allauzen | Hinrich SchĂŒtze
Traitement Automatique des Langues, Volume 59, Numéro 2 : Apprentissage profond pour le traitement automatique des langues [Deep Learning for natural language processing]

pdf bib
Learning with Noise-Contrastive Estimation: Easing training by learning to scale
Matthieu Labeau | Alexandre Allauzen
Proceedings of the 27th International Conference on Computational Linguistics

Noise-Contrastive Estimation (NCE) is a learning criterion that is regularly used to train neural language models in place of Maximum Likelihood Estimation, since it avoids the computational bottleneck caused by the output softmax. In this paper, we analyse and explain some of the weaknesses of this objective function, linked to the mechanism of self-normalization, by closely monitoring comparative experiments. We then explore several remedies and modifications to propose tractable and efficient NCE training strategies. In particular, we propose to make the scaling factor a trainable parameter of the model, and to use the noise distribution to initialize the output bias. These solutions, though simple, yield stable and competitive performance in both small- and large-scale language modelling tasks.
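NCE turns language modelling into binary classification: the model must tell a data word apart from k words drawn from a noise distribution q. A minimal numpy sketch of this objective with a trainable scaling factor standing in for the partition function, in the spirit of (but not identical to) the paper's proposal; `log_c` and the toy scores are illustrative:

```python
import numpy as np

def nce_loss(data_score, noise_scores, log_q_data, log_q_noise, log_c, k):
    """Binary NCE objective for one target word against k noise samples.
    data_score: unnormalised model score s(w) of the observed word
    noise_scores: scores of the k sampled noise words
    log_q_*: log-probabilities under the noise distribution q
    log_c: trainable log scaling factor (self-normalisation term)."""
    def log_sigmoid(x):
        return -np.logaddexp(0.0, -x)  # numerically stable log sigma(x)
    # The observed word should be classified as "data".
    data_term = log_sigmoid(data_score + log_c - (np.log(k) + log_q_data))
    # The sampled words should be classified as "noise".
    noise_term = log_sigmoid(-(noise_scores + log_c
                               - (np.log(k) + log_q_noise))).sum()
    return -(data_term + noise_term)
```

The paper's second trick, initialising the output bias with log q, amounts to starting the model's scores close to the noise distribution, which the classification task then refines.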

pdf bib
Algorithmes Ă  base d’échantillonage pour l’entraĂźnement de modĂšles de langue neuronaux (Here the title in English)
Matthieu Labeau | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Noise-Contrastive Estimation (NCE) and Importance Sampling (IS) are sampling-based training procedures that are usually used in place of Maximum Likelihood Estimation (MLE) to avoid computing the softmax when training neural language models. In this article, we summarise how these algorithms work and how they are used in the NLP literature. We compare them experimentally, and present ways of easing NCE training.

pdf bib
A comparative study of word embeddings and other features for lexical complexity detection in French
Aina GarĂ­ Soler | Marianna Apidianaki | Alexandre Allauzen
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Lexical complexity detection is an important step for automatic text simplification which serves to make informed lexical substitutions. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well-suited for complexity prediction. Our results on a synonym ranking task show that embeddings perform better than other features in isolation, but do not outperform frequency-based systems in this language.

2017

pdf bib
Character and Subword-Based Word Representation for Neural Language Modeling Prediction
Matthieu Labeau | Alexandre Allauzen
Proceedings of the First Workshop on Subword and Character Level Models in NLP

Most neural language models use different kinds of embeddings for word prediction. While word embeddings can be associated to each word in the vocabulary, or derived from characters as well as factored morphological decomposition, these word representations are mainly used to parametrize the input, i.e. the context of prediction. This work investigates the effect of using subword units (character and factored morphological decomposition) to build output representations for neural language modeling. We present a case study on Czech, a morphologically-rich language, experimenting with different input and output representations. When working with the full training vocabulary, despite unstable training, our experiments show that augmenting the output word representations with character-based embeddings can significantly improve the performance of the model. Moreover, reducing the size of the output look-up table, to let the character-based embeddings represent rare words, brings further improvement.
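The character-based word representations in question are typically built by embedding characters, sliding a narrow convolution over them, and max-pooling over positions, so any word, including a rare or unseen one, gets a fixed-size vector. A toy numpy sketch of that recipe (illustrative dimensions and random parameters, not the paper's exact model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_char, width, n_filters = 8, 3, 16
chars = "abcdefghijklmnopqrstuvwxyz^$"
char_emb = {c: rng.standard_normal(d_char) for c in chars}
W = rng.standard_normal((n_filters, width * d_char))  # convolution filters
b = rng.standard_normal(n_filters)

def char_word_embedding(word):
    """Embed characters, apply a width-3 convolution, max-pool over positions."""
    s = "^" + word + "$"  # word boundary markers
    E = np.stack([char_emb[c] for c in s])            # (len(s), d_char)
    windows = [E[i:i + width].ravel()                  # sliding windows
               for i in range(len(s) - width + 1)]
    H = np.tanh(np.stack(windows) @ W.T + b)          # (n_windows, n_filters)
    return H.max(axis=0)  # max-pooling -> fixed-size word vector
```

Using such a composition on the output side, rather than only on the input side, is precisely what lets the look-up table shrink while rare words remain representable.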

pdf bib
LIMSI@WMT’17
Franck Burlot | Pooyan Safari | Matthieu Labeau | Alexandre Allauzen | François Yvon
Proceedings of the Second Conference on Machine Translation

pdf bib
An experimental analysis of Noise-Contrastive Estimation: the noise distribution matters
Matthieu Labeau | Alexandre Allauzen
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Noise Contrastive Estimation (NCE) is a learning procedure that is regularly used to train neural language models, since it avoids the computational bottleneck caused by the output softmax. In this paper, we attempt to explain some of the weaknesses of this objective function, and to draw directions for further developments. Experiments on a small task show the issues raised by a unigram noise distribution, and that a context-dependent noise distribution, such as the bigram distribution, can solve these issues and provide stable and data-efficient learning.

pdf bib
Représentations continues dérivées des caractÚres pour un modÚle de langue neuronal à vocabulaire ouvert (Opening the vocabulary of neural language models with character-level word representations)
Matthieu Labeau | Alexandre Allauzen
Actes des 24Úme Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 - Articles longs

This article proposes a neural architecture for an open-vocabulary language model. Continuous word representations are computed on the fly from the characters composing the word, using a convolutional layer followed by a pooling layer. This allows the model to represent any word, whether it belongs to the context or is scored for prediction. The objective function is derived from Noise Contrastive Estimation (NCE), which in our case can be computed without a vocabulary. We evaluate the ability of our model to build continuous representations of unknown words on the IWSLT-2016 English-to-Czech machine translation task, by re-scoring the N best hypotheses (N-best reranking). Experimental results show gains of up to 0.7 BLEU points. They also highlight the difficulty of using character-derived representations for prediction.

pdf bib
Adaptation au domaine pour l’analyse morpho-syntaxique (Domain Adaptation for PoS tagging)
ÉlĂ©onor Bartenlian | Margot Lacour | Matthieu Labeau | Alexandre Allauzen | Guillaume Wisniewski | François Yvon
Actes des 24Úme Conférence sur le Traitement Automatique des Langues Naturelles. Volume 2 - Articles courts

This work seeks to understand why the performance of a part-of-speech tagger drops sharply when it is used on out-of-domain data. Using a toy experiment, we show that this behaviour can be due to lexicalised features being masked by non-lexicalised ones. We propose several models that attempt to reduce this effect.

2016

pdf bib
LIMSI@WMT’16: Machine Translation of News
Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Ophélie Lacroix | Elena Knyazeva | Thomas Lavergne | Guillaume Wisniewski | François Yvon
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
The Karlsruhe Institute of Technology Systems for the News Translation Task in WMT 2016
Thanh-Le Ha | Eunah Cho | Jan Niehues | Mohammed Mediani | Matthias Sperber | Alexandre Allauzen | Alexander Waibel
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
The QT21/HimL Combined Machine Translation System
Jan-Thorsten Peter | Tamer Alkhouli | Hermann Ney | Matthias Huck | Fabienne Braune | Alexander Fraser | AleĆĄ Tamchyna | Ondƙej Bojar | Barry Haddow | Rico Sennrich | FrĂ©dĂ©ric Blain | Lucia Specia | Jan Niehues | Alex Waibel | Alexandre Allauzen | Lauriane Aufrant | Franck Burlot | Elena Knyazeva | Thomas Lavergne | François Yvon | Mārcis Pinnis | Stella Frank
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf bib
Une mĂ©thode non-supervisĂ©e pour la segmentation morphologique et l’apprentissage de morphotactique Ă  l’aide de processus de Pitman-Yor (An unsupervised method for joint morphological segmentation and morphotactics learning using Pitman-Yor processes)
Kevin Löser | Alexandre Allauzen
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles longs)

Cet article prĂ©sente un modĂšle bayĂ©sien non-paramĂ©trique pour la segmentation morphologique non supervisĂ©e. Ce modĂšle semi-markovien s’appuie sur des classes latentes de morphĂšmes afin de modĂ©liser les caractĂ©ristiques morphotactiques du lexique, et son caractĂšre non-paramĂ©trique lui permet de s’adapter aux donnĂ©es sans avoir Ă  spĂ©cifier Ă  l’avance l’inventaire des morphĂšmes ainsi que leurs classes. Un processus de Pitman-Yor est utilisĂ© comme a priori sur les paramĂštres afin d’éviter une convergence vers des solutions dĂ©gĂ©nĂ©rĂ©es et inadaptĂ©es au traitemement automatique des langues. Les rĂ©sultats expĂ©rimentaux montrent la pertinence des segmentations obtenues pour le turc et l’anglais. Une Ă©tude qualitative montre Ă©galement que le modĂšle infĂšre une morphotactique linguistiquement pertinente, sans le recours Ă  des connaissances expertes quant Ă  la structure morphologique des formes de mots.

pdf bib
LIMSI@IWSLT’16: MT Track
Franck Burlot | Matthieu Labeau | Elena Knyazeva | Thomas Lavergne | Alexandre Allauzen | François Yvon
Proceedings of the 13th International Conference on Spoken Language Translation

This paper describes LIMSI’s submission to the MT track of IWSLT 2016. We report results for translation from English into Czech. Our submission is an attempt to address the difficulties of translating into a morphologically rich language by paying special attention to the morphology generation on the target side. To this end, we propose two ways of improving the morphological fluency of the output: 1. by performing translation and inflection of the target language in two separate steps, and 2. by using a neural language model with character-based word representations. We finally present the combination of both methods used for our primary system submission.

pdf bib
Apprentissage discriminant de modĂšles neuronaux pour la traduction automatique [Discriminative training of continuous space translation models]
Quoc-Khanh Do | Alexandre Allauzen | François Yvon
Traitement Automatique des Langues, Volume 57, Numéro 1 : Varia [Varia]

2015

pdf bib
Oublier ce qu’on sait, pour mieux apprendre ce qu’on ne sait pas : une Ă©tude sur les contraintes de type dans les modĂšles CRF
Nicolas Pécheux | Alexandre Allauzen | Thomas Lavergne | Guillaume Wisniewski | François Yvon
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

When prior knowledge about the possible outputs of a tagging problem is available, it seems desirable to include this information during training, to simplify modelling and speed up processing. Yet even when these constraints are correct and useful at decoding time, using them during training can severely degrade performance. In this article, we study this paradox and show that the lack of contrast induced by this knowledge leads to a form of under-fitting, which it is nevertheless possible to limit.

pdf bib
Apprentissage discriminant des modĂšles continus de traduction
Quoc-Khanh Do | Alexandre Allauzen | François Yvon
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Alors que les rĂ©seaux neuronaux occupent une place de plus en plus importante dans le traitement automatique des langues, les mĂ©thodes d’apprentissage actuelles utilisent pour la plupart des critĂšres qui sont dĂ©corrĂ©lĂ©s de l’application. Cet article propose un nouveau cadre d’apprentissage discriminant pour l’estimation des modĂšles continus de traduction. Ce cadre s’appuie sur la dĂ©finition d’un critĂšre d’optimisation permettant de prendre en compte d’une part la mĂ©trique utilisĂ©e pour l’évaluation de la traduction et d’autre part l’intĂ©gration de ces modĂšles au sein des systĂšmes de traduction automatique. De plus, cette mĂ©thode d’apprentissage est comparĂ©e aux critĂšres existants d’estimation que sont le maximum de vraisemblance et l’estimation contrastive bruitĂ©e. Les expĂ©riences menĂ©es sur la tĂąches de traduction des sĂ©minaires TED Talks de l’anglais vers le français montrent la pertinence d’un cadre discriminant d’apprentissage, dont les performances restent toutefois trĂšs dĂ©pendantes du choix d’une stratĂ©gie d’initialisation idoine. Nous montrons qu’avec une initialisation judicieuse des gains significatifs en termes de scores BLEU peuvent ĂȘtre obtenus.

pdf bib
Non-lexical neural architecture for fine-grained POS Tagging
Matthieu Labeau | Kevin Löser | Alexandre Allauzen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
A Discriminative Training Procedure for Continuous Translation Models
Quoc-Khanh Do | Alexandre Allauzen | François Yvon
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
The KIT-LIMSI Translation System for WMT 2015
Thanh-Le Ha | Quoc-Khanh Do | Eunah Cho | Jan Niehues | Alexandre Allauzen | François Yvon | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
LIMSI@WMT’15 : Translation Task
Benjamin Marie | Alexandre Allauzen | Franck Burlot | Quoc-Khanh Do | Julia Ive | Elena Knyazeva | Matthieu Labeau | Thomas Lavergne | Kevin Löser | Nicolas Pécheux | François Yvon
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
ListNet-based MT Rescoring
Jan Niehues | Quoc Khanh Do | Alexandre Allauzen | Alex Waibel
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality
Alexandre Allauzen | Edward Grefenstette | Karl Moritz Hermann | Hugo Larochelle | Scott Wen-tau Yih
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

2014

pdf bib
Cross-Lingual POS Tagging through Ambiguous Learning: First Experiments (Apprentissage partiellement supervisĂ© d’un Ă©tiqueteur morpho-syntaxique par transfert cross-lingue) [in French]
Guillaume Wisniewski | Nicolas Pécheux | Elena Knyazeva | Alexandre Allauzen | François Yvon
Proceedings of TALN 2014 (Volume 1: Long Papers)

pdf bib
Comparison of scheduling methods for the learning rate of neural network language models (ModĂšles de langue neuronaux: une comparaison de plusieurs stratĂ©gies d’apprentissage) [in French]
Quoc-Khanh Do | Alexandre Allauzen | François Yvon
Proceedings of TALN 2014 (Volume 1: Long Papers)

pdf bib
LIMSI English-French speech translation system
Natalia Segal | HélÚne Bonneau-Maynard | Quoc Khanh Do | Alexandre Allauzen | Jean-Luc Gauvain | Lori Lamel | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper documents the systems developed by LIMSI for the IWSLT 2014 speech translation task (English→French). The main objective of this participation was twofold: adapting different components of the ASR baseline system to the peculiarities of TED talks and improving the machine translation quality on the automatic speech recognition output data. For the latter task, various techniques have been considered: punctuation and number normalization, adaptation to ASR errors, as well as the use of structured output layer neural network models for speech data.

pdf bib
Discriminative adaptation of continuous space translation models
Quoc-Khanh Do | Alexandre Allauzen | François Yvon
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

In this paper we explore various adaptation techniques for continuous space translation models (CSTMs). We consider the following practical situation: given a large scale, state-of-the-art SMT system containing a CSTM, the task is to adapt the CSTM to a new domain using a (relatively) small in-domain parallel corpus. Our method relies on the definition of a new discriminative loss function for the CSTM that borrows from both the max-margin and pair-wise ranking approaches. In our experiments, the baseline out-of-domain SMT system is initially trained for the WMT News translation task, and the CSTM is to be adapted to the lecture translation task as defined by IWSLT evaluation campaign. Experimental results show that an improvement of 1.5 BLEU points can be achieved with the proposed adaptation method.

pdf bib
Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)
Alexandre Allauzen | Raffaella Bernardi | Edward Grefenstette | Hugo Larochelle | Christopher Manning | Scott Wen-tau Yih
Proceedings of the 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC)

2013

pdf bib
A fully discriminative training framework for Statistical Machine Translation (Un cadre d’apprentissage intĂ©gralement discriminant pour la traduction statistique) [in French]
Thomas Lavergne | Alexandre Allauzen | François Yvon
Proceedings of TALN 2013 (Volume 1: Long Papers)

pdf bib
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality
Alexandre Allauzen | Hugo Larochelle | Christopher Manning | Richard Socher
Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality

2012

pdf bib
Continuous Space Translation Models with Neural Networks
Hai Son Le | Alexandre Allauzen | François Yvon
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Measuring the Influence of Long Range Dependencies with Neural Network Language Models
Hai Son Le | Alexandre Allauzen | François Yvon
Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT

pdf bib
Joint WMT 2012 Submission of the QUAERO Project
Markus Freitag | Stephan Peitz | Matthias Huck | Hermann Ney | Jan Niehues | Teresa Herrmann | Alex Waibel | Hai-son Le | Thomas Lavergne | Alexandre Allauzen | Bianka Buschbeck | Josep Maria Crego | Jean Senellart
Proceedings of the Seventh Workshop on Statistical Machine Translation

pdf bib
LIMSI @ WMT12
Hai-Son Le | Thomas Lavergne | Alexandre Allauzen | Marianna Apidianaki | Li Gong | Aurélien Max | Artem Sokolov | Guillaume Wisniewski | François Yvon
Proceedings of the Seventh Workshop on Statistical Machine Translation

2011

pdf bib
Estimation d’un modĂšle de traduction Ă  partir d’alignements mot-Ă -mot non-dĂ©terministes (Estimating a translation model from non-deterministic word-to-word alignments)
Nadi Tomeh | Alexandre Allauzen | François Yvon
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In phrase-based statistical translation systems, the translation model is estimated from word-to-word alignments through extraction and scoring heuristics. Although these word-to-word alignments are built by probabilistic models, the extraction and scoring processes use them under the assumption that the alignments are deterministic. In this article, we propose to lift this assumption by considering the full alignment matrix of a sentence pair, with each link weighted by its probability. Compared with prior work, we show that using an exponential model to discriminatively estimate these probabilities yields significant improvements in translation performance. These improvements are measured with the BLEU metric on the Arabic-to-English translation task of the NIST MT'09 evaluation, under two conditions depending on the size of the parallel corpus used.

pdf bib
Discriminative Weighted Alignment Matrices For Statistical Machine Translation
Nadi Tomeh | Alexandre Allauzen | François Yvon
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

pdf bib
LIMSI’s experiments in domain adaptation for IWSLT11
Thomas Lavergne | Alexandre Allauzen | Hai-Son Le | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign

LIMSI took part in the IWSLT 2011 TED task in the MT track for English to French using the in-house n-code system, which implements the n-gram-based approach to Machine Translation. This framework not only achieves state-of-the-art results for this language pair, but is also appealing due to its conceptual simplicity and its use of well-understood statistical language models. Using this approach, we compare several ways to adapt our existing systems and resources to the TED task with mixtures of language models, and provide an analysis of the modest gains obtained by training a log-linear combination of in- and out-of-domain models.

pdf bib
How good are your phrases? Assessing phrase quality with single class classification
Nadi Tomeh | Marco Turchi | Guillaume Wisniewski | Alexandre Allauzen | François Yvon
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

We present a novel translation-quality-informed procedure for both extraction and scoring of phrase pairs in PBSMT systems. We reformulate the extraction problem in the supervised learning framework. Our goal is twofold: first, we attempt to take translation quality into account; second, we incorporate arbitrary features in order to circumvent alignment errors. One-Class SVMs and the Mapping Convergence algorithm permit training a single-class classifier to discriminate between useful and useless phrase pairs. Such a classifier can be learned from a training corpus that comprises only useful instances. The confidence score produced by the classifier for each phrase pair is employed as a selection criterion. The smoothness of these scores allows fine control over the size of the resulting translation model. Finally, confidence scores provide a new accuracy-based feature for scoring phrase pairs. Experimental evaluation of the method shows accurate assessments of phrase pair quality, even for regions of the space of possible phrase pairs that are ignored by other approaches. This enhanced evaluation of phrase pairs leads to improvements in translation performance as measured by BLEU.

pdf bib
LIMSI @ WMT11
Alexandre Allauzen | HélÚne Bonneau-Maynard | Hai-Son Le | Aurélien Max | Guillaume Wisniewski | François Yvon | Gilles Adda | Josep Maria Crego | Adrien Lardilleux | Thomas Lavergne | Artem Sokolov
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
Joint WMT Submission of the QUAERO Project
Markus Freitag | Gregor Leusch | Joern Wuebker | Stephan Peitz | Hermann Ney | Teresa Herrmann | Jan Niehues | Alex Waibel | Alexandre Allauzen | Gilles Adda | Josep Maria Crego | Bianka Buschbeck | Tonio Wandmacher | Jean Senellart
Proceedings of the Sixth Workshop on Statistical Machine Translation

pdf bib
From n-gram-based to CRF-based Translation Models
Thomas Lavergne | Alexandre Allauzen | Josep Maria Crego | François Yvon
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

pdf bib
LIMSI’s Statistical Translation Systems for WMT’10
Alexandre Allauzen | Josep M. Crego | İlknur Durgar El-Kahlout | François Yvon
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

pdf bib
LIMSI @ IWSLT 2010
Alexandre Allauzen | Josep M. Crego | İlknur Durgar El-Kahlout | Le Hai-Son | Guillaume Wisniewski | François Yvon
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes LIMSI’s Statistical Machine Translation (SMT) systems for the IWSLT evaluation, where we participated in two tasks (Talk for English to French and BTEC for Turkish to English). For the Talk task, we studied an extension of our in-house n-code SMT system (the integration of a bilingual reordering model over generalized translation units), as well as the use of training data extracted from Wikipedia in order to adapt the target language model. For the BTEC task, we concentrated on pre-processing schemes on the Turkish side in order to reduce the morphological discrepancies with the English side. We also evaluated the use of two different continuous space language models for such a small amount of training data.

pdf bib
Training Continuous Space Language Models: Some Practical Issues
Hai Son Le | Alexandre Allauzen | Guillaume Wisniewski | François Yvon
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Assessing Phrase-Based Translation Models with Oracle Decoding
Guillaume Wisniewski | Alexandre Allauzen | François Yvon
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Refining Word Alignment with Discriminative Training
Nadi Tomeh | Alexandre Allauzen | François Yvon | Guillaume Wisniewski
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

The quality of statistical machine translation systems depends on the quality of the word alignments that are computed during the translation model training phase. IBM alignment models, as implemented in the GIZA++ toolkit, constitute the de facto standard for performing these computations. The resulting alignments and translation models are however very noisy, and several authors have tried to improve them. In this work, we propose a simple and effective approach, which considers alignment as a series of independent binary classification problems in the alignment matrix. Through extensive feature engineering and the use of stacking techniques, we were able to obtain alignments much closer to manually defined references than those obtained by the IBM models. These alignments also yield better translation models, delivering improved performance in a large scale Arabic to English translation task.

2009

pdf bib
LIMSI’s Statistical Translation Systems for WMT’09
Alexandre Allauzen | Josep Crego | Aurélien Max | François Yvon
Proceedings of the Fourth Workshop on Statistical Machine Translation

pdf bib
Modùles discriminants pour l’alignement mot à mot [Discriminant Models for Word Alignment]
Alexandre Allauzen | Guillaume Wisniewski
Traitement Automatique des Langues, Volume 50, Numéro 3 : Apprentissage automatique pour le TAL [Machine Learning for NLP]

2008

pdf bib
Training and Evaluation of POS Taggers on the French MULTITAG Corpus
Alexandre Allauzen | HĂ©lĂšne Bonneau-Maynard
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The explicit introduction of morphosyntactic information into statistical machine translation approaches is receiving increasing attention. The freely available Part of Speech (POS) taggers currently offered for the French language are based on a limited tagset which does not account for some inflectional particularities. Moreover, there is a lack of a unified framework for training and evaluating these kinds of linguistic resources. Therefore, in this paper, three standard POS taggers (TreeTagger, Brill’s tagger and the standard HMM POS tagger) are trained and evaluated under the same conditions on the French MULTITAG corpus. This POS-tagged corpus provides a richer tagset than the usual ones, including gender and number distinctions, for example. Experimental results show significant differences in performance between the taggers. According to the tagging accuracy estimated with a tagset of 300 items, the taggers may be ranked as follows: TreeTagger (95.7%), Brill’s tagger (94.6%), HMM tagger (93.4%). Examples of translation outputs illustrate how considering gender and number distinctions in the POS tagset can be relevant.

pdf bib
LIMSI’s Statistical Translation Systems for WMT’08
Daniel Déchelotte | Gilles Adda | Alexandre Allauzen | HélÚne Bonneau-Maynard | Olivier Galibert | Jean-Luc Gauvain | Philippe Langlais | François Yvon
Proceedings of the Third Workshop on Statistical Machine Translation

2007

pdf bib
A state-of-the-art statistical machine translation system based on Moses
Daniel DĂ©chelotte | Holger Schwenk | HĂ©lĂšne Bonneau-Maynard | Alexandre Allauzen | Gilles Adda
Proceedings of Machine Translation Summit XI: Papers

pdf bib
Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
HĂ©lĂšne Bonneau-Maynard | Alexandre Allauzen | Daniel DĂ©chelotte | Holger Schwenk
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

pdf bib
ModĂšles statistiques enrichis par la syntaxe pour la traduction automatique
Holger Schwenk | Daniel DĂ©chelotte | HĂ©lĂšne Bonneau-Maynard | Alexandre Allauzen
Actes de la 14Úme conférence sur le Traitement Automatique des Langues Naturelles. Posters

Phrase-based statistical machine translation is a promising approach. In this article we present two complementary extensions. The first models the target language in a continuous space. The second integrates morphosyntactic categories into the units handled by the translation model. Both approaches are evaluated on the Tc-Star task. The most interesting results are obtained by combining these two methods.