Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)

Raúl Vázquez, Timothee Mickus, Jörg Tiedemann, Ivan Vulić, Ahmet Üstün (Editors)

Anthology ID:: 2024.moomin-1
Month:: March
Year:: 2024
Address:: St Julians, Malta
Venues:: MOOMIN | WS
SIG:
Publisher:: Association for Computational Linguistics
URL:: https://aclanthology.org/2024.moomin-1
DOI:
Bib Export formats:: BibTeX MODS XML EndNote
PDF:: https://aclanthology.org/2024.moomin-1.pdf

pdf bib
Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024)
Raúl Vázquez | Timothee Mickus | Jörg Tiedemann | Ivan Vulić | Ahmet Üstün

pdf bib abs
Toward the Modular Training of Controlled Paraphrase Adapters
Teemu Vahtola | Mathias Creutz

Controlled paraphrase generation often focuses on a specific aspect of paraphrasing, for instance syntactically controlled paraphrase generation. However, these models face a limitation: they lack modularity. Consequently adapting them for another aspect, such as lexical variation, needs full retraining of the model each time. To enhance the flexibility in training controlled paraphrase models, our proposition involves incrementally training a modularized system for controlled paraphrase generation for English. We start by fine-tuning a pretrained language model to learn the broad task of paraphrase generation, generally emphasizing meaning preservation and surface form variation. Subsequently, we train a specialized sub-task adapter with limited sub-task specific training data. We can then leverage this adapter in guiding the paraphrase generation process toward a desired output aligning with the distinctive features within the sub-task training data. The preliminary results on comparing the fine-tuned and adapted model against various competing systems indicates that the most successful method for mastering both general paraphrasing skills and task-specific expertise follows a two-stage approach. This approach involves starting with the initial fine-tuning of a generic paraphrase model and subsequently tailoring it for the specific sub-task.

Soft Prompt Tuning (SPT) is a parameter-efficient method for adapting pre-trained language models (PLMs) to specific tasks by inserting learnable embeddings, or soft prompts, at the input layer of the PLM, without modifying its parameters. This paper investigates the potential of SPT for cross-lingual transfer. Unlike previous studies on SPT for cross-lingual transfer that often fine-tune both the soft prompt and the model parameters, we adhere to the original intent of SPT by keeping the model parameters frozen and only training the soft prompt. This does not only reduce the computational cost and storage overhead of full-model fine-tuning, but we also demonstrate that this very parameter efficiency intrinsic to SPT can enhance cross-lingual transfer performance to linguistically distant languages. Moreover, we explore how different factors related to the prompt, such as the length or its reparameterization, affect cross-lingual transfer performance.

pdf bib abs
Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect
Jannis Vamvas | Noëmi Aepli | Rico Sennrich

Creating neural text encoders for written Swiss German is challenging due to a dearth of training data combined with dialectal variation. In this paper, we build on several existing multilingual encoders and adapt them to Swiss German using continued pre-training. Evaluation on three diverse downstream tasks shows that simply adding a Swiss German adapter to a modular encoder achieves 97.5% of fully monolithic adaptation performance. We further find that for the task of retrieving Swiss German sentences given Standard German queries, adapting a character-level model is more effective than the other adaptation strategies. We release our code and the models trained for our experiments.

pdf bib abs
The Impact of Language Adapters in Cross-Lingual Transfer for NLU
Jenny Kunz | Oskar Holmström

Modular deep learning has been proposed for the efficient adaption of pre-trained models to new tasks, domains and languages. In particular, combining language adapters with task adapters has shown potential where no supervised data exists for a language. In this paper, we explore the role of language adapters in zero-shot cross-lingual transfer for natural language understanding (NLU) benchmarks. We study the effect of including a target-language adapter in detailed ablation studies with two multilingual models and three multilingual datasets. Our results show that the effect of target-language adapters is highly inconsistent across tasks, languages and models. Retaining the source-language adapter instead often leads to an equivalent, and sometimes to a better, performance. Removing the language adapter after training has only a weak negative effect, indicating that the language adapters do not have a strong impact on the predictions.

pdf bib abs
Mixing and Matching: Combining Independently Trained Translation Model Components
Taido Purason | Andre Tättar | Mark Fishel

This paper investigates how to combine encoders and decoders of different independently trained NMT models. Combining encoders/decoders is not directly possible since the intermediate representations of any two independent NMT models are different and cannot be combined without modification. To address this, firstly, a dimension adapter is added if the encoder and decoder have different embedding dimensionalities, and secondly, representation adapter layers are added to align the encoder’s representations for the decoder to process. As a proof of concept, this paper looks at many-to-Estonian translation and combines a massively multilingual encoder (NLLB) and a high-quality language-specific decoder. The paper successfully demonstrates that the sentence representations of two independent NMT models can be made compatible without changing the pre-trained components while keeping translation quality from deteriorating. Results show improvements in both translation quality and speed for many-to-one translation over the baseline multilingual model.