Aliaksei Severyn


2024

pdf bib
Small Language Models Improve Giants by Rewriting Their Outputs
Giorgos Vernikos | Arthur Brazinskas | Jakub Adamek | Jonathan Mallinson | Aliaksei Severyn | Eric Malmi
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the impressive performance of large language models (LLMs), theyoften lag behind specialized models in various tasks. LLMs only use a fractionof the existing training data for in-context learning, while task-specificmodels harness the full dataset for fine-tuning. In this work, we tackle theproblem of leveraging training data to improve the performance of LLMs withoutfine-tuning. Our approach directly targets LLM predictions without requiringaccess to their weights. We create a pool of candidates from the LLM throughfew-shot prompting and we employ a compact model, the LM-corrector (LMCor),specifically trained to merge these candidates to produce an enhanced output.Our experiments on four natural language generation tasks demonstrate that evena small LMCor model (250M) substantially improves the few-shot performance ofLLMs (62B), matching and even outperforming standard fine-tuning. Furthermore,we illustrate the robustness of LMCor against different prompts, therebyminimizing the need for extensive prompt engineering. Finally, we show thatLMCor can be seamlessly integrated with different LLMs at inference, serving asa plug-and-play module to improve their performance.

2023

pdf bib
Teaching Small Language Models to Reason
Lucie Charlotte Magister | Jonathan Mallinson | Jakub Adamek | Eric Malmi | Aliaksei Severyn
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state of the art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with at least tens of billions of parameters. In this paper, we explore the transfer of such reasoning capabilities to smaller models via knowledge distillation, also investigating model and dataset size trade-off. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% and 18.42% when finetuned on PaLM 540B and GPT-3 175B generated chains of thought, respectively.

2022

pdf bib
EdiT5: Semi-Autoregressive Text Editing with T5 Warm-Start
Jonathan Mallinson | Jakub Adamek | Eric Malmi | Aliaksei Severyn
Findings of the Association for Computational Linguistics: EMNLP 2022

We present EdiT5 - a novel semi-autoregressive text-editing approach designed to combine the strengths of non-autoregressive text-editing and autoregressive decoding. EdiT5 is faster at inference times than conventional sequence-to-sequence (seq2seq) models, while being capable of modeling flexible input-output transformations. This is achieved by decomposing the generation process into three sub-tasks: (1) tagging to decide on the subset of input tokens to be preserved in the output, (2) re-ordering to define their order in the output text, and (3) insertion to infill the missing tokens that are not present in the input. The tagging and re-ordering steps, which are responsible for generating the largest portion of the output, are non-autoregressive, while the insertion uses an autoregressive decoder. Depending on the task, EdiT5 requires significantly fewer autoregressive steps demonstrating speedups of up to 25x when compared to classic seq2seq models. Quality-wise, EdiT5 is initialized with a pre-trained T5 checkpoint yielding comparable performance to T5 in high-resource settings and clearly outperforms it on low-resource settings when evaluated on three NLG tasks: Sentence Fusion, Grammatical Error Correction, and Decontextualization.

pdf bib
Text Generation with Text-Editing Models
Eric Malmi | Yue Dong | Jonathan Mallinson | Aleksandr Chuklin | Jakub Adamek | Daniil Mirylenka | Felix Stahlberg | Sebastian Krause | Shankar Kumar | Aliaksei Severyn
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorial Abstracts

Text-editing models have recently become a prominent alternative to seq2seq models for monolingual text-generation tasks such as grammatical error correction, text simplification, and style transfer. These tasks share a common trait – they exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this observation and learn to generate the output by predicting edit operations applied to the source sequence. In contrast, seq2seq models generate outputs word-by-word from scratch thus making them slow at inference time. Text-editing models provide several benefits over seq2seq models including faster inference speed, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of the text-edit based models and current state-of-the-art approaches analyzing their pros and cons. We discuss challenges related to deployment and how these models help to mitigate hallucination and bias, both pressing challenges in the field of text generation.

2021

pdf bib
A Simple Recipe for Multilingual Grammatical Error Correction
Sascha Rothe | Jonathan Mallinson | Eric Malmi | Sebastian Krause | Aliaksei Severyn
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

This paper presents a simple recipe to trainstate-of-the-art multilingual Grammatical Error Correction (GEC) models. We achieve this by first proposing a language-agnostic method to generate a large number of synthetic examples. The second ingredient is to use large-scale multilingual language models (up to 11B parameters). Once fine-tuned on language-specific supervised sets we surpass the previous state-of-the-art results on GEC benchmarks in four languages: English, Czech, German and Russian. Having established a new set of baselines for GEC, we make our results easily reproducible and accessible by releasing a CLANG-8 dataset. It is produced by using our best model, which we call gT5, to clean the targets of a widely used yet noisy Lang-8 dataset. cLang-8 greatly simplifies typical GEC training pipelines composed of multiple fine-tuning stages – we demonstrate that performing a single fine-tuning stepon cLang-8 with the off-the-shelf language models yields further accuracy improvements over an already top-performing gT5 model for English.

2020

pdf bib
FELIX: Flexible Text Editing Through Tagging and Insertion
Jonathan Mallinson | Aliaksei Severyn | Eric Malmi | Guillermo Garrido
Findings of the Association for Computational Linguistics: EMNLP 2020

We present FELIX – a flexible text-editing approach for generation, designed to derive maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pretraining. In contrast to conventional sequenceto-sequence (seq2seq) models, FELIX is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging to decide on the subset of input tokens and their order in the output text and insertion to in-fill the missing tokens in the output not present in the input. The tagging model employs a novel Pointer mechanism, while the insertion model is based on a Masked Language Model (MLM). Both of these models are chosen to be non-autoregressive to guarantee faster inference. FELIX performs favourably when compared to recent text-editing methods and strong seq2seq baselines when evaluated on four NLG tasks: Sentence Fusion, Machine Translation Automatic Post-Editing, Summarization, and Text Simplification

pdf bib
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
Sascha Rothe | Shashi Narayan | Aliaksei Severyn
Transactions of the Association for Computational Linguistics, Volume 8

Unsupervised pre-training of large neural models has recently revolutionized Natural Language Processing. By warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, GPT-2, and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation, Text Summarization, Sentence Splitting, and Sentence Fusion.

pdf bib
Unsupervised Text Style Transfer with Padded Masked Language Models
Eric Malmi | Aliaksei Severyn | Sascha Rothe
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose Masker, an unsupervised text-editing method for style transfer. To tackle cases when no parallel source–target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete to transform the source text to match the style of the target domain. The deleted tokens are replaced with the target MLM, and by using a padded MLM variant, we avoid having to predetermine the number of inserted tokens. Our experiments on sentence fusion and sentiment transfer demonstrate that Masker performs competitively in a fully unsupervised setting. Moreover, in low-resource settings, it improves supervised methods’ accuracy by over 10 percentage points when pre-training them on silver training data generated by Masker.

2019

pdf bib
Encode, Tag, Realize: High-Precision Text Editing
Eric Malmi | Sebastian Krause | Sascha Rothe | Daniil Mirylenka | Aliaksei Severyn
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We propose LaserTagger - a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model, which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.

2017

pdf bib
A Hybrid Convolutional Variational Autoencoder for Text Generation
Stanislau Semeniuta | Aliaksei Severyn | Erhardt Barth
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we explore the effect of architectural choices on learning a variational autoencoder (VAE) for text generation. In contrast to the previously introduced VAE model for text where both the encoder and decoder are RNNs, we propose a novel hybrid architecture that blends fully feed-forward convolutional and deconvolutional components with a recurrent language model. Our architecture exhibits several attractive properties such as faster run time and convergence, ability to better handle long sequences and, more importantly, it helps to avoid the issue of the VAE collapsing to a deterministic model.

pdf bib
RelTextRank: An Open Source Framework for Building Relational Syntactic-Semantic Text Pair Representations
Kateryna Tymoshenko | Alessandro Moschitti | Massimo Nicosia | Aliaksei Severyn
Proceedings of ACL 2017, System Demonstrations

2016

pdf bib
Globally Normalized Transition-Based Neural Networks
Daniel Andor | Chris Alberti | David Weiss | Aliaksei Severyn | Alessandro Presta | Kuzman Ganchev | Slav Petrov | Michael Collins
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Recurrent Dropout without Memory Loss
Stanislau Semeniuta | Aliaksei Severyn | Erhardt Barth
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper presents a novel approach to recurrent neural network (RNN) regularization. Differently from the widely adopted dropout method, which is applied to forward connections of feedforward architectures or RNNs, we propose to drop neurons directly in recurrent connections in a way that does not cause loss of long-term memory. Our approach is as easy to implement and apply as the regular feed-forward dropout and we demonstrate its effectiveness for the most effective modern recurrent network – Long Short-Term Memory network. Our experiments on three NLP benchmarks show consistent improvements even when combined with conventional feed-forward dropout.

2015

pdf bib
UNITN: Training Deep Convolutional Neural Network for Twitter Sentiment Classification
Aliaksei Severyn | Alessandro Moschitti
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

pdf bib
On the Automatic Learning of Sentiment Lexicons
Aliaksei Severyn | Alessandro Moschitti
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf bib
Distributional Neural Networks for Automatic Resolution of Crossword Puzzles
Aliaksei Severyn | Massimo Nicosia | Gianni Barlacchi | Alessandro Moschitti
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

pdf bib
Encoding Semantic Resources in Syntactic Structures for Passage Reranking
Kateryna Tymoshenko | Alessandro Moschitti | Aliaksei Severyn
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
Opinion Mining on YouTube
Aliaksei Severyn | Alessandro Moschitti | Olga Uryupina | Barbara Plank | Katja Filippova
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
SenTube: A Corpus for Sentiment Analysis on YouTube Social Media
Olga Uryupina | Barbara Plank | Aliaksei Severyn | Agata Rotondi | Alessandro Moschitti
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present SenTube – a dataset of user-generated comments on YouTube videos annotated for information content and sentiment polarity. It contains annotations that allow to develop classifiers for several important NLP tasks: (i) sentiment analysis, (ii) text categorization (relatedness of a comment to video and/or product), (iii) spam detection, and (iv) prediction of comment informativeness. The SenTube corpus favors the development of research on indexing and searching YouTube videos exploiting information derived from comments. The corpus will cover several languages: at the moment, we focus on English and Italian, with Spanish and Dutch parts scheduled for the later stages of the project. For all the languages, we collect videos for the same set of products, thus offering possibilities for multi- and cross-lingual experiments. The paper provides annotation guidelines, corpus statistics and annotator agreement details.

2013

pdf bib
iKernels-Core: Tree Kernel Learning for Textual Similarity
Aliaksei Severyn | Massimo Nicosia | Alessandro Moschitti
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

pdf bib
Learning Semantic Textual Similarity with Structural Representations
Aliaksei Severyn | Massimo Nicosia | Alessandro Moschitti
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Learning Adaptable Patterns for Passage Reranking
Aliaksei Severyn | Massimo Nicosia | Alessandro Moschitti
Proceedings of the Seventeenth Conference on Computational Natural Language Learning

pdf bib
Automatic Feature Engineering for Answer Selection and Extraction
Aliaksei Severyn | Alessandro Moschitti
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing