Guiding Zero-Shot Paraphrase Generation with Fine-Grained Control Tokens

Teemu Vahtola, Mathias Creutz, Jörg Tiedemann


Abstract
Sequence-to-sequence paraphrase generation models often struggle to generate diverse paraphrases. This deficiency limits the viability of paraphrase generation in other Natural Language Processing tasks. We propose a translation-based guided paraphrase generation model that learns, from cross-lingual parallel data, features useful for promoting surface form variation in generated paraphrases. Our proposed method leverages multilingual neural machine translation pretraining to learn zero-shot paraphrasing. Furthermore, we incorporate dedicated prefix tokens into the training of the machine translation models to promote variation. The prefix tokens are designed to affect various linguistic features related to surface form realizations, and can be applied during inference to guide the decoding process towards a desired solution. We assess the proposed guided model on paraphrase generation in three languages, English, Finnish, and Swedish, and analyze the feasibility of using the prefix tokens for guided paraphrasing. Our analysis suggests that the attributes represented by the prefix tokens are useful in promoting variation: they push the paraphrases generated by the guided model to diverge from the input sentence while preserving the semantics of the sentence well.
Anthology ID:
2023.starsem-1.29
Volume:
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Alexis Palmer, Jose Camacho-Collados
Venue:
*SEM
SIG:
SIGLEX
Publisher:
Association for Computational Linguistics
Pages:
323–337
URL:
https://aclanthology.org/2023.starsem-1.29
DOI:
10.18653/v1/2023.starsem-1.29
Cite (ACL):
Teemu Vahtola, Mathias Creutz, and Jörg Tiedemann. 2023. Guiding Zero-Shot Paraphrase Generation with Fine-Grained Control Tokens. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pages 323–337, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Guiding Zero-Shot Paraphrase Generation with Fine-Grained Control Tokens (Vahtola et al., *SEM 2023)
PDF:
https://aclanthology.org/2023.starsem-1.29.pdf