Kaiping Peng


2023

XDailyDialog: A Multilingual Parallel Dialogue Corpus
Zeming Liu | Ping Nie | Jie Cai | Haifeng Wang | Zheng-Yu Niu | Peng Zhang | Mrinmaya Sachan | Kaiping Peng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

High-quality datasets are important for the development of dialogue models. However, most existing datasets for open-domain dialogue modeling are limited to a single language. The absence of multilingual open-domain dialog datasets not only limits research on multilingual and cross-lingual transfer learning, but also hinders the development of robust open-domain dialog systems that can be deployed in other parts of the world. In this paper, we provide a multilingual parallel open-domain dialog dataset, XDailyDialog, to enable researchers to explore the challenging task of multilingual and cross-lingual open-domain dialog. XDailyDialog includes 13K dialogues aligned across 4 languages (52K dialogues and 410K utterances in total). We then propose a dialog generation model, kNN-Chat, which has a novel kNN-search mechanism to support unified response retrieval for monolingual, multilingual, and cross-lingual dialogue. Experimental results show the effectiveness of this framework. We will make XDailyDialog and kNN-Chat publicly available soon.