From Monolingual to Multilingual FAQ Assistant using Multilingual Co-training

Mayur Patidar, Surabhi Kumari, Manasi Patwardhan, Shirish Karande, Puneet Agarwal, Lovekesh Vig, Gautam Shroff


Abstract
Recent research on cross-lingual transfer shows state-of-the-art results on benchmark datasets using pre-trained language representation models (PLRM) like BERT. These results are achieved with traditional training approaches, such as zero-shot with no target-language data, or translate-train and translate-test with machine-translated data. In this work, we propose an approach of “Multilingual Co-training” (MCT), where we augment the expert-annotated dataset in the source language (English) with the corresponding machine translations in the target languages (e.g. Arabic, Spanish) and fine-tune the PLRM jointly. We observe that the proposed approach provides consistent gains in the performance of BERT for multiple benchmark datasets (e.g. 1.0% gain on MLDocs, and 1.2% gain on XNLI over translate-train with BERT), while requiring a single model for multiple languages. We further consider a FAQ dataset where the available English test dataset is translated by experts into Arabic and Spanish. On such a dataset, we observe an average gain of 4.9% over all other cross-lingual transfer protocols with BERT. We further observe that domain-specific joint pre-training of the PLRM using HR policy documents in English along with their machine translations in the target languages, followed by the joint fine-tuning, provides a further improvement of 2.8% in average accuracy.
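The core of MCT, as described in the abstract, is to pool the English training examples with their machine translations into the target languages and fine-tune a single multilingual PLRM on the pooled data. Below is a minimal sketch of that data construction and joint fine-tuning step. The library choices (Hugging Face transformers, multilingual BERT), the translate() stub, the toy examples, and all hyperparameters are illustrative assumptions, not the authors' code or settings.

```python
# Sketch of Multilingual Co-training (MCT): augment English (source) data with
# machine translations in target languages, then fine-tune one multilingual
# model jointly on the pooled set. Assumed stack: Hugging Face transformers.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)


def translate(text, target_lang):
    # Hypothetical placeholder: a real MT system (as used in the paper) would
    # be plugged in here. Identity is used only so the sketch runs end to end.
    return text


def build_mct_dataset(english_examples, target_langs=("ar", "es")):
    """Pool English examples with their machine translations (same labels)."""
    augmented = list(english_examples)  # each item: {"text": ..., "label": ...}
    for ex in english_examples:
        for lang in target_langs:
            augmented.append({"text": translate(ex["text"], lang),
                              "label": ex["label"]})
    return augmented


class MCTDataset(torch.utils.data.Dataset):
    def __init__(self, examples, tokenizer, max_len=128):
        self.enc = tokenizer([e["text"] for e in examples], truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = [e["label"] for e in examples]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item


# Toy English FAQ-style examples; in practice this is the expert-annotated set.
english_train_examples = [
    {"text": "How many leave days do I get per year?", "label": 0},
    {"text": "How do I claim medical reimbursement?", "label": 1},
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # num_labels is task-specific

train_data = MCTDataset(build_mct_dataset(english_train_examples), tokenizer)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mct-model", num_train_epochs=3),
    train_dataset=train_data,
)
trainer.train()  # one jointly fine-tuned model then serves all languages
```

The design point this illustrates is that, unlike translate-train (one model per target language), MCT yields a single model fine-tuned on all languages at once.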
Anthology ID:
D19-6113
Volume:
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Month:
November
Year:
2019
Address:
Hong Kong, China
Editors:
Colin Cherry, Greg Durrett, George Foster, Reza Haffari, Shahram Khadivi, Nanyun Peng, Xiang Ren, Swabha Swayamdipta
Venue:
WS
Publisher:
Association for Computational Linguistics
Pages:
115–123
URL:
https://aclanthology.org/D19-6113
DOI:
10.18653/v1/D19-6113
Cite (ACL):
Mayur Patidar, Surabhi Kumari, Manasi Patwardhan, Shirish Karande, Puneet Agarwal, Lovekesh Vig, and Gautam Shroff. 2019. From Monolingual to Multilingual FAQ Assistant using Multilingual Co-training. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), pages 115–123, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal):
From Monolingual to Multilingual FAQ Assistant using Multilingual Co-training (Patidar et al., 2019)
PDF:
https://aclanthology.org/D19-6113.pdf