Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic)

David M. Howcroft, William Lamb, Anna Groundwater, Dimitra Gkatzia


Abstract
Gàidhlig (Scottish Gaelic; gd) is spoken by about 57k people in Scotland, but remains an under-resourced language with respect to natural language processing in general and natural language generation (NLG) in particular. To address this gap, we developed the first datasets for Scottish Gaelic NLG, collecting both conversational and summarisation data in a single setting. Our task setup involves dialogues between a pair of speakers discussing museum exhibits, with the conversation grounded in images and texts. Afterwards, both interlocutors summarise the dialogue, resulting in a secondary dialogue summarisation dataset. This paper presents the dialogue and summarisation corpora, as well as the software used for data collection. The corpus consists of 43 conversations (13.7k words) and 61 summaries (2.0k words), and will be released along with the data collection interface.
Anthology ID:
2023.inlg-main.34
Volume:
Proceedings of the 16th International Natural Language Generation Conference
Month:
September
Year:
2023
Address:
Prague, Czechia
Editors:
C. Maria Keet, Hung-Yi Lee, Sina Zarrieß
Venues:
INLG | SIGDIAL
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
443–448
URL:
https://aclanthology.org/2023.inlg-main.34
DOI:
10.18653/v1/2023.inlg-main.34
Cite (ACL):
David M. Howcroft, William Lamb, Anna Groundwater, and Dimitra Gkatzia. 2023. Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic). In Proceedings of the 16th International Natural Language Generation Conference, pages 443–448, Prague, Czechia. Association for Computational Linguistics.
Cite (Informal):
Building a dual dataset of text- and image-grounded conversations and summarisation in Gàidhlig (Scottish Gaelic) (Howcroft et al., INLG-SIGDIAL 2023)
PDF:
https://aclanthology.org/2023.inlg-main.34.pdf
Supplementary attachment:
 2023.inlg-main.34.Supplementary_Attachment.pdf