Evaluating Monolingual and Crosslingual Embeddings on Datasets of Word Association Norms

Trina Kwong, Emmanuele Chersoni, Rong Xiang


Abstract
In free word association tasks, human subjects are presented with a stimulus word and asked to name the first word (the response word) that comes to mind. These associations, presumably learned on the basis of conceptual contiguity or similarity, have long attracted the attention of researchers in linguistics and cognitive psychology, since they are considered clues about the internal organization of lexical knowledge in semantic memory. Word association data have also been used to assess the performance of Vector Space Models for English, but evaluations for other languages have been relatively rare so far. In this paper, we introduce word association datasets for Italian, Spanish and Mandarin Chinese by extracting data from the Small World of Words project, and we propose two different tasks inspired by the previous literature. We test both monolingual and crosslingual word embeddings on the new datasets, showing that they perform similarly in the evaluation tasks.
Anthology ID:
2022.bucc-1.1
Volume:
Proceedings of the BUCC Workshop within LREC 2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Reinhard Rapp, Pierre Zweigenbaum, Serge Sharoff
Venue:
BUCC
Publisher:
European Language Resources Association
Pages:
1–7
URL:
https://aclanthology.org/2022.bucc-1.1
Cite (ACL):
Trina Kwong, Emmanuele Chersoni, and Rong Xiang. 2022. Evaluating Monolingual and Crosslingual Embeddings on Datasets of Word Association Norms. In Proceedings of the BUCC Workshop within LREC 2022, pages 1–7, Marseille, France. European Language Resources Association.
Cite (Informal):
Evaluating Monolingual and Crosslingual Embeddings on Datasets of Word Association Norms (Kwong et al., BUCC 2022)
PDF:
https://aclanthology.org/2022.bucc-1.1.pdf
Data
ConceptNet, OpenSubtitles