Xiaohang Tang

2023

Learning Dynamic Contextualised Word Embeddings via Template-based Temporal Adaptation
Xiaohang Tang | Yi Zhou | Danushka Bollegala
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dynamic contextualised word embeddings (DCWEs) represent the temporal semantic variations of words. We propose a method for learning DCWEs by time-adapting a pretrained Masked Language Model (MLM) using time-sensitive templates. Given two snapshots C₁ and C₂ of a corpus taken respectively at two distinct timestamps T₁ and T₂, we first propose an unsupervised method to select (a) pivot terms related to both C₁ and C₂, and (b) anchor terms that are associated with a specific pivot term in each individual snapshot. We then generate prompts by filling manually compiled templates using the extracted pivot and anchor terms. Moreover, we propose an automatic method to learn time-sensitive templates from C₁ and C₂, without requiring any human supervision. Next, we adapt the pretrained MLM to T₂ by fine-tuning it on the generated prompts. Multiple experiments show that our proposed method significantly reduces the perplexity of test sentences in C₂, outperforming the current state of the art.
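The prompt-generation step can be illustrated with a minimal sketch. The two templates, the pivot term "phone", the anchor terms "smartphone" and "app", and the helper `generate_prompts` are all hypothetical stand-ins chosen for illustration; the paper's manually compiled or automatically learned templates and its extracted pivot/anchor terms may look quite different.

```python
from itertools import product

# Hypothetical time-sensitive templates for illustration only; the paper's
# manually compiled (or automatically learned) templates may differ.
TEMPLATES = [
    "{pivot} is associated with {anchor} in {year}.",
    "In {year}, {pivot} is related to {anchor}.",
]

def generate_prompts(pivot, anchors, year, templates=TEMPLATES):
    """Fill every template with the pivot, each anchor, and the timestamp."""
    return [
        template.format(pivot=pivot, anchor=anchor, year=year)
        for template, anchor in product(templates, anchors)
    ]

# Hypothetical pivot/anchor terms for the later snapshot C2 (timestamp T2).
prompts = generate_prompts("phone", ["smartphone", "app"], 2023)
for p in prompts:
    print(p)
```

Every template is paired with every anchor, so two templates and two anchors yield four prompts. Per the abstract, prompts like these are then used as fine-tuning data to adapt the pretrained MLM to T₂.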

Can Word Sense Distribution Detect Semantic Changes of Words?
Xiaohang Tang | Yi Zhou | Taichi Aida | Procheta Sen | Danushka Bollegala
Findings of the Association for Computational Linguistics: EMNLP 2023

Semantic Change Detection (SCD) of words is an important task for various NLP applications that must make time-sensitive predictions. Over time, some words come to be used in novel ways to express new meanings, and these new meanings establish themselves as novel senses of existing words. On the other hand, Word Sense Disambiguation (WSD) methods associate ambiguous words with sense ids, depending on the context in which they occur. Given this relationship between WSD and SCD, we explore the possibility of predicting whether a target word has changed its meaning between two corpora collected at different time steps, by comparing the distributions of senses of that word in each corpus. For this purpose, we use pretrained static sense embeddings to automatically annotate each occurrence of the target word in a corpus with a sense id. Next, we compute the distribution of sense ids of a target word in a given corpus. Finally, we use different divergence or distance measures to quantify the semantic change of the target word across the two given corpora. Our experimental results on the SemEval 2020 Task 1 dataset show that word sense distributions can accurately predict semantic changes of words in English, German, Swedish and Latin.
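The three-step pipeline (annotate occurrences with sense ids, build per-corpus sense distributions, compare them with a divergence) can be sketched as follows. This is a minimal illustration under stated assumptions: each occurrence is assumed to be represented by a vector in the same space as the sense embeddings and labelled with its nearest sense by cosine similarity; Jensen-Shannon divergence is used as one example of "a divergence or distance measure", not necessarily the paper's choice; and the 2-d vectors and sense ids `s1`/`s2` are hypothetical toy values.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def annotate(occurrence_vecs, sense_embeddings):
    """Label each occurrence with the id of its most similar sense embedding
    (nearest-sense assignment is an assumption of this sketch)."""
    return [
        max(sense_embeddings, key=lambda s: cosine(vec, sense_embeddings[s]))
        for vec in occurrence_vecs
    ]

def sense_distribution(sense_ids, senses):
    """Relative frequency of each sense id, in the fixed order of `senses`."""
    counts = Counter(sense_ids)
    return [counts[s] / len(sense_ids) for s in senses]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits, bounded by [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Toy 2-d sense embeddings and occurrence vectors (hypothetical values).
senses = {"s1": [1.0, 0.0], "s2": [0.0, 1.0]}
corpus1 = [[0.9, 0.1], [0.8, 0.2]]  # occurrences in the earlier corpus
corpus2 = [[0.1, 0.9], [0.9, 0.1]]  # occurrences in the later corpus

ids1 = annotate(corpus1, senses)
ids2 = annotate(corpus2, senses)
order = sorted(senses)  # shared sense order so both distributions align
p = sense_distribution(ids1, order)
q = sense_distribution(ids2, order)
change_score = jensen_shannon(p, q)
print(ids1, ids2, round(change_score, 4))
```

A score of 0 means the two sense distributions are identical, and a score of 1 (with base-2 logarithms) means the word's occurrences fall on disjoint senses in the two corpora. In practice the occurrence vectors would come from a pretrained encoder rather than being written by hand.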
