Dominiek Sandra


2020

Orthographic Codes and the Neighborhood Effect: Lessons from Information Theory
Stéphan Tulkens | Dominiek Sandra | Walter Daelemans
Proceedings of the Twelfth Language Resources and Evaluation Conference

We consider the orthographic neighborhood effect: the effect that words with more orthographic similarity to other words are read faster. The neighborhood effect serves as an important control variable in psycholinguistic studies of word reading, and explains variance in addition to word length and word frequency. Following previous work, we model the neighborhood effect as the average distance to neighbors in feature space for three feature sets: slots, character ngrams and skipgrams. We optimize each of these feature sets and find evidence for language-independent optima, across five megastudy corpora from five alphabetic languages. Additionally, we show that weighting features using the inverse of mutual information (MI) improves the neighborhood effect significantly for all languages. We analyze the inverse feature weighting, and show that, across languages, grammatical morphemes get the lowest weights. Finally, we perform the same experiments on Korean Hangul, a non-alphabetic writing system, where we find the opposite results: slower responses as a function of denser neighborhoods, and a negative effect of inverse feature weighting. This raises the question of whether this is a cognitive effect, or an effect of the way we represent Hangul orthography, and indicates more research is needed.
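
As a rough illustration of the measure described in this abstract (the average distance to neighbors in a feature space), the sketch below uses character bigram counts and cosine distance. It is not the authors' implementation: the function names, the `#` boundary padding, the choice of cosine distance, and the neighbor count `k` are all assumptions made for this example.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Count character n-grams of a word (boundary padding with '#' is an assumption)."""
    padded = "#" + word + "#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (norm_a * norm_b)


def mean_neighbor_distance(word, lexicon, k=2, n=2):
    """Average distance from `word` to its k nearest neighbors in n-gram space."""
    target = char_ngrams(word, n)
    distances = sorted(
        cosine_distance(target, char_ngrams(other, n))
        for other in lexicon
        if other != word
    )
    nearest = distances[:k]
    if not nearest:
        raise ValueError("lexicon contains no other words to compare against")
    return sum(nearest) / len(nearest)


# Toy lexicon, invented for illustration only.
lexicon = ["salt", "malt", "halt", "work"]
dense = mean_neighbor_distance("salt", lexicon, k=2)
sparse = mean_neighbor_distance("work", lexicon, k=2)
```

In this toy lexicon, "salt" shares bigrams with "malt" and "halt" and therefore gets a smaller average distance than "work", which shares none. A smaller value corresponds to a denser neighborhood in the sense used in the abstract.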

2018

From Strings to Other Things: Linking the Neighborhood and Transposition Effects in Word Reading
Stéphan Tulkens | Dominiek Sandra | Walter Daelemans
Proceedings of the 22nd Conference on Computational Natural Language Learning

We investigate the relation between the transposition and deletion effects in word reading, i.e., the finding that readers can successfully read “SLAT” as “SALT”, or “WRK” as “WORK”, and the neighborhood effect. In particular, we investigate whether lexical orthographic neighborhoods take into account transposition and deletion in determining neighbors. If this is the case, it is more likely that the neighborhood effect takes place early during processing, and does not solely rely on similarity of internal representations. We introduce a new neighborhood measure, rd20, which can be used to quantify neighborhood effects over arbitrary feature spaces. We calculate the rd20 over large sets of words in three languages using various feature sets and show that feature sets that do not allow for transposition or deletion explain more variance in Reaction Time (RT) measurements. We also show that the rd20 can be calculated using the hidden state representations of a Multi-Layer Perceptron, and show that these explain less variance than the raw features. We conclude that the neighborhood effect is unlikely to have a perceptual basis, but is more likely to be the result of items co-activating after recognition. All code is available at: www.github.com/clips/conll2018

WordKit: a Python Package for Orthographic and Phonological Featurization
Stéphan Tulkens | Dominiek Sandra | Walter Daelemans
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2009

A Robust and Extensible Exemplar-Based Model of Thematic Fit
Bram Vandekerckhove | Dominiek Sandra | Walter Daelemans
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)