Corina Koolen


2020

Results of a Single Blind Literary Taste Test with Short Anonymized Novel Fragments
Andreas van Cranenburgh | Corina Koolen
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

It is an open question to what extent perceptions of literary quality are derived from text-intrinsic versus social factors. While supervised models can predict literary quality ratings from textual factors quite successfully, as shown in the Riddle of Literary Quality project (Koolen et al., 2020), this does not prove that social factors are unimportant, nor can we assume that readers judge literary quality in the same way, and based on the same information, as machine learning models. We report the results of a pilot study that gauges the effect of textual features on literary ratings of Dutch-language novels in a controlled experiment with 48 participants. In an exploratory analysis, we compare these ratings to those from the large reader survey of the Riddle, in which social factors were not excluded, and to machine learning predictions of those literary ratings. We find moderate to strong correlations between the questionnaire ratings and the survey ratings, but the predictions are closer to the survey ratings. Code and data: https://github.com/andreasvc/litquest

2017

These are not the Stereotypes You are Looking For: Bias and Fairness in Authorial Gender Attribution
Corina Koolen | Andreas van Cranenburgh
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Stylometric and text categorization results show that author gender can be discerned in texts with relatively high accuracy. However, it is difficult to explain what gives rise to these results, and there are many possible confounding factors, such as the domain, genre, and target audience of a text. More fundamentally, such classification efforts risk invoking stereotyping and essentialism. We explore this issue in two datasets of Dutch literary novels, using common descriptive (LIWC, topic modeling) and predictive (machine learning) methods. Our results show the importance of controlling for variables in the corpus, and we argue for taking care not to overgeneralize from the results.

2015

Proceedings of the Fourth Workshop on Computational Linguistics for Literature
Anna Feldman | Anna Kazantseva | Stan Szpakowicz | Corina Koolen
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

Identifying Literary Texts with Bigrams
Andreas van Cranenburgh | Corina Koolen
Proceedings of the Fourth Workshop on Computational Linguistics for Literature

2013

From high heels to weed attics: a syntactic investigation of chick lit and literature
Kim Jautze | Corina Koolen | Andreas van Cranenburgh | Hayco de Jong
Proceedings of the Workshop on Computational Linguistics for Literature