Emily Öhman

Also published as: Emily Ohman


2024

EmotionArcs: Emotion Arcs for 9,000 Literary Texts
Emily Ohman | Yuri Bizzoni | Pascale Feldkamp Moreira | Kristoffer Nielbo
Proceedings of the 8th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2024)

We introduce EmotionArcs, a dataset comprising emotional arcs from over 9,000 English novels, assembled to understand the dynamics of emotions represented in text and how these emotions may influence a novel's reception and perceived quality. We evaluate the emotion arcs manually, by comparing them to human annotation, and against other similar emotion modeling systems, to show that our system produces coherent emotion arcs that correspond to human interpretation. We make this resource available for further studies of a large collection of emotion arcs and present one application, exploring these arcs for modeling reader appreciation. Using information-theoretic measures to analyze the impact of emotions on literary quality, we find that emotional entropy, as well as the skewness and steepness of emotion arcs, correlate with two proxies of literary reception. Our findings may offer insights into how quality assessments relate to emotional complexity and could help with the study of affect in literary novels.
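
The abstract names entropy and skewness as arc-level measures without defining them here. The following is only a minimal sketch of the standard textbook definitions (Shannon entropy over emotion counts, population skewness over arc values); the function names and inputs are hypothetical, and the paper's own computation may differ.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts.

    Assumption: `counts` holds one non-negative count per emotion category.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def skewness(values):
    """Population skewness (third standardized moment) of arc values.

    Assumption: `values` is a non-empty sequence of per-segment scores.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # a flat arc has no asymmetry
    return m3 / m2 ** 1.5
```

Under these definitions, a text whose emotion words are spread evenly across categories gets higher entropy than one dominated by a single emotion, and an arc with a few large late peaks is positively skewed.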

2023

Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages
Mika Hämäläinen | Emily Öhman | Flammie Pirinen | Khalid Alnajjar | So Miyagawa | Yuri Bizzoni | Niko Partanen | Jack Rueter
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

The Great Digital Humanities Disconnect: The Failure of DH Publishing
Emily Öhman | Michael Piotrowski | Mika Hämäläinen
Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages

In this paper, we discuss the disconnect in interdisciplinary publishing from a disciplinary divide perspective as to how research is expected to be presented and published according to disciplinary conventions. We argue that this divide hinders interdisciplinary collaboration and even more so the dissemination of research results from interdisciplinary projects to other interdisciplinary researchers. The disconnect is not simply theoretical but also encompasses practical considerations such as manuscript creation standards. The disconnect can also be detrimental to academic careers in terms of evaluations by peers on funding and tenure committees as well as peer reviews. With this analysis, we want to foster further discussion about the state of academic publishing from a digital humanities perspective.

2022

Computational Exploration of the Origin of Mood in Literary Texts
Emily Öhman | Riikka H. Rossi
Proceedings of the 2nd International Workshop on Natural Language Processing for Digital Humanities

This paper is a methodological exploration of the origin of mood in early modern and modern Finnish literary texts using computational methods. We discuss the pre-processing steps as well as the various natural language processing tools used to try to pinpoint where mood can be best detected in text. We also share several tools and resources developed during this process. Our early attempts suggest that overall mood can be computationally detected in the first three paragraphs of a book.

2021

The Validity of Lexicon-based Sentiment Analysis in Interdisciplinary Research
Emily Öhman
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

Lexicon-based sentiment and emotion analysis methods are widely used particularly in applied Natural Language Processing (NLP) projects in fields such as computational social science and digital humanities. These lexicon-based methods have often been criticized for their lack of validation and accuracy – sometimes fairly. However, in this paper, we argue that lexicon-based methods work well particularly when moving up in granularity and show how useful lexicon-based methods can be for projects where neither qualitative analysis nor a machine learning-based approach is possible. Indeed, we argue that the measure of a lexicon’s accuracy should be grounded in its usefulness.
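
As a rough illustration of what a lexicon-based method does, the sketch below counts emotion-bearing words using a tiny made-up lexicon. `TOY_LEXICON` and `emotion_counts` are hypothetical names introduced for this example only; they are not the resources or tools discussed in the paper.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping words to emotion labels (illustration only).
TOY_LEXICON = {
    "happy": "joy",
    "delight": "joy",
    "afraid": "fear",
    "terror": "fear",
    "furious": "anger",
}


def emotion_counts(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per emotion label in a text.

    Tokenization is deliberately naive: lowercase runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(lexicon[t] for t in tokens if t in lexicon)
```

Counting over larger units, such as a chapter or a whole book rather than a single sentence, is one way to read the abstract's point about granularity: individual word-level errors matter less once many hits are aggregated.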

Japanese Beauty Marketing on Social Media: Critical Discourse Analysis Meets NLP
Emily Öhman | Amy Gracy Metcalfe
Proceedings of the Workshop on Natural Language Processing for Digital Humanities

This project is a pilot study intending to combine traditional corpus linguistics, Natural Language Processing, critical discourse analysis, and digital humanities to gain an up-to-date understanding of how beauty is being marketed on social media, specifically Instagram, to followers. We use topic modeling combined with critical discourse analysis and NLP tools for insights into the “Japanese Beauty Myth” and show an overview of the dataset that we make publicly available.

2020

LT@Helsinki at SemEval-2020 Task 12: Multilingual or Language-specific BERT?
Marc Pàmies | Emily Öhman | Kaisla Kajava | Jörg Tiedemann
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents the different models submitted by the LT@Helsinki team for the SemEval 2020 Shared Task 12. Our team participated in sub-tasks A and C, titled offensive language identification and offense target identification, respectively. In both cases we used Bidirectional Encoder Representations from Transformers (BERT), a model pre-trained by Google and fine-tuned by us on the OLID and SOLID datasets. The results show that offensive tweet classification is one of several language-based tasks where BERT can achieve state-of-the-art results.

XED: A Multilingual Dataset for Sentiment Analysis and Emotion Detection
Emily Öhman | Marc Pàmies | Kaisla Kajava | Jörg Tiedemann
Proceedings of the 28th International Conference on Computational Linguistics

We introduce XED, a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages. We use Plutchik’s core emotions to annotate the dataset with the addition of neutral to create a multilabel multiclass dataset. The dataset is carefully evaluated using language-specific BERT models and SVMs to show that XED performs on par with other similar datasets and is therefore a useful tool for sentiment analysis and emotion detection.

2018

Creating a Dataset for Multilingual Fine-grained Emotion-detection Using Gamification-based Annotation
Emily Öhman | Kaisla Kajava | Jörg Tiedemann | Timo Honkela
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

This paper introduces a gamified framework for fine-grained sentiment analysis and emotion detection. We present a flexible tool, Sentimentator, that can be used for efficient annotation based on crowd sourcing and a self-perpetuating gold standard. We also present a novel dataset with multi-dimensional annotations of emotions and sentiments in movie subtitles that enables research on sentiment preservation across languages and the creation of robust multilingual emotion detection tools. The tools and datasets are public and open-source and can easily be extended and applied for various purposes.

2016

The Challenges of Multi-dimensional Sentiment Analysis Across Languages
Emily Öhman | Timo Honkela | Jörg Tiedemann
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

This paper outlines a pilot study on multi-dimensional and multilingual sentiment analysis of social media content. We use parallel corpora of movie subtitles as a proxy for colloquial language in social media channels and a multilingual emotion lexicon for fine-grained sentiment analyses. Parallel data sets make it possible to study the preservation of sentiments and emotions in translation and our assessment reveals that the lexical approach shows great inter-language agreement. However, our manual evaluation also suggests that the use of purely lexical methods is limited and further studies are necessary to pinpoint the cross-lingual differences and to develop better sentiment classifiers.