Johannes Eichstaedt


2024

Using Daily Language to Understand Drinking: Multi-Level Longitudinal Differential Language Analysis
Matthew Matero | Huy Vu | August Nilsson | Syeda Mahwish | Young Min Cho | James McKay | Johannes Eichstaedt | Richard Rosenthal | Lyle Ungar | H. Andrew Schwartz
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

Analyses for linking language with psychological factors or behaviors predominately treat linguistic features as a static set, working with a single document per person or aggregating across multiple posts (e.g. on social media) into a single set of features. This limits language to mostly shed light on between-person differences rather than changes in behavior within-person. Here, we collected a novel dataset of daily surveys where participants were asked to describe their experienced well-being and report the number of alcoholic beverages they had within the past 24 hours. Through this data, we first build a multi-level forecasting model that is able to capture within-person change and leverage both the psychological features of the person and daily well-being responses. Then, we propose a longitudinal version of differential language analysis that finds patterns associated with drinking more (e.g. social events) and less (e.g. task-oriented), as well as distinguishing patterns of heavy drinkers versus light drinkers.

2023

Discourse-Level Representations can Improve Prediction of Degree of Anxiety
Swanie Juhng | Matthew Matero | Vasudha Varadarajan | Johannes Eichstaedt | Adithya V Ganesan | H. Andrew Schwartz
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Anxiety disorders are the most common of mental illnesses, but relatively little is known about how to detect them from language. The primary clinical manifestation of anxiety is worry-associated cognitive distortions, which are likely expressed at the discourse-level of semantics. Here, we investigate the development of a modern linguistic assessment for degree of anxiety, specifically evaluating the utility of discourse-level information in addition to lexical-level large language model embeddings. We find that a combined lexico-discourse model outperforms models based solely on state-of-the-art contextual embeddings (RoBERTa), with discourse-level representations derived from Sentence-BERT and DiscRE both providing additional predictive power not captured by lexical-level representations. Interpreting the model, we find that discourse patterns of causal explanations, among others, were used significantly more by those scoring high in anxiety, dovetailing with psychological literature.

2022

WWBP-SQT-lite: Multi-level Models and Difference Embeddings for Moments of Change Identification in Mental Health Forums
Adithya V Ganesan | Vasudha Varadarajan | Juhi Mittal | Shashanka Subrahmanya | Matthew Matero | Nikita Soni | Sharath Chandra Guntuku | Johannes Eichstaedt | H. Andrew Schwartz
Proceedings of the Eighth Workshop on Computational Linguistics and Clinical Psychology

Psychological states unfold dynamically; to understand and measure mental health at scale we need to detect and measure these changes from sequences of online posts. We evaluate two approaches to capturing psychological changes in text: the first relies on computing the difference between the embedding of a message and that of the one that precedes it, the second relies on a “human-aware” multi-level recurrent transformer (HaRT). The mood changes of timeline posts of users were annotated into three classes, ‘ordinary,’ ‘switching’ (positive to negative or vice versa) and ‘escalations’ (increasing in intensity). For classifying these mood changes, the difference-between-embeddings technique – applied to RoBERTa embeddings – showed the highest overall F1 score (0.61) across the three different classes on the test set. The technique particularly outperformed the HaRT transformer (and other baselines) in the detection of switches (F1 = .33) and escalations (F1 = .61). Consistent with the literature, the language use patterns associated with mental-health related constructs in prior work (including depression, stress, anger and anxiety) predicted both mood switches and escalations.

2020

Explaining the Trump Gap in Social Distancing Using COVID Discourse
Austin Van Loon | Sheridan Stewart | Brandon Waldon | Shrinidhi K Lakshmikanth | Ishan Shah | Sharath Chandra Guntuku | Garrick Sherman | James Zou | Johannes Eichstaedt
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

Our ability to limit the future spread of COVID-19 will in part depend on our understanding of the psychological and sociological processes that lead people to follow or reject coronavirus health behaviors. We argue that the virus has taken on heterogeneous meanings in communities across the United States and that these disparate meanings shaped communities’ response to the virus during the early, vital stages of the outbreak in the U.S. Using word embeddings, we demonstrate that counties where residents socially distanced less on average (as measured by residential mobility) more semantically associated the virus in their COVID discourse with concepts of fraud, the political left, and more benign illnesses like the flu. We also show that the different meanings the virus took on in different communities explain a substantial fraction of what we call the “Trump Gap”, or the empirical tendency for more Trump-supporting counties to socially distance less. This work demonstrates that community-level processes of meaning-making in part determined behavioral responses to the COVID-19 pandemic and that these processes can be measured unobtrusively using Twitter.

Detecting Emerging Symptoms of COVID-19 using Context-based Twitter Embeddings
Roshan Santosh | H. Andrew Schwartz | Johannes Eichstaedt | Lyle Ungar | Sharath Chandra Guntuku
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

In this paper, we present an iterative graph-based approach for the detection of symptoms of COVID-19, the pathology of which seems to be evolving. More generally, the method can be applied to finding context-specific words and texts (e.g. symptom mentions) in large imbalanced corpora (e.g. all tweets mentioning #COVID-19). Given the novelty of COVID-19, we also test if the proposed approach generalizes to the problem of detecting Adverse Drug Reaction (ADR). We find that the approach applied to Twitter data can detect symptom mentions substantially prior to their being reported by the Centers for Disease Control (CDC).

Understanding Weekly COVID-19 Concerns through Dynamic Content-Specific LDA Topic Modeling
Mohammadzaman Zamani | H. Andrew Schwartz | Johannes Eichstaedt | Sharath Chandra Guntuku | Adithya Virinchipuram Ganesan | Sean Clouston | Salvatore Giorgi
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

The novelty and global scale of the COVID-19 pandemic have led to rapid societal changes in a short span of time. As government policy and health measures shift, public perceptions and concerns also change, an evolution documented within discourse on social media. We propose a dynamic content-specific LDA topic modeling technique that can help to identify different domains of COVID-specific discourse that can be used to track societal shifts in concerns or views. Our experiments show that these model-derived topics are more coherent than standard LDA topics, and also provide new features that are more helpful in prediction of COVID-19 related outcomes including social mobility and unemployment rate.

2017

DLATK: Differential Language Analysis ToolKit
H. Andrew Schwartz | Salvatore Giorgi | Maarten Sap | Patrick Crutchley | Lyle Ungar | Johannes Eichstaedt
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present Differential Language Analysis Toolkit (DLATK), an open-source Python package and command-line tool developed for conducting social-scientific language analyses. While DLATK provides standard NLP pipeline steps such as tokenization or SVM-classification, its novel strengths lie in analyses useful for psychological, health, and social science: (1) incorporation of extra-linguistic structured information, (2) specified levels and units of analysis (e.g. document, user, community), (3) statistical metrics for continuous outcomes, and (4) robust, proven, and accurate pipelines for social-scientific prediction problems. DLATK integrates multiple popular packages (SKLearn, Mallet), enables interactive usage (Jupyter Notebooks), and generally follows object-oriented principles to make it easy to tie in additional libraries or storage technologies.

2016

Modelling Valence and Arousal in Facebook posts
Daniel Preoţiuc-Pietro | H. Andrew Schwartz | Gregory Park | Johannes Eichstaedt | Margaret Kern | Lyle Ungar | Elisabeth Shulman
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Does ‘well-being’ translate on Twitter?
Laura Smith | Salvatore Giorgi | Rishi Solanki | Johannes Eichstaedt | H. Andrew Schwartz | Muhammad Abdul-Mageed | Anneke Buffone | Lyle Ungar
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2015

Extracting Human Temporal Orientation from Facebook Language
H. Andrew Schwartz | Gregory Park | Maarten Sap | Evan Weingarten | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Jonah Berger | Martin Seligman | Lyle Ungar
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

The role of personality, age, and gender in tweeting about mental illness
Daniel Preoţiuc-Pietro | Johannes Eichstaedt | Gregory Park | Maarten Sap | Laura Smith | Victoria Tobolsky | H. Andrew Schwartz | Lyle Ungar
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

2014

Towards Assessing Changes in Degree of Depression through Facebook
H. Andrew Schwartz | Johannes Eichstaedt | Margaret L. Kern | Gregory Park | Maarten Sap | David Stillwell | Michal Kosinski | Lyle Ungar
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality

Developing Age and Gender Predictive Lexica over Social Media
Maarten Sap | Gregory Park | Johannes Eichstaedt | Margaret Kern | David Stillwell | Michal Kosinski | Lyle Ungar | Hansen Andrew Schwartz
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach
Hansen Andrew Schwartz | Johannes Eichstaedt | Eduardo Blanco | Lukasz Dziurzynski | Margaret L. Kern | Stephanie Ramones | Martin Seligman | Lyle Ungar
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity