Manfred Stede


2024

Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
Sophie Henning | Manfred Stede
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)

2023

Discourse Sense Flows: Modelling the Rhetorical Style of Documents across Various Domains
Rene Knaebel | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2023

Recent research on shallow discourse parsing has given renewed attention to the role of discourse relation signals, in particular explicit connectives and so-called alternative lexicalizations. In our work, we first develop new models for extracting signals and classifying their senses, both for explicit connectives and alternative lexicalizations, based on the Penn Discourse Treebank v3 corpus. Thereafter, we apply these models to various raw corpora, and we introduce ‘discourse sense flows’, a new way of modeling the rhetorical style of a document by the linear order of coherence relations, as captured by the PDTB senses. The corpora span several genres and domains, and we undertake comparative analyses of the sense flows, as well as experiments on automatic genre/domain discrimination using discourse sense flow patterns as features. We find that n-gram patterns are indeed stronger predictors than simple sense (unigram) distributions.

Encoding Discourse Structure: Comparison of RST and QUD
Sara Shahmohammadi | Hannah Seemann | Manfred Stede | Tatjana Scheffler
Proceedings of the 4th Workshop on Computational Approaches to Discourse (CODI 2023)

We present a quantitative and qualitative comparison of the discourse trees defined by the Rhetorical Structure Theory and Questions under Discussion models. Based on an empirical analysis of parallel annotations for 28 texts (blog posts and podcast transcripts), we conclude that both discourse frameworks capture similar structural information. The qualitative analysis shows that while complex discourse units often match between analyses, QUD structures do not indicate the centrality of segments.

The UNSC-Graph: An Extensible Knowledge Graph for the UNSC Corpus
Stian Rødven-Eide | Karolina Zaczynska | Antonio Pires | Ronny Patz | Manfred Stede
Proceedings of the 3rd Workshop on Computational Linguistics for the Political and Social Sciences

Towards Fine-Grained Argumentation Strategy Analysis in Persuasive Essays
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 10th Workshop on Argument Mining

We define an argumentation strategy as the set of rhetorical and stylistic means that authors employ to produce an effective, and often persuasive, text. First computational accounts of such strategies have been relatively coarse-grained, while in our work we aim to move to a more detailed analysis. We extend the annotations of the Argument Annotated Essays corpus (Stab and Gurevych, 2017) with specific types of claims and premises, propose a model for their automatic identification and show first results, and then we discuss usage patterns that emerge with respect to the essay structure, the “flows” of argument component types, the claim-premise constellations, the role of the essay prompt type, and that of the individual author.

Communicating Climate Change: A Comparison Between Tweets and Speeches by German Members of Parliament
Robin Schaefer | Christoph Abels | Stephan Lewandowsky | Manfred Stede
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Twitter and parliamentary speeches are very different communication channels, but many members of parliament (MPs) make use of both. Focusing on the topic of climate change, we undertake a comparative analysis of speeches and tweets uttered by MPs in Germany in a recent six-year period. By keyword/hashtag analyses and topic modeling, we find substantial differences along party lines, with left-leaning parties discussing climate change through a crisis frame, while liberal and conservative parties try to address climate change through the lens of climate-friendly technology and practices. Only the AfD denies the need to adopt climate change mitigating measures, demeaning those concerned about a deteriorating climate as a climate cult or fanatics. Our analysis reveals that climate change communication does not differ substantially between Twitter and parliamentary speeches, but does differ across the political spectrum.

2022

On Selecting Training Corpora for Cross-Domain Claim Detection
Robin Schaefer | René Knaebel | Manfred Stede
Proceedings of the 9th Workshop on Argument Mining

Identifying claims in text is a crucial first step in argument mining. In this paper, we investigate factors for the composition of training corpora to improve cross-domain claim detection. To this end, we use four recent argumentation corpora annotated with claims and submit them to several experimental scenarios. Our results indicate that the “ideal” composition of training corpora is characterized by a large corpus size, homogeneous claim proportions, and less formal text domains.

Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)
Robin Schaefer | Xiaoyu Bai | Manfred Stede | Torsten Zesch
Proceedings of the 18th Conference on Natural Language Processing (KONVENS 2022)

Argument Similarity Assessment in German for Intelligent Tutoring: Crowdsourced Dataset and First Experiments
Xiaoyu Bai | Manfred Stede
Proceedings of the Thirteenth Language Resources and Evaluation Conference

NLP technologies such as text similarity assessment, question answering and text classification are increasingly being used to develop intelligent educational applications. The long-term goal of our work is an intelligent tutoring system for German secondary schools, which will support students in a school exercise that requires them to identify arguments in an argumentative source text. This paper presents our work on a central subtask, viz. the automatic assessment of similarity between a pair of argumentative text snippets in German. In the designated use case, students write out key arguments from a given source text; the tutoring system then evaluates them against a target reference, assessing the similarity level between student work and the reference. We collect a dataset for our similarity assessment task through crowdsourcing, as authentic German student data are scarce; we label the collected text pairs with similarity scores on a 5-point scale and run first experiments on the task. We see that a model based on BERT shows promising results, and we also discuss some challenges that we observe.

GerCCT: An Annotated Corpus for Mining Arguments in German Tweets on Climate Change
Robin Schaefer | Manfred Stede
Proceedings of the Thirteenth Language Resources and Evaluation Conference

While the field of argument mining has grown notably in the last decade, the Twitter medium remains relatively understudied. Given the difficulty of mining arguments in tweets, recent work on creating annotated resources mainly utilized simplified annotation schemes that focus on single argument components, i.e., on claim or evidence. In this paper we strive to fill this research gap by presenting GerCCT, a new corpus of German tweets on climate change, which was annotated for a set of different argument components and properties. Additionally, we labelled sarcasm and toxic language to facilitate the development of tools for filtering out non-argumentative content. This, to the best of our knowledge, renders our corpus the first tweet resource annotated for argumentation, sarcasm and toxic language. We show that a comparatively complex annotation scheme can still yield promising inter-annotator agreement. We further present first good supervised classification results yielded by a fine-tuned BERT architecture.

Extractive Summarisation for German-language Data: A Text-level Approach with Discourse Features
Freya Hewett | Manfred Stede
Proceedings of the 29th International Conference on Computational Linguistics

We examine the link between facets of Rhetorical Structure Theory (RST) and the selection of content for extractive summarisation, for German-language texts. For this purpose, we produce a set of extractive summaries for a dataset of German-language newspaper commentaries, a corpus which already has several layers of annotation. We provide an in-depth analysis of the connection between summary sentences and several RST-based features and transfer these insights to various automated summarisation models. Our results show that RST features are informative for the task of extractive summarisation, particularly nuclearity and relations at sentence-level.

Towards Identifying Alternative-Lexicalization Signals of Discourse Relations
René Knaebel | Manfred Stede
Proceedings of the 29th International Conference on Computational Linguistics

The task of shallow discourse parsing in the Penn Discourse Treebank (PDTB) framework has traditionally been restricted to identifying those relations that are signaled by a discourse connective (“explicit”) and those that have no signal at all (“implicit”). The third type, the more flexible group of “AltLex” realizations, has been neglected because of its small number of occurrences in the PDTB2 corpus. Their number has grown significantly in the recent PDTB3, and in this paper, we present the first approaches for recognizing these “alternative lexicalizations”. We compare the performance of a pattern-based approach and a sequence labeling model, add an experiment on the pre-classification of candidate sentences, and provide an initial qualitative analysis of the error cases made by both models.

2021

Proceedings of the 8th Workshop on Argument Mining
Khalid Al-Khatib | Yufang Hou | Manfred Stede
Proceedings of the 8th Workshop on Argument Mining

The Climate Change Debate and Natural Language Processing
Manfred Stede | Ronny Patz
Proceedings of the 1st Workshop on NLP for Positive Impact

The debate around climate change (CC), concerning its extent, its causes, and the necessary responses, is intense and of global importance. Yet, in the natural language processing (NLP) community, this domain has so far received little attention. In contrast, it is of enormous prominence in various social science disciplines, and some of that work follows the “text-as-data” paradigm, seeking to employ quantitative methods for analyzing large amounts of CC-related text. Other research is qualitative in nature and studies details, nuances, actors, and motivations within CC discourses. Coming from both NLP and Political Science, and reviewing key works in both disciplines, we discuss how social science approaches to CC debates can inform advances in text-mining/NLP, and how, in return, NLP can support policy-makers and activists in making sense of large-scale and complex CC discourses across multiple genres, channels, topics, and communities. This is paramount for their ability to make rapid and meaningful impact on the discourse, and for shaping the necessary policy change.

Automatically evaluating the conceptual complexity of German texts
Freya Hewett | Manfred Stede
Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021)

UPAppliedCL at GermEval 2021: Identifying Fact-Claiming and Engaging Facebook Comments Using Transformers
Robin Schaefer | Manfred Stede
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments

In this paper we present UPAppliedCL’s contribution to the GermEval 2021 Shared Task. In particular, we participated in Subtasks 2 (Engaging Comment Classification) and 3 (Fact-Claiming Comment Classification). While acceptable results can be obtained by using unigrams or linguistic features in combination with traditional machine learning models, we show that for both tasks transformer models trained on fine-tuned BERT embeddings yield the best results.

2020

Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection
Henny Sluyter-Gäthje | Peter Bourgonje | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; any other language is in this respect an under-resourced one. For those languages where machine translation from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank into German and experiment with various methods of annotation projection to arrive at the German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. Then we evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and compare performance to the same components when trained on the gold, original PDTB corpus.

The Potsdam Commentary Corpus 2.2: Extending Annotations for Shallow Discourse Parsing
Peter Bourgonje | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present the Potsdam Commentary Corpus 2.2, a German corpus of news editorials annotated on several different levels. New in the 2.2 version of the corpus are two additional annotation layers for coherence relations following the Penn Discourse TreeBank framework. Specifically, we add relation senses to an already existing layer of discourse connectives and their arguments, and we introduce a new layer with additional coherence relation types, resulting in a German corpus that mirrors the PDTB. The aim of this is to increase usability of the corpus for the task of shallow discourse parsing. In this paper, we provide inter-annotator agreement figures for the new annotations and compare corpus statistics based on the new annotations to the equivalent statistics extracted from the PDTB.

DiMLex-Bangla: A Lexicon of Bangla Discourse Connectives
Debopam Das | Manfred Stede | Soumya Sankar Ghosh | Lahari Chatterjee
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present DiMLex-Bangla, a newly developed lexicon of discourse connectives in Bangla. The lexicon, upon completion of its first version, contains 123 Bangla connective entries, which are primarily compiled from the linguistic literature and translation of English discourse connectives. The lexicon compilation is later augmented by adding more connectives from a currently developed corpus, called the Bangla RST Discourse Treebank (Das and Stede, 2018). DiMLex-Bangla provides information on syntactic categories of Bangla connectives, their discourse semantics and non-connective uses (if any). It uses the format of the German connective lexicon DiMLex (Stede and Umbach, 1998), which provides a cross-linguistically applicable XML schema. The resource is the first of its kind in Bangla, and is freely available for use in studies on discourse structure and computational applications.

Semi-Supervised Tri-Training for Explicit Discourse Argument Expansion
René Knaebel | Manfred Stede
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes a novel application of semi-supervision for shallow discourse parsing. We use a neural approach for sequence tagging and focus on the extraction of explicit discourse arguments. First, additional unlabeled data is prepared for semi-supervised learning. From this data, weak annotations are generated in a first setting and later used in another setting to study performance differences. In our studies, we show an increase in the performance of our models that ranges from 2 to 10% F1 score. Further, we give some insights into the generated discourse annotations and compare the developed additional relations with the training relations. We release this new dataset of explicit discourse arguments to enable the training of large statistical models.

Adapting Coreference Resolution to Twitter Conversations
Berfin Aktaş | Veronika Solopova | Annalena Kohnert | Manfred Stede
Findings of the Association for Computational Linguistics: EMNLP 2020

The performance of standard coreference resolution is known to drop significantly on Twitter texts. We improve the performance of the system of Lee et al. (2018), which was originally trained on OntoNotes, by retraining on manually-annotated Twitter conversation data. Further experiments by combining different portions of OntoNotes with Twitter data show that selecting text genres for the training data can beat the mere maximization of training data amount. In addition, we inspect several phenomena such as the role of deictic pronouns in conversational data, and present additional results for variant settings. Our best configuration improves the performance of the “out of the box” system by 21.6%.

Contextualized Embeddings for Connective Disambiguation in Shallow Discourse Parsing
René Knaebel | Manfred Stede
Proceedings of the First Workshop on Computational Approaches to Discourse

This paper studies a novel model that simplifies the disambiguation of connectives for explicit discourse relations. We use a neural approach that integrates contextualized word embeddings and predicts whether a connective candidate is part of a discourse relation or not. We study the influence of those context-specific embeddings. Further, we show the benefit of training the tasks of connective disambiguation and sense classification together at the same time. The success of our approach is supported by state-of-the-art results.

Annotation and Detection of Arguments in Tweets
Robin Schaefer | Manfred Stede
Proceedings of the 7th Workshop on Argument Mining

Notwithstanding the increasing role Twitter plays in modern political and social discourse, resources built for conducting argument mining on tweets remain limited. In this paper, we present a new corpus of German tweets annotated for argument components. To the best of our knowledge, this is the first corpus containing not only annotated full tweets but also argumentative spans within tweets. We further report first promising results using supervised classification (F1: 0.82) and sequence labeling (F1: 0.72) approaches.

Exploiting a lexical resource for discourse connective disambiguation in German
Peter Bourgonje | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In this paper we focus on connective identification and sense classification for explicit discourse relations in German, as two individual sub-tasks of the overarching Shallow Discourse Parsing task. We successively augment a purely-empirical approach based on contextualised embeddings with linguistic knowledge encoded in a connective lexicon. In this way, we improve over published results for connective identification, achieving a final F1-score of 87.93; and we introduce, to the best of our knowledge, first results for German sense classification, achieving an F1-score of 87.13. Our approach demonstrates that a connective lexicon can be a valuable resource for those languages that do not have a large PDTB-style-annotated corpus available.

Variation in Coreference Strategies across Genres and Production Media
Berfin Aktaş | Manfred Stede
Proceedings of the 28th International Conference on Computational Linguistics

In response to (i) inconclusive results in the literature as to the properties of coreference chains in written versus spoken language, and (ii) a general lack of work on automatic coreference resolution on both spoken language and social media, we undertake a corpus study involving the various genre sections of OntoNotes, the Switchboard corpus, and a corpus of Twitter conversations. Using a set of measures that previously have been applied individually to different data sets, we find fairly clear patterns of “behavior” for the different genres/media. Besides their role for psycholinguistic investigation (why do we employ different coreference strategies when we write or speak?) and for the placement of Twitter in the spoken–written continuum, we see our results as a contribution to approaching genre-/media-specific coreference resolution.

2019

Automated Cross-language Intelligibility Analysis of Parkinson’s Disease Patients Using Speech Recognition Technologies
Nina Hosseini-Kivanani | Juan Camilo Vásquez-Correa | Manfred Stede | Elmar Nöth
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Speech deficits are common symptoms among Parkinson’s Disease (PD) patients. The automatic assessment of speech signals is promising for the evaluation of the neurological state and the speech quality of the patients. Recently, progress has been made in applying machine learning and computational methods to automatically evaluate the speech of PD patients. In the present study, we plan to analyze the speech signals of PD patients and healthy control (HC) subjects in three different languages: German, Spanish, and Czech, with the aim to identify biomarkers to discriminate between PD patients and HC subjects and to evaluate the neurological state of the patients. Therefore, the main contribution of this study is the automatic classification of PD patients and HC subjects in different languages, with a focus on phonation, articulation, and prosody. We will focus on an intelligibility analysis based on automatic speech recognition systems trained on these three languages. This is one of the first studies that considers the evaluation of the speech of PD patients in different languages. The purpose of this research proposal is to build a model that can discriminate PD and HC subjects even when the languages used for training and testing are different.

Window-Based Neural Tagging for Shallow Discourse Argument Labeling
René Knaebel | Manfred Stede | Sebastian Stober
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

This paper describes a novel approach for the task of end-to-end argument labeling in shallow discourse parsing. Our method describes a decomposition of the overall labeling task into subtasks and a general distance-based aggregation procedure. For learning these subtasks, we train a recurrent neural network and gradually replace existing components of our baseline by our model. The model is trained and evaluated on the Penn Discourse Treebank 2 corpus. While it is not as good as knowledge-intense approaches, it clearly outperforms other models that are also trained without additional linguistic features.

Annotating Shallow Discourse Relations in Twitter Conversations
Tatjana Scheffler | Berfin Aktaş | Debopam Das | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

We introduce our pilot study applying PDTB-style annotation to Twitter conversations. Lexically grounded coherence annotation for Twitter threads will enable detailed investigations of the discourse structure of conversations on social media. Here, we present our corpus of 185 threads and annotation, including an inter-annotator agreement study. We discuss our observations as to how Twitter discourses differ from written news text with respect to discourse connectives and relations. We confirm our hypothesis that discourse relations in written social media conversations are expressed differently than in (news) text. We find that in Twitter, connective arguments frequently are not full syntactic clauses, and that a few general connectives expressing EXPANSION and CONTINGENCY make up the majority of the explicit relations in our data.

RST-Tace: A tool for automatic comparison and evaluation of RST trees
Shujun Wan | Tino Kutschbach | Anke Lüdeling | Manfred Stede
Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019

This paper presents RST-Tace, a tool for automatic comparison and evaluation of RST trees. RST-Tace serves as an implementation of Iruskieta’s comparison method, which allows trees to be compared and evaluated without the influence of decisions at lower levels in a tree in terms of four factors: constituent, attachment point, nuclearity as well as relation. RST-Tace can be used regardless of the language or the size of rhetorical trees. This tool aims to measure the agreement between two annotators. The result is reflected by F-measure and inter-annotator agreement. Both the comparison table and the result of the evaluation can be obtained automatically.

Coherence models in schizophrenia
Sandra Just | Erik Haegert | Nora Kořánová | Anna-Lena Bröcker | Ivan Nenchev | Jakob Funcke | Christiane Montag | Manfred Stede
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

Incoherent discourse in schizophrenia has long been recognized as a dominant symptom of the mental disorder (Bleuler, 1911/1950). Recent studies have used modern sentence and word embeddings to compute coherence metrics for spontaneous speech in schizophrenia. While clinical ratings always have a subjective element, computational linguistic methodology allows quantification of speech abnormalities. Clinical and empirical knowledge from psychiatry provide the theoretical and conceptual basis for modelling. Our study is an interdisciplinary attempt at improving coherence models in schizophrenia. Speech samples were obtained from healthy controls and patients with a diagnosis of schizophrenia or schizoaffective disorder and different severity of positive formal thought disorder. Interviews were transcribed and coherence metrics derived from different embeddings. One model found higher coherence metrics for controls than patients. All other models remained non-significant. More detailed analysis of the data motivates different approaches to improving coherence models in schizophrenia, e.g. by assessing referential abnormalities.

The Utility of Discourse Parsing Features for Predicting Argumentation Structure
Freya Hewett | Roshan Prakash Rane | Nina Harlacher | Manfred Stede
Proceedings of the 6th Workshop on Argument Mining

Research on argumentation mining from text has frequently discussed relationships to discourse parsing, but few empirical results are available so far. One corpus that has been annotated in parallel for argumentation structure and for discourse structure (RST, SDRT) is the ‘argumentative microtexts’ corpus (Peldszus and Stede, 2016a). While results on using the gold RST annotations for predicting argumentation have been published (Peldszus and Stede, 2016b), the step to automatic discourse parsing has not yet been taken. In this paper, we run various discourse parsers (RST, PDTB) on the corpus, compare their results to the gold annotations (for RST) and then assess the contribution of automatically-derived discourse features for argumentation parsing. After reproducing the state-of-the-art Evidence Graph model from Afantenos et al. (2018) for the microtexts, we find that PDTB features can indeed improve its performance.

Computational Argumentation Synthesis as a Language Modeling Task
Roxanne El Baff | Henning Wachsmuth | Khalid Al Khatib | Manfred Stede | Benno Stein
Proceedings of the 12th International Conference on Natural Language Generation

Synthesis approaches in computational argumentation so far are restricted to generating claim-like argument units or short summaries of debates. Ultimately, however, we expect computers to generate whole new arguments for a given stance towards some topic, backing up claims following argumentative and rhetorical considerations. In this paper, we approach such an argumentation synthesis as a language modeling task. In our language model, argumentative discourse units are the “words”, and arguments represent the “sentences”. Given a pool of units for any unseen topic-stance pair, the model selects a set of unit types according to a basic rhetorical strategy (logos vs. pathos), arranges the structure of the types based on the units’ argumentative roles, and finally “phrases” an argument by instantiating the structure with semantically coherent units from the pool. Our evaluation suggests that the model can, to some extent, mimic the human synthesis of strategy-specific arguments.

2018

Anaphora Resolution for Twitter Conversations: An Exploratory Study
Berfin Aktaş | Tatjana Scheffler | Manfred Stede
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

We present a corpus study of pronominal anaphora on Twitter conversations. After outlining the specific features of this genre, with respect to reference resolution, we explain the construction of our corpus and the annotation steps. From this we derive a list of phenomena that need to be considered when performing anaphora resolution on this type of data. Finally, we test the performance of an off-the-shelf resolution system, and provide some qualitative error analysis.

Identifying Explicit Discourse Connectives in German
Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We are working on an end-to-end Shallow Discourse Parsing system for German and in this paper focus on the first subtask: the identification of explicit connectives. Starting with the feature set from an English system and a Random Forest classifier, we evaluate our approach on a (relatively small) German annotated corpus, the Potsdam Commentary Corpus. We introduce new features, experiment with including additional training data obtained through annotation projection, and achieve an F-score of 83.89.

Constructing a Lexicon of English Discourse Connectives
Debopam Das | Tatjana Scheffler | Peter Bourgonje | Manfred Stede
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

We present a new lexicon of English discourse connectives called DiMLex-Eng, built by merging information from two annotated corpora and an additional list of relation signals from the literature. The format follows the German connective lexicon DiMLex, which provides a cross-linguistically applicable XML schema. DiMLex-Eng contains 149 English connectives, and gives information on syntactic categories, discourse semantics and non-connective uses (if any). We report on the development steps and discuss design decisions encountered in the lexicon expansion phase. The resource is freely available for use in studies of discourse structure and computational applications.

More or less controlled elicitation of argumentative text: Enlarging a microtext corpus via crowdsourcing
Maria Skeppstedt | Andreas Peldszus | Manfred Stede
Proceedings of the 5th Workshop on Argument Mining

We present an extension of an annotated corpus of short argumentative texts that had originally been built in a controlled text production experiment. Our extension more than doubles the size of the corpus by means of crowdsourcing. We report on the setup of this experiment and on the consequences that crowdsourcing had for assembling the data, and in particular for annotation. We labeled the argumentative structure by marking claims, premises, and relations between them, following the scheme used in the original corpus, but had to make a few modifications in response to interesting phenomena in the data. Finally, we report on an experiment with the automatic prediction of this argumentation structure: We first replicate the approach of an earlier study on the original corpus, and then compare the performance to various settings involving the extension.

pdf bib
Stance-Taking in Topics Extracted from Vaccine-Related Tweets and Discussion Forum Posts
Maria Skeppstedt | Manfred Stede | Andreas Kerren
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

The occurrence of stance-taking towards vaccination was measured in documents extracted by topic modelling from two different corpora, one discussion forum corpus and one tweet corpus. For some of the topics extracted, their most closely associated documents contained a proportion of vaccine stance-taking texts that exceeded the corpus average by a large margin. These extracted document sets would, therefore, form a useful resource in a process for computer-assisted analysis of argumentation on the subject of vaccination.

pdf bib
Argumentation Synthesis following Rhetorical Strategies
Henning Wachsmuth | Manfred Stede | Roxanne El Baff | Khalid Al-Khatib | Maria Skeppstedt | Benno Stein
Proceedings of the 27th International Conference on Computational Linguistics

Persuasion is rarely achieved through a loose set of arguments alone. Rather, an effective delivery of arguments follows a rhetorical strategy, combining logical reasoning with appeals to ethics and emotion. We argue that such a strategy means to select, arrange, and phrase a set of argumentative discourse units. In this paper, we model rhetorical strategies for the computational synthesis of effective argumentation. In a study, we let 26 experts synthesize argumentative texts with different strategies for 10 topics. We find that the experts agree in the selection significantly more when following the same strategy. While the texts vary notably across strategies, their arrangement in particular remains stable. The results suggest that our model enables a strategical synthesis.

pdf bib
A Multi-layer Annotated Corpus of Argumentative Text: From Argument Schemes to Discourse Relations
Elena Musi | Manfred Stede | Leonard Kriese | Smaranda Muresan | Andrea Rocci
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Developing the Bangla RST Discourse Treebank
Debopam Das | Manfred Stede
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
A Lexicon of Discourse Markers for Portuguese – LDM-PT
Amália Mendes | Iria del Rio | Manfred Stede | Felix Dombek
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
Multi-source annotation projection of coreference chains: assessing strategies and testing opportunities
Yulia Grishina | Manfred Stede
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it on an English-German-Russian parallel corpus in order to transfer coreference chains from two sources to the target side. Operating in two settings – a low-resource and a more linguistically-informed one – we show that automatic coreference transfer could benefit from combining information from multiple languages, and assess the quality of both the extraction and the linking of target coreference mentions.

pdf bib
The Good, the Bad, and the Disagreement: Complex ground truth in rhetorical structure analysis
Debopam Das | Manfred Stede | Maite Taboada
Proceedings of the 6th Workshop on Recent Advances in RST and Related Formalisms

pdf bib
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Kristiina Jokinen | Manfred Stede | David DeVault | Annie Louis
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

pdf bib
Automatic detection of stance towards vaccination in online discussion forums
Maria Skeppstedt | Andreas Kerren | Manfred Stede
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

A classifier for automatic detection of stance towards vaccination in online forums was trained and evaluated. Debate posts from six discussion threads on the British parental website Mumsnet were manually annotated for stance ‘against’ or ‘for’ vaccination, or as ‘undecided’. A support vector machine, trained to detect the three classes, achieved a macro F-score of 0.44, while a macro F-score of 0.62 was obtained by the same type of classifier on the binary classification task of distinguishing stance ‘against’ vaccination from stance ‘for’ vaccination. These results show that vaccine stance detection in online forums is a difficult task, at least for the type of model investigated and for the relatively small training corpus that was used. Future work will therefore include an expansion of the training data and an evaluation of other types of classifiers and features.

pdf bib
Extracting word lists for domain-specific implicit opinions from corpora
Núria Bertomeu Castelló | Manfred Stede
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Long papers

2016

pdf bib
OPT: Oslo–Potsdam–Teesside. Pipelining Rules, Rankers, and Classifier Ensembles for Shallow Discourse Parsing
Stephan Oepen | Jonathon Read | Tatjana Scheffler | Uladzimir Sidarenka | Manfred Stede | Erik Velldal | Lilja Øvrelid
Proceedings of the CoNLL-16 shared task

pdf bib
Adding Semantic Relations to a Large-Coverage Connective Lexicon of German
Tatjana Scheffler | Manfred Stede
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

DiMLex is a lexicon of German connectives that can be used for various language understanding purposes. We enhanced the coverage to 275 connectives, which we regard as covering all known German discourse connectives in current use. In this paper, we consider the task of adding the semantic relations that can be expressed by each connective. After discussing different approaches to retrieving semantic information, we settle on annotating each connective with senses from the new PDTB 3.0 sense hierarchy. We describe our new implementation in the extended DiMLex, which will be available for research purposes.

pdf bib
Parallel Discourse Annotations on a Corpus of Short Texts
Manfred Stede | Stergos Afantenos | Andreas Peldszus | Nicholas Asher | Jérémy Perret
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the first corpus of texts annotated with two alternative approaches to discourse structure, Rhetorical Structure Theory (Mann and Thompson, 1988) and Segmented Discourse Representation Theory (Asher and Lascarides, 2003). 112 short argumentative texts have been analyzed according to these two theories. Furthermore, in previous work, the same texts have already been annotated for their argumentation structure, according to the scheme of Peldszus and Stede (2013). This corpus therefore enables studies of correlations between the two accounts of discourse structure, and between discourse and argumentation. We converted the three annotation formats to a common dependency tree format that enables comparison of the structures, and we describe some initial findings.

pdf bib
Information structure in the Potsdam Commentary Corpus: Topics
Manfred Stede | Sara Mamprin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The Potsdam Commentary Corpus is a collection of 175 German newspaper commentaries annotated on a variety of different layers. This paper introduces a new layer that covers the linguistic notion of information-structural topic (not to be confused with ‘topic’ as applied to documents in information retrieval). To our knowledge, this is the first larger topic-annotated resource for German (and one of the first for any language). We describe the annotation guidelines and the annotation process, and the results of an inter-annotator agreement study, which compare favourably to the related work. The annotated corpus is freely available for research.

pdf bib
Anaphoricity in Connectives: A Case Study on German
Manfred Stede | Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

pdf bib
Rhetorical structure and argumentation structure in monologue text
Andreas Peldszus | Manfred Stede
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

pdf bib
Generating Sentiment Lexicons for German Twitter
Uladzimir Sidarenka | Manfred Stede
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Despite substantial progress in developing new sentiment lexicon generation (SLG) methods for English, the task of transferring these approaches to other languages and domains in a sound way remains open. In this paper, we contribute to the solution of this problem by systematically comparing semi-automatic translations of common English polarity lists with the results of the original automatic SLG algorithms, which were applied directly to German data. We evaluate these lexicons on a corpus of 7,992 manually annotated tweets. In addition, we collate the results of dictionary- and corpus-based SLG methods in order to find out which of these paradigms is better suited for the inherently noisy domain of social media. Our experiments show that semi-automatic translations notably outperform automatic systems (reaching a macro-averaged F1-score of 0.589), and that dictionary-based techniques produce much better polarity lists as compared to corpus-based approaches (whose best F1-scores run up to 0.479 and 0.419 respectively) even for the non-standard Twitter genre.

pdf bib
Towards assessing depth of argumentation
Manfred Stede
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

For analyzing argumentative text, we propose to study the ‘depth’ of argumentation as one important component, which we distinguish from argument quality. In a pilot study with German newspaper commentary texts, we asked students to rate the degree of argumentativeness, and then looked for correlations with features of the annotated argumentation structure and the rhetorical structure (in terms of RST). The results indicate that the human judgements correlate with our operationalization of depth and with certain structural features of RST trees.

2015

pdf bib
Joint prediction in MST-style discourse parsing for argumentation mining
Andreas Peldszus | Manfred Stede
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Towards Detecting Counter-considerations in Text
Andreas Peldszus | Manfred Stede
Proceedings of the 2nd Workshop on Argumentation Mining

pdf bib
Knowledge-lean projection of coreference chains across languages
Yulia Grishina | Manfred Stede
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

2014

pdf bib
Potsdam Commentary Corpus 2.0: Annotation for Discourse Research
Manfred Stede | Arne Neumann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a revised and extended version of the Potsdam Commentary Corpus, a collection of 175 German newspaper commentaries (op-ed pieces) that has been annotated with syntax trees and three layers of discourse-level information: nominal coreference, connectives and their arguments (similar to the PDTB, Prasad et al. 2008), and trees reflecting discourse structure according to Rhetorical Structure Theory (Mann/Thompson 1988). Connectives have been annotated with the help of a semi-automatic tool, Conano (Stede/Heintze 2004), which identifies most connectives and suggests arguments based on their syntactic category. The other layers have been created manually with dedicated annotation tools. The corpus is made available on the one hand as a set of original XML files produced with the annotation tools, based on identical tokenization. On the other hand, it is distributed together with the open-source linguistic database ANNIS3 (Chiarcos et al. 2008; Zeldes et al. 2009), which provides multi-layer search functionality and layer-specific visualization modules. This allows for comfortable qualitative evaluation of the correlations between annotation layers.

pdf bib
A Model for Processing Illocutionary Structures and Argumentation in Debates
Kasia Budzynska | Mathilde Janier | Chris Reed | Patrick Saint-Dizier | Manfred Stede | Olena Yakorska
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we briefly present the objectives of Inference Anchoring Theory (IAT) and the formal structure which is proposed for dialogues. Then, we introduce our development corpus, and a computational model designed for the identification of discourse minimal units in the context of argumentation and the illocutionary force associated with each unit. We show the categories of resources which are needed and how they can be reused in different contexts.

pdf bib
GraPAT: a Tool for Graph Annotations
Jonathan Sonntag | Manfred Stede
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce GraPAT, a web-based annotation tool for building graph structures over text. Graphs have been demonstrated to be relevant in a variety of quite diverse annotation efforts and in different NLP applications, and they serve to model annotators’ intuitions quite closely. In particular, in this paper we discuss the implementation of graph annotations for sentiment analysis, argumentation structure, and rhetorical text structures. All of these scenarios can create certain problems for existing annotation tools, and we show how GraPAT can help to overcome such difficulties.

pdf bib
Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation

pdf bib
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop
Lori Levin | Manfred Stede
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2013

pdf bib
Discourse Processing
Manfred Stede
NAACL HLT 2013 Tutorial Abstracts

pdf bib
From newspaper to microblogging: What does it take to find opinions?
Wladimir Sidorenko | Jonathan Sonntag | Nina Krüger | Stefan Stieglitz | Manfred Stede
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF
Arne Neumann | Nancy Ide | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Ranking the annotators: An agreement study on argumentation structure
Andreas Peldszus | Manfred Stede
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

pdf bib
Towards a Tool for Interactive Concept Building for Large Scale Analysis in the Humanities
Andre Blessing | Jonathan Sonntag | Fritz Kliche | Ulrich Heid | Jonas Kuhn | Manfred Stede
Proceedings of the 7th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

2012

pdf bib
SemScribe: Natural Language Generation for Medical Reports
Sebastian Varges | Heike Bieler | Manfred Stede | Lukas C. Faulstich | Kristin Irsig | Malik Atalla
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Natural language generation in the medical domain is heavily influenced by domain knowledge and genre-specific text characteristics. We present SemScribe, an implemented natural language generation system that produces doctor's letters, in particular descriptions of cardiological findings. Texts in this domain are characterized by a high density of information and a relatively telegraphic style. Domain knowledge is encoded in a medical ontology of about 80,000 concepts. The ontology is used in particular for concept generalizations during referring expression generation. Architecturally, the system is a generation pipeline that uses a corpus-informed syntactic frame approach for realizing sentences appropriate to the domain. The system reads XML documents conforming to the HL7 Clinical Document Architecture (CDA) Standard and enhances them with generated text and references to the used data elements. We conducted a first clinical trial evaluation with medical staff and report on the findings.

2011

pdf bib
Lexicon-Based Methods for Sentiment Analysis
Maite Taboada | Julian Brooke | Milan Tofiloski | Kimberly Voll | Manfred Stede
Computational Linguistics, Volume 37, Issue 2 - June 2011

2009

pdf bib
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
Manfred Stede | Chu-Ren Huang | Nancy Ide | Adam Meyers
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
By all these lovely tokens... Merging Conflicting Tokenizations
Christian Chiarcos | Julia Ritz | Manfred Stede
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

pdf bib
Genre-Based Paragraph Classification for Sentiment Analysis
Maite Taboada | Julian Brooke | Manfred Stede
Proceedings of the SIGDIAL 2009 Conference

2008

pdf bib
A Flexible Framework for Integrating Annotations from Different Tools and Tag Sets
Christian Chiarcos | Stefanie Dipper | Michael Götze | Ulf Leser | Anke Lüdeling | Julia Ritz | Manfred Stede
Traitement Automatique des Langues, Volume 49, Numéro 2 : Plate-formes pour le traitement automatique des langues [Platforms for Natural Language Processing]

pdf bib
Connective-based Local Coherence Analysis: A Lexicon for Recognizing Causal Relationships
Manfred Stede
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

pdf bib
Identifying Formal and Functional Zones in Film Reviews
Heike Bieler | Stefanie Dipper | Manfred Stede
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

pdf bib
Proceedings of the Linguistic Annotation Workshop
Branimir Boguraev | Nancy Ide | Adam Meyers | Shigeko Nariyama | Manfred Stede | Janyce Wiebe | Graham Wilcock
Proceedings of the Linguistic Annotation Workshop

pdf bib
Discourse Annotation Working Group Report
Manfred Stede | Janyce Wiebe | Eva Hajičová | Brian Reese | Simone Teufel | Bonnie Webber | Theresa Wilson
Proceedings of the Linguistic Annotation Workshop

2004

pdf bib
Machine-Assisted Rhetorical Structure Annotation
Manfred Stede | Silvan Heintze
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
The Potsdam Commentary Corpus
Manfred Stede
Proceedings of the Workshop on Discourse Annotation

pdf bib
Feeding OWL: Extracting and Representing the Content of Pathology Reports
David Schlangen | Manfred Stede | Elena Paslaru Bontas
Proceedings of the Workshop on NLP and XML (NLPXML-2004): RDF/RDFS and OWL in Language Technology

2003

pdf bib
Surfaces and depths in text understanding: The case of newspaper commentary
Manfred Stede
Proceedings of the HLT-NAACL 2003 Workshop on Text Meaning

pdf bib
Step by step: underspecified markup in incremental rhetorical analysis
David Reitter | Manfred Stede
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

pdf bib
Rhetorical Parsing with Underspecification and Forests
Thomas Hanneforth | Silvan Heintze | Manfred Stede
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

pdf bib
Polibox: Generating Descriptions, Comparisons, and Recommendations from a Database
Manfred Stede
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

pdf bib
XML/XSL in the Dictionary: The Case of Discourse Markers
Daniela Berger | David Reitter | Manfred Stede
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

2000

pdf bib
The hyperonym problem revisited: Conceptual and lexical hierarchies in language generation
Manfred Stede
INLG’2000 Proceedings of the First International Conference on Natural Language Generation

pdf bib
Book Reviews: Predicative Forms in Natural Language and in Lexical Knowledge Bases
Manfred Stede
Computational Linguistics, Volume 26, Number 2, June 2000

1998

pdf bib
DiMLex: A Lexicon of Discourse Markers for Text Generation and Understanding
Manfred Stede | Carla Umbach
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

pdf bib
DiMLex: A lexicon of discourse markers for text generation and understanding
Manfred Stede | Carla Umbach
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

pdf bib
A Generative Perspective on Verb Alternations
Manfred Stede
Computational Linguistics, Volume 24, Number 3, September 1998

pdf bib
Discourse Marker Choice in Sentence Planning
Brigitte Grote | Manfred Stede
Natural Language Generation

1997

pdf bib
Discourse particles and routine formulas in spoken language translation
Manfred Stede | Birte Schmitz
Spoken Language Translation

1996

pdf bib
A generative perspective on verbs and their readings
Manfred Stede
Eighth International Natural Language Generation Workshop

1994

pdf bib
TECHDOC: Multilingual generation of online and offline instructional text
Dietmar Rosner | Manfred Stede
Fourth Conference on Applied Natural Language Processing

pdf bib
Generating Multilingual Documents from a Knowledge Base: The TECHDOC Project
Dietmar Rosner | Manfred Stede
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

1993

pdf bib
Lexical Choice Criteria in Language Generation
Manfred Stede
Sixth Conference of the European Chapter of the Association for Computational Linguistics
