Michael Voelske


2023

pdf bib
A New Dataset for Causality Identification in Argumentative Texts
Khalid Al Khatib | Michael Voelske | Anh Le | Shahbaz Syed | Martin Potthast | Benno Stein
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Existing datasets for causality identification in argumentative texts have several limitations, such as the type of input text (e.g., only claims), causality type (e.g., only positive), and the linguistic patterns investigated (e.g., only verb connectives). To resolve these limitations, we build the Webis-Causality-23 dataset, with sophisticated inputs (all units from arguments), a balanced distribution of causality types, and a larger number of linguistic patterns denoting causality. The dataset contains 1485 examples derived by combining the two paradigms of distant supervision and uncertainty sampling to identify diverse, high-quality samples of causality relations, and annotate them in a cost-effective manner.