Anna Schmidt


2018

Inducing a Lexicon of Abusive Words – a Feature-Based Approach
Michael Wiegand | Josef Ruppenhofer | Anna Schmidt | Clayton Greenberg
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

We address the detection of abusive words. The task is to identify such words among a set of negative polar expressions. We propose novel features employing information from both corpora and lexical resources. These features are calibrated on a small manually annotated base lexicon which we use to produce a large lexicon. We show that the word-level information we learn cannot be equally derived from a large dataset of annotated microposts. We demonstrate the effectiveness of our (domain-independent) lexicon in the cross-domain detection of abusive microposts.

2017

A Survey on Hate Speech Detection using Natural Language Processing
Anna Schmidt | Michael Wiegand
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media

This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.

2014

The DBOX Corpus Collection of Spoken Human-Human and Human-Machine Dialogues
Volha Petukhova | Martin Gropp | Dietrich Klakow | Gregor Eigner | Mario Topf | Stefan Srb | Petr Motlicek | Blaise Potard | John Dines | Olivier Deroo | Ronny Egeler | Uwe Meinz | Steffen Liersch | Anna Schmidt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes the data collection and annotation carried out within the DBOX project (Eureka project, number E! 7152). The project aims to develop interactive games based on spoken natural language human-computer dialogues in three European languages: English, German and French. We collect the DBOX data continuously. We start with human-human Wizard of Oz experiments to collect data for modelling natural human dialogue behaviour, which helps us understand phenomena of human interaction and predict interlocutors' actions. We then replace the human Wizard with an increasingly advanced dialogue system, using evaluation data for system improvement. The designed dialogue system relies on a Question-Answering (QA) approach but shows truly interactive gaming behaviour, e.g., by providing feedback, managing turns and contact, and producing social signals and acts (encouraging vs. downplaying, polite vs. rude, positive vs. negative attitude towards players or their actions). The DBOX dialogue corpus has required substantial investment. We expect it to have a great impact on the rest of the project. The DBOX project consortium will continue to maintain the corpus and to take an interest in its growth, e.g., by expanding it to other languages. The resulting corpus will be publicly released.