Iris Hendrickx


2022

Negation Detection in Dutch Spoken Human-Computer Conversations
Tom Sweers | Iris Hendrickx | Helmer Strik
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Proper recognition and interpretation of negation signals in text or communication is crucial for any form of full natural language understanding. It is also essential for computational approaches to natural language processing. In this study we focus on negation detection in Dutch spoken human-computer conversations. Since there exists no Dutch (dialogue) corpus annotated for negation, we have annotated a Dutch corpus sample to evaluate our method for automatic negation detection. We use transfer learning and train NegBERT (an existing BERT implementation used for negation detection) on English data with multilingual BERT to detect negation in Dutch dialogues. Our results show that adding in-domain training material improves the results. We show that we can detect both negation cues and scope in Dutch dialogues with high precision and recall. We provide a detailed error analysis and discuss the effects of cross-lingual and cross-domain transfer learning on automatic negation detection.

Creating a Data Set of Abstractive Summaries of Turn-labeled Spoken Human-Computer Conversations
Iris Hendrickx
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Digitally recorded written and spoken dialogues are becoming increasingly available as a result of technological advances such as online messenger services and the use of chatbots. Summaries are a natural way of presenting the important information gathered from dialogues. We present a unique data set that consists of Dutch spoken human-computer conversations, an annotation layer of turn labels, and conversational abstractive summaries of user answers. The data set is publicly available for research purposes.

Doing not Being: Concrete Language as a Bridge from Language Technology to Ethnically Inclusive Job Ads
Jetske Adams | Kyrill Poelmans | Iris Hendrickx | Martha Larson
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

This paper makes the case for studying concreteness in language as a bridge that will allow language technology to support the understanding and improvement of ethnic inclusivity in job advertisements. We propose an annotation scheme that guides the assignment of sentences in job ads to classes that reflect concrete actions, i.e., what the employer needs people to do, and abstract dispositions, i.e., who the employer expects people to be. Using an annotated dataset of Dutch-language job ads, we demonstrate that machine learning technology is effectively able to distinguish these classes.

2020

BLISS: An Agent for Collecting Spoken Dialogue Data about Health and Well-being
Jelte van Waterschoot | Iris Hendrickx | Arif Khan | Esther Klabbers | Marcel de Korte | Helmer Strik | Catia Cucchiarini | Mariët Theune
Proceedings of the Twelfth Language Resources and Evaluation Conference

An important objective in health technology is the ability to gather information about people’s well-being. Structured interviews can be used to obtain this information, but are time-consuming and not scalable. Questionnaires provide an alternative way to extract such information, though they typically lack depth. In this paper, we present our first prototype of the BLISS agent, an artificially intelligent agent that aims to automatically discover what makes people happy and healthy. The goal of Behaviour-based Language-Interactive Speaking Systems (BLISS) is to understand the motivations behind people’s happiness by conducting a personalized spoken dialogue based on a happiness model. We built our first prototype of the model to collect 55 spoken dialogues, in which the BLISS agent asked questions to users about their happiness and well-being. Apart from a description of the BLISS architecture, we also provide details about our dataset, which contains over 120 activities and 100 motivations and is made available for use.

2018

A Multilingual Wikified Data Set of Educational Material
Iris Hendrickx | Eirini Takoulidou | Thanasis Naskos | Katia Lida Kermanidis | Vilelmini Sosoni | Hugo de Vos | Maria Stasimioti | Menno van Zaanen | Panayota Georgakopoulou | Valia Kordoni | Maja Popovic | Markus Egg | Antal van den Bosch
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Multi- versus a Single-classifier Approach for the Identification of Modality in the Portuguese Language
João Sequeira | Teresa Gonçalves | Paulo Quaresma | Amália Mendes | Iris Hendrickx
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Discovering the Language of Wine Reviews: A Text Mining Account
Els Lefever | Iris Hendrickx | Ilja Croijmans | Antal van den Bosch | Asifa Majid
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Annotating Speech, Attitude and Perception Reports
Corien Bary | Leopold Hess | Kees Thijs | Peter Berck | Iris Hendrickx
Proceedings of the 11th Linguistic Annotation Workshop

We present REPORTS, an annotation scheme for the annotation of speech, attitude and perception reports. Such a scheme makes it possible to annotate the various text elements involved in such reports (e.g. embedding entity, complement, complement head) and their relations in a uniform way, which in turn facilitates the automatic extraction of information on, for example, complementation and vocabulary distribution. We also present the Ancient Greek corpus RAG (Thucydides’ History of the Peloponnesian War), to which we have applied this scheme using the annotation tool BRAT. We discuss some of the issues, both theoretical and practical, that we encountered, show how the corpus helps in answering specific questions, and conclude that REPORTS fitted in well with our needs.

2016

Enhancing Access to Online Education: Quality Machine Translation of MOOC Content
Valia Kordoni | Antal van den Bosch | Katia Lida Kermanidis | Vilelmini Sosoni | Kostadin Cholakov | Iris Hendrickx | Matthias Huck | Andy Way
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, videolectures, assignments, and MOOC forum text are automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students’ forum posts. Finally, the evaluation output will result in more and better quality in-domain parallel data that will be fed back to the translation engine for higher quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.

TraMOOC (Translation for Massive Open Online Courses): providing reliable MT for MOOCs
Valia Kordoni | Lexi Birch | Ioana Buliga | Kostadin Cholakov | Markus Egg | Federico Gaspari | Yota Georgakopolou | Maria Gialama | Iris Hendrickx | Mitja Jermol | Katia Kermanidis | Joss Moorkens | Davor Orlic | Michael Papadopoulos | Maja Popović | Rico Sennrich | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Menno van Zaanen | Andy Way
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products

Very quaffable and great fun: Applying NLP to wine reviews
Iris Hendrickx | Els Lefever | Ilja Croijmans | Asifa Majid | Antal van den Bosch
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Modality annotation for Portuguese: from manual annotation to automatic labeling
Amália Mendes | Iris Hendrickx | Liciana Ávila | Paulo Quaresma | Teresa Gonçalves | João Sequeira
Linguistic Issues in Language Technology, Volume 14, 2016 - Modality: Logic, Semantics, Annotation, and Machine Learning

We investigate modality in Portuguese and we combine a linguistic perspective with an application-oriented perspective on modality. We design an annotation scheme reflecting theoretical linguistic concepts and apply this schema to a small corpus sample to show how the scheme deals with real world language usage. We present two schemas for Portuguese, one for spoken Brazilian Portuguese and one for written European Portuguese. Furthermore, we use the annotated data not only to study the linguistic phenomena of modality, but also to train a practical text mining tool to detect modality in text automatically. The modality tagger uses a machine learning classifier trained on automatically extracted features from a syntactic parser. As we only have a small annotated sample available, the tagger was evaluated on 11 modal verbs that are frequent in our corpus and that denote more than one modal meaning. Finally, we discuss several valuable insights into the complexity of the semantic concept of modality that derive from the process of manual annotation of the corpus and from the analysis of the results of the automatic labeling: ambiguity and the semantic and syntactic properties typically associated with one modal meaning in context, and also the interaction of modality with negation and focus. The knowledge gained from the manual annotation task leads us to propose a new unified scheme for modality that applies to the two Portuguese varieties and covers both written and spoken data.

2015

Towards a Unified Approach to Modality Annotation in Portuguese
Luciana Beatriz Ávila | Amália Mendes | Iris Hendrickx
Proceedings of the Workshop on Models for Modality Annotation

TraMOOC: Translation for Massive Open Online Courses
Valia Kordoni | Kostadin Cholakov | Markus Egg | Andy Way | Lexi Birch | Katia Kermanidis | Vilelmini Sosoni | Dimitrios Tsoumakos | Antal van den Bosch | Iris Hendrickx | Michael Papadopoulos | Panayota Georgakopoulou | Maria Gialama | Menno van Zaanen | Ioana Buliga | Mitja Jermol | Davor Orlic
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

2014

SemEval 2014 Task 5 - L2 Writing Assistant
Maarten van Gompel | Iris Hendrickx | Antal van den Bosch | Els Lefever | Véronique Hoste
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

The Gulf of Guinea Creole Corpora
Tjerk Hagemeijer | Michel Généreux | Iris Hendrickx | Amália Mendes | Abigail Tiny | Armando Zamora
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present the process of building linguistic corpora of the Portuguese-related Gulf of Guinea creoles, a cluster of four historically related languages: Santome, Angolar, Principense and Fa d’Ambô. We faced the typical difficulties of languages lacking an official status, such as lack of standard spelling, language variation, lack of basic language instruments, and small data sets, which comprise data from the late 19th century to the present. In order to tackle these problems, the compiled written and transcribed spoken data collected during field work trips were adapted to a normalized spelling that was applied to the four languages. For the corpus compilation we followed corpus linguistics standards. We recorded meta data for each file and added morphosyntactic information based on a part-of-speech tag set that was designed to deal with the specificities of these languages. The corpora of three of the four creoles are already available and searchable via an online web interface.

Studying the Semantic Context of two Dutch Causal Connectives
Iris Hendrickx | Wilbert Spooren
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

2013

Annotating the Interaction between Focus and Modality: the case of exclusive particles
Amália Mendes | Iris Hendrickx | Agostinho Salgueiro | Luciana Ávila
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

SemEval-2013 Task 4: Free Paraphrases of Noun Compounds
Iris Hendrickx | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Stan Szpakowicz | Tony Veale
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

Introducing the Reference Corpus of Contemporary Portuguese Online
Michel Généreux | Iris Hendrickx | Amália Mendes
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resources that can be extracted from the platform for use in linguistic studies or in NLP.

Modality in Text: a Proposal for Corpus Annotation
Iris Hendrickx | Amália Mendes | Silvia Mencarelli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We present an annotation scheme for modality in Portuguese. In our annotation scheme we have tried to combine a more theoretical linguistic viewpoint with a practical annotation scheme that will also be useful for NLP research but is not geared towards one specific application. Our notion of modality focuses on the attitude and opinion of the speaker, or of the subject of the sentence. We validated the annotation scheme on a corpus sample of approximately 2000 sentences that we fully annotated with modal information using the MMAX2 annotation tool to produce XML annotation. We discuss our main findings and give attention to the difficult cases that we encountered as they illustrate the complexity of modality and its interactions with other elements in the text.

2011

Cross-Domain Dutch Coreference Resolution
Orphée De Clercq | Véronique Hoste | Iris Hendrickx
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Complex Predicates Annotation in a Corpus of Portuguese
Iris Hendrickx | Amália Mendes | Sílvia Pereira | Anabela Gonçalves | Inês Duarte
Proceedings of the Fourth Linguistic Annotation Workshop

Proposal for MWE Annotation in Running Text
Iris Hendrickx | Amália Mendes | Sandra Antunes
Proceedings of the Fourth Linguistic Annotation Workshop

Segmentation Automatique de Lettres Historiques
Michel Généreux | Rita Marquilhas | Iris Hendrickx
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article presents an approach based on the frequency comparison of lexical models for the automatic segmentation of historical Portuguese texts. The approach first treats segmentation as a classification problem, assigning to each lexical item seen in the training phase a salience value for each segment type. These lexical models make it possible both to produce a segmentation and to carry out a qualitative analysis of historical texts. Our evaluation shows that the adopted approach captures semantic information that approaches focusing on detecting the boundaries between segments cannot acquire.

SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations between Pairs of Nominals
Iris Hendrickx | Su Nam Kim | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Sebastian Padó | Marco Pennacchiotti | Lorenza Romano | Stan Szpakowicz
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Is Sentence Compression an NLG task?
Erwin Marsi | Emiel Krahmer | Iris Hendrickx | Walter Daelemans
Proceedings of the 12th European Workshop on Natural Language Generation (ENLG 2009)

SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
Iris Hendrickx | Su Nam Kim | Zornitsa Kozareva | Preslav Nakov | Diarmuid Ó Séaghdha | Sebastian Padó | Marco Pennacchiotti | Lorenza Romano | Stan Szpakowicz
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

Reducing Redundancy in Multi-document Summarization Using Lexical Semantic Similarity
Iris Hendrickx | Walter Daelemans | Erwin Marsi | Emiel Krahmer
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2008

A Coreference Corpus and Resolution System for Dutch
Iris Hendrickx | Gosse Bouma | Frederik Coppens | Walter Daelemans | Veronique Hoste | Geert Kloosterman | Anne-Marie Mineur | Joeri Van Der Vloet | Jean-Luc Verschelde
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present the main outcomes of the COREA project: a corpus annotated with coreferential relations and a coreference resolution system for Dutch. In the project we developed annotation guidelines for coreference resolution for Dutch and annotated a corpus of 135K tokens. We discuss these guidelines, the annotation tool, and the inter-annotator agreement. We also show a visualization of the annotated relations. The standard approach to evaluate a coreference resolution system is to compare the predictions of the system to a hand-annotated gold standard test set (cross-validation). A more practically oriented evaluation is to test the usefulness of coreference relation information in an NLP application. We run experiments with an Information Extraction module for the medical domain, and measure the performance of this module with and without the coreference relation information. We present the results of both this application-oriented evaluation of our system and of a standard cross-validation evaluation. In a separate experiment we also evaluate the effect of coreference information produced by a simple rule-based coreference module in a Question Answering application.

CNTS: Memory-Based Learning of Generating Repeated References
Iris Hendrickx | Walter Daelemans | Kim Luyckx | Roser Morante | Vincent Van Asch
Proceedings of the Fifth International Natural Language Generation Conference

GRAPH: The Costs of Redundancy in Referring Expressions
Emiel Krahmer | Mariët Theune | Jette Viethen | Iris Hendrickx
Proceedings of the Fifth International Natural Language Generation Conference

2007

ILK: Machine learning of semantic relations with shallow features and almost no data
Iris Hendrickx | Roser Morante | Caroline Sporleder | Antal van den Bosch
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2004

Memory-based semantic role labeling: Optimizing features, algorithm, and output
Antal van den Bosch | Sander Canisius | Walter Daelemans | Iris Hendrickx | Erik Tjong Kim Sang
Proceedings of the Eighth Conference on Computational Natural Language Learning (CoNLL-2004) at HLT-NAACL 2004

2003

Memory-based one-step named-entity recognition: Effects of seed list features, classifier stacking, and unannotated data
Iris Hendrickx | Antal van den Bosch
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

Dutch Word Sense Disambiguation: Optimizing the Localness of Context
Antal van den Bosch | Iris Hendrickx | Veronique Hoste | Walter Daelemans
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

Evaluating the results of a memory-based word-expert approach to unrestricted word sense disambiguation
Veronique Hoste | Walter Daelemans | Iris Hendrickx | Antal van den Bosch
Proceedings of the ACL-02 Workshop on Word Sense Disambiguation: Recent Successes and Future Directions

2001

Dutch Word Sense Disambiguation: Data and Preliminary Results
Iris Hendrickx | Antal van den Bosch
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems
