Heike Adel

2023

Neighboring Words Affect Human Interpretation of Saliency Explanations
Alon Jacovi | Hendrik Schuff | Heike Adel | Ngoc Thang Vu | Yoav Goldberg
Findings of the Association for Computational Linguistics: ACL 2023

Word-level saliency explanations (“heat maps over words”) are often used to communicate feature-attribution in text-based models. Recent studies found that superficial factors such as word length can distort human interpretation of the communicated saliency scores. We conduct a user study to investigate how the marking of a word’s *neighboring words* affects the explainee’s perception of the word’s importance in the context of a saliency explanation. We find that neighboring words have significant effects on the word’s importance rating. Concretely, we identify that the influence changes based on neighboring direction (left vs. right) and a-priori linguistic and computational measures of phrases and collocations (vs. unrelated neighboring words). Our results question whether text-based saliency explanations should continue to be communicated at word level, and inform future research on alternative saliency explanation methods.

Is the Answer in the Text? Challenging ChatGPT with Evidence Retrieval from Instructive Text
Sophie Henning | Talita Anthonio | Wei Zhou | Heike Adel | Mohsen Mesgar | Annemarie Friedrich
Findings of the Association for Computational Linguistics: EMNLP 2023

Generative language models have recently shown remarkable success in generating answers to questions in a given textual context. However, these answers may suffer from hallucination, wrongly cite evidence, and spread misleading information. In this work, we address this problem by employing ChatGPT, a state-of-the-art generative model, as a machine-reading system. We ask it to retrieve answers to lexically varied and open-ended questions from trustworthy instructive texts. We introduce WHERE (WikiHow Evidence REtrieval), a new high-quality evaluation benchmark consisting of WikiHow articles exhaustively annotated with evidence sentences for questions. It comes with a special challenge: all questions are about the article’s topic, but not all can be answered using the provided context. Interestingly, we find that when using a regular question-answering prompt, ChatGPT fails to detect the unanswerable cases. When provided with a few examples, it learns to better judge whether a text provides answer evidence or not. Alongside this important finding, our dataset defines a new benchmark for evidence retrieval in question answering, which we argue is one of the necessary next steps for making large language models more trustworthy.

NLNDE at SemEval-2023 Task 12: Adaptive Pretraining and Source Language Selection for Low-Resource Multilingual Sentiment Analysis
Mingyang Wang | Heike Adel | Lukas Lange | Jannik Strötgen | Hinrich Schütze
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes our system developed for the SemEval-2023 Task 12 “Sentiment Analysis for Low-resource African Languages using Twitter Dataset”. Sentiment analysis is one of the most widely studied applications in natural language processing. However, most prior work still focuses on a small number of high-resource languages. Building reliable sentiment analysis systems for low-resource languages remains challenging, due to the limited training data in this task. In this work, we propose to leverage language-adaptive and task-adaptive pretraining on African texts and study transfer learning with source language selection on top of an African language-centric pretrained language model. Our key findings are: (1) Adapting the pretrained model to the target language and task using a small yet relevant corpus improves performance remarkably by more than 10 F1 score points. (2) Selecting source languages with positive transfer gains during training can avoid harmful interference from dissimilar languages, leading to better results in multilingual and cross-lingual settings. In the shared task, our system wins 8 out of 15 tracks and, in particular, performs best in the multilingual evaluation.

GradSim: Gradient-Based Language Grouping for Effective Multilingual Training
Mingyang Wang | Heike Adel | Lukas Lange | Jannik Strötgen | Hinrich Schuetze
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Most languages of the world pose low-resource challenges to natural language processing models. With multilingual training, knowledge can be shared among languages. However, not all languages positively influence each other, and it is an open research question how to select the most suitable set of languages for multilingual training and avoid negative interference among languages whose characteristics or data distributions are not compatible. In this paper, we propose GradSim, a language grouping method based on gradient similarity. Our experiments on three diverse multilingual benchmark datasets show that it leads to the largest performance gains compared to other similarity measures and that it is better correlated with cross-lingual model performance. As a result, we set the new state of the art on AfriSenti, a benchmark dataset for sentiment analysis on low-resource African languages. In our extensive analysis, we further reveal that besides linguistic features, the topics of the datasets play an important role in language grouping and that lower layers of transformer models encode language-specific features while higher layers capture task-specific information.
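
To make the idea of grouping languages by gradient similarity concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not say how gradients are collected or how groups are formed, so the cosine measure, the greedy grouping rule, the `threshold` value, and the toy gradient vectors are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_by_gradient_similarity(grads, threshold=0.5):
    """Greedy grouping (an assumed rule, not taken from the paper).

    A language joins the first existing group in which every member has
    cosine similarity >= threshold with it; otherwise it opens a new group.
    """
    groups = []
    for lang in grads:
        for group in groups:
            if all(cosine(grads[lang], grads[m]) >= threshold for m in group):
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups


# Hypothetical per-language gradient vectors (flattened, toy-sized).
toy_grads = {
    "lang_a": [1.0, 0.9, 0.0],
    "lang_b": [0.9, 1.0, 0.1],
    "lang_c": [0.0, 0.1, 1.0],
}

groups = group_by_gradient_similarity(toy_grads)
print(groups)
```

In this toy setup, `lang_a` and `lang_b` point in nearly the same direction and end up together, while `lang_c` forms its own group. In practice the vectors would come from model gradients computed on each language's data.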

Multilingual Normalization of Temporal Expressions with Masked Language Models
Lukas Lange | Jannik Strötgen | Heike Adel | Dietrich Klakow
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The detection and normalization of temporal expressions is an important task and preprocessing step for many applications. However, prior work on normalization is rule-based, which severely limits the applicability in real-world multilingual settings, due to the costly creation of new rules. We propose a novel neural method for normalizing temporal expressions based on masked language modeling. Our multilingual method outperforms prior rule-based systems in many languages, in particular in low-resource languages, with performance improvements of up to 33 F1 on average compared to the state of the art.

SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domains
Koustava Goswami | Lukas Lange | Jun Araki | Heike Adel
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Prompting pre-trained language models leads to promising results across natural language processing tasks but is less effective when applied in low-resource domains, due to the domain gap between the pre-training data and the downstream task. In this work, we bridge this gap with a novel and lightweight prompting methodology called SwitchPrompt for the adaptation of language models trained on datasets from the general domain to diverse low-resource domains. Using domain-specific keywords with a trainable gated prompt, SwitchPrompt offers domain-oriented prompting, that is, effective guidance on the target domains for general-domain language models. Our few-shot experiments on three text classification benchmarks demonstrate the efficacy of the general-domain pre-trained language models when used with SwitchPrompt. They often even outperform their domain-specific counterparts trained with baseline state-of-the-art prompting methods, with accuracy gains of up to 10.7%. This result indicates that SwitchPrompt effectively reduces the need for domain-specific language model pre-training.

2022

A Study on Entity Linking Across Domains: Which Data is Best for Fine-Tuning?
Hassan Soliman | Heike Adel | Mohamed H. Gad-Elrab | Dragan Milchevski | Jannik Strötgen
Proceedings of the 7th Workshop on Representation Learning for NLP

Entity linking disambiguates mentions by mapping them to entities in a knowledge graph (KG). One important question in today’s research is how to extend neural entity linking systems to new domains. In this paper, we aim at a system that enables linking mentions to entities from a general-domain KG and a domain-specific KG at the same time. In particular, we represent the entities of different KGs in a joint vector space and address the questions of which data is best suited for creating and fine-tuning that space, and whether fine-tuning harms performance on the general domain. We find that a combination of data from both the general and the special domain is most helpful. The former is especially necessary for avoiding performance loss on the general domain. While additional supervision on entities that appear in both KGs performs best in an intrinsic evaluation of the vector space, it has less impact on the downstream task of entity linking.

2021

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios
Michael A. Hedderich | Lukas Lange | Heike Adel | Jannik Strötgen | Dietrich Klakow
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work on improving performance in low-resource settings. Motivated by the recent fundamental changes towards neural models and the popular pre-train and fine-tune paradigm, we survey promising approaches for low-resource natural language processing. After a discussion about the different dimensions of data availability, we give a structured overview of methods that enable learning when training data is sparse. This includes mechanisms to create additional labeled data like data augmentation and distant supervision as well as transfer learning settings that reduce the need for target supervision. A goal of our survey is to explain how these methods differ in their requirements, as understanding them is essential for choosing a technique suited for a specific low-resource setting. Further key aspects of this work are to highlight open issues and to outline promising directions for future research.

Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings
Hendrik Schuff | Hsiu-Yu Yang | Heike Adel | Ngoc Thang Vu
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities. We find that different sources of knowledge have a different effect on reasoning abilities; for example, implicit knowledge stored in language models can hinder reasoning on numbers and negations. Finally, we conduct the largest and most fine-grained explainable NLI crowdsourcing study to date. It reveals that even large differences in automatic performance scores are not reflected in human ratings of label, explanation, commonsense, or grammar correctness.

FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations
Lukas Lange | Heike Adel | Jannik Strötgen | Dietrich Klakow
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Combining several embeddings typically improves performance in downstream tasks as different embeddings encode different information. It has been shown that even models using embeddings from transformers still benefit from the inclusion of standard word embeddings. However, the combination of embeddings of different types and dimensions is challenging. As an alternative to attention-based meta-embeddings, we propose feature-based adversarial meta-embeddings (FAME) with an attention function that is guided by features reflecting word-specific properties, such as shape and frequency, and show that this is beneficial for handling subword-based embeddings. In addition, FAME uses adversarial training to optimize the mappings of differently-sized embeddings to the same space. We demonstrate that FAME works effectively across languages and domains for sequence labeling and sentence classification, in particular in low-resource settings. FAME sets the new state of the art for POS tagging in 27 languages, various NER settings and question classification in different domains.

To Share or not to Share: Predicting Sets of Sources for Model Transfer Learning
Lukas Lange | Jannik Strötgen | Heike Adel | Dietrich Klakow
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In low-resource settings, model transfer can help to overcome a lack of labeled data for many tasks and domains. However, predicting useful transfer sources is a challenging problem, as even the most similar sources might lead to unexpected negative transfer results. Thus, ranking methods based on task and text similarity — as suggested in prior work — may not be sufficient to identify promising sources. To tackle this problem, we propose a new approach to automatically determine which and how many sources should be exploited. For this, we study the effects of model transfer on sequence labeling across various domains and tasks and show that our methods based on model similarity and support vector machines are able to predict promising sources, resulting in performance increases of up to 24 F1 points.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Heike Adel | Shuming Shi
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

2020

ClusterDataSplit: Exploring Challenging Clustering-Based Data Splits for Model Performance Evaluation
Hanna Wecker | Annemarie Friedrich | Heike Adel
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

This paper adds to the ongoing discussion in the natural language processing community on how to choose a good development set. Motivated by the real-life necessity of applying machine learning models to different data distributions, we propose a clustering-based data splitting algorithm. It creates development (or test) sets which are lexically different from the training data while ensuring similar label distributions. Hence, we are able to create challenging cross-validation evaluation setups while abstracting away from performance differences resulting from label distribution shifts between training and test data. In addition, we present a Python-based tool for analyzing and visualizing data split characteristics and model performance. We illustrate the workings and results of our approach using a sentiment analysis and a patent classification task.

On the Choice of Auxiliary Languages for Improved Sequence Tagging
Lukas Lange | Heike Adel | Jannik Strötgen
Proceedings of the 5th Workshop on Representation Learning for NLP

Recent work showed that embeddings from related languages can improve the performance of sequence tagging, even for monolingual models. In this analysis paper, we investigate whether the best auxiliary language can be predicted based on language distances and show that the most related language is not always the best auxiliary language. Further, we show that attention-based meta-embeddings can effectively combine pre-trained embeddings from different languages for sequence tagging and set new state-of-the-art results for part-of-speech tagging in five languages.
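
As a rough illustration of attention-based meta-embeddings, the sketch below combines several embeddings of one token into a single vector through a softmax-weighted sum. It shows one common formulation and is not necessarily the paper's exact model: the dot-product scoring, the fixed `attention_vector`, and the assumption that all embeddings already share one dimension (in practice via learned projections) are illustrative choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def meta_embedding(embeddings, attention_vector):
    """Combine same-dimensional embeddings of one token by attention.

    Each embedding is scored by a dot product with `attention_vector`
    (a stand-in for learned attention parameters); the scores are
    normalized with softmax and used as weights in a weighted sum.
    """
    scores = [sum(a * x for a, x in zip(attention_vector, emb))
              for emb in embeddings]
    weights = softmax(scores)
    dim = len(embeddings[0])
    combined = [sum(w * emb[i] for w, emb in zip(weights, embeddings))
                for i in range(dim)]
    return combined, weights


# Two hypothetical embeddings of the same token from different languages.
token_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# A zero attention vector scores both embeddings equally.
combined, weights = meta_embedding(token_embeddings, [0.0, 0.0])
print(combined, weights)
```

With equal scores the weights are uniform and the result is the plain average of the inputs; a trained attention function would instead shift weight toward the more useful embedding for each token.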

Adversarial Alignment of Multilingual Models for Extracting Temporal Expressions from Text
Lukas Lange | Anastasiia Iurshina | Heike Adel | Jannik Strötgen
Proceedings of the 5th Workshop on Representation Learning for NLP

Although temporal tagging is still dominated by rule-based systems, there have been recent attempts at neural temporal taggers. However, all of them focus on monolingual settings. In this paper, we explore multilingual methods for the extraction of temporal expressions from text and investigate adversarial training for aligning embedding spaces to one common space. With this, we create a single multilingual model that can also be transferred to unseen languages and set the new state of the art in those cross-lingual transfer experiments.

The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain
Annemarie Friedrich | Heike Adel | Federico Tomazic | Johannes Hingerl | Renou Benteau | Anika Marusczyk | Lukas Lange
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.

Closing the Gap: Joint De-Identification and Concept Extraction in the Clinical Domain
Lukas Lange | Heike Adel | Jannik Strötgen
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Exploiting natural language processing in the clinical domain requires de-identification, i.e., anonymization of personal information in texts. However, current research considers de-identification and downstream tasks, such as concept extraction, only in isolation and does not study the effects of de-identification on other tasks. In this paper, we close this gap by reporting concept extraction performance on automatically anonymized data and investigating joint models for de-identification and concept extraction. In particular, we propose a stacked model with restricted access to privacy sensitive information and a multitask model. We set the new state of the art on benchmark datasets in English (96.1% F1 for de-identification and 88.9% F1 for concept extraction) and Spanish (91.4% F1 for concept extraction).

An Analysis of Simple Data Augmentation for Named Entity Recognition
Xiang Dai | Heike Adel
Proceedings of the 28th International Conference on Computational Linguistics

Simple yet effective data augmentation techniques have been proposed for sentence-level and sentence-pair natural language processing tasks. Inspired by these efforts, we design and compare data augmentation for named entity recognition, which is usually modeled as a token-level sequence labeling problem. Through experiments on two data sets from the biomedical and materials science domains (i2b2-2010 and MaSciP), we show that simple augmentation can boost performance for both recurrent and transformer-based models, especially for small training sets.

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering
Hendrik Schuff | Heike Adel | Ngoc Thang Vu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Explainable question answering systems predict an answer together with an explanation showing why the answer has been selected. The goal is to enable users to assess the correctness of the system and understand its reasoning process. However, we show that current models and evaluation settings have shortcomings regarding the coupling of answer and explanation which might cause serious issues in user experience. As a remedy, we propose a hierarchical model and a new regularization term to strengthen the answer-explanation coupling as well as two evaluation scores to quantify the coupling. We conduct experiments on the HOTPOTQA benchmark data set and perform a user study. The user study shows that our models increase the ability of the users to judge the correctness of the system and that scores like F1 are not enough to estimate the usefulness of a model in a practical setting with human users. Our scores are better aligned with user experience, making them promising candidates for model selection.

2019

Adversarial Training for Satire Detection: Controlling for Confounding Variables
Robert McHardy | Heike Adel | Roman Klinger
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

The automatic detection of satire vs. regular news is relevant for downstream applications (for instance, knowledge base population) and to improve the understanding of linguistic characteristics of satire. Recent approaches build upon corpora which have been labeled automatically based on article sources. We hypothesize that this encourages the models to learn characteristics for different publication sources (e.g., “The Onion” vs. “The Guardian”) rather than characteristics of satire, leading to poor generalization performance to unseen publication sources. We therefore propose a novel model for satire detection with an adversarial component to control for the confounding variable of publication source. On a large novel data set collected from German news (which we make available to the research community), we observe comparable satire classification performance and, as desired, a considerable drop in publication classification performance with adversarial training. Our analysis shows that the adversarial component is crucial for the model to learn to pay attention to linguistic properties of satire.

Neural Semi-Markov Conditional Random Fields for Robust Character-Based Part-of-Speech Tagging
Apostolos Kemos | Heike Adel | Hinrich Schütze
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Character-level models of tokens have been shown to be effective at dealing with within-token noise and out-of-vocabulary words. However, they often still rely on correct token boundaries. In this paper, we propose to eliminate the need for tokenizers with an end-to-end character-level semi-Markov conditional random field. It uses neural networks for its character and segment representations. We demonstrate its effectiveness in multilingual settings and when token boundaries are noisy: It matches state-of-the-art part-of-speech taggers for various languages and significantly outperforms them on a noisy English version of a benchmark dataset. Our code and the noisy dataset are publicly available at http://cistern.cis.lmu.de/semiCRF.

NLNDE: Enhancing Neural Sequence Taggers with Attention and Noisy Channel for Robust Pharmacological Entity Detection
Lukas Lange | Heike Adel | Jannik Strötgen
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

Named entity recognition has been extensively studied on English news texts. However, the transfer to other domains and languages is still a challenging problem. In this paper, we describe the system with which we participated in the first subtrack of the PharmaCoNER competition of the BioNLP Open Shared Tasks 2019. Aiming at pharmacological entity detection in Spanish texts, the task provides a non-standard domain and language setting. However, we propose an architecture that requires neither language nor domain expertise. We treat the task as a sequence labeling task and experiment with attention-based embedding selection and training on automatically annotated data to further improve our system’s performance. Our system achieves promising results, especially by combining the different techniques, and reaches up to 88.6% F1 in the competition.

2018

DERE: A Task and Domain-Independent Slot Filling Framework for Declarative Relation Extraction
Heike Adel | Laura Ana Maria Bostan | Sean Papay | Sebastian Padó | Roman Klinger
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Most machine learning systems for natural language processing are tailored to specific tasks. As a result, comparability of models across tasks is missing and their applicability to new tasks is limited. This affects end users without machine learning experience as well as model developers. To address these limitations, we present DERE, a novel framework for declarative specification and compilation of template-based information extraction. It uses a generic specification language for the task and for data annotations in terms of spans and frames. This formalism enables the representation of a large variety of natural language processing challenges. The backend can be instantiated by different models, following different paradigms. The clear separation of frame specification and model backend will ease the implementation of new models and the evaluation of different models across different tasks. Furthermore, it simplifies transfer learning, joint learning across tasks and/or domains as well as the assessment of model generalizability. DERE is available as open-source software.

2017

Global Normalization of Convolutional Neural Networks for Joint Entity and Relation Classification
Heike Adel | Hinrich Schütze
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We introduce globally normalized convolutional neural networks for joint entity classification and relation extraction. In particular, we propose a way to utilize a linear-chain conditional random field output layer for predicting entity types and relations between entities at the same time. Our experiments show that global normalization outperforms a locally normalized softmax layer on a benchmark dataset.

Exploring Different Dimensions of Attention for Uncertainty Detection
Heike Adel | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Neural networks with attention have proven effective for many natural language processing tasks. In this paper, we develop attention mechanisms for uncertainty detection. In particular, we generalize standardly used attention mechanisms by introducing external attention and sequence-preserving attention. These novel architectures differ from standard approaches in that they use external resources to compute attention weights and preserve sequence information. We compare them to other configurations along different dimensions of attention. Our novel architectures set the new state of the art on a Wikipedia benchmark dataset and perform similarly to the state-of-the-art model on a biomedical benchmark which uses a large set of linguistic features.

Noise Mitigation for Neural Entity Typing and Relation Extraction
Yadollah Yaghoobzadeh | Heike Adel | Hinrich Schütze
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

In this paper, we address two different types of noise in information extraction models: noise from distant supervision and noise from pipeline input features. Our target tasks are entity typing and relation extraction. For the first noise type, we introduce multi-instance multi-label learning algorithms using neural network models, and apply them to fine-grained entity typing for the first time. Our model outperforms the state-of-the-art supervised approach which uses global embeddings of entities. For the second noise type, we propose ways to improve the integration of noisy entity type predictions into relation extraction. Our experiments show that probabilistic predictions are more robust than discrete predictions and that joint training of the two tasks performs best.

Ranking Convolutional Recurrent Neural Networks for Purchase Stage Identification on Imbalanced Twitter Data
Heike Adel | Francine Chen | Yan-Ying Chen
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Users often use social media to share their interest in products. We propose to identify purchase stages from Twitter data following the AIDA model (Awareness, Interest, Desire, Action). In particular, we define the task of classifying the purchase stage of each tweet in a user’s tweet sequence. We introduce RCRNN, a Ranking Convolutional Recurrent Neural Network which computes tweet representations using convolution over word embeddings and models a tweet sequence with gated recurrent units. Also, we consider various methods to cope with the imbalanced label distribution in our data and show that a ranking layer outperforms class weights.
