Frank Keller


2023

Learning the Effects of Physical Actions in a Multi-modal Environment
Gautier Dagan | Frank Keller | Alex Lascarides
Findings of the Association for Computational Linguistics: EACL 2023

Large Language Models (LLMs) handle physical commonsense information inadequately. As a result of being trained in a disembodied setting, LLMs often fail to predict an action’s outcome in a given environment. However, predicting the effects of an action before it is executed is crucial in planning, where coherent sequences of actions are often needed to achieve a goal. Therefore, we introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs (images and text). Next, we extend an LLM to model latent representations of objects to better predict action outcomes in an environment. We show that multi-modal models can capture physical commonsense when augmented with visual information. Finally, we evaluate our model’s performance on novel actions and objects and find that combining modalities helps models generalize and learn physical commonsense reasoning better.

Visual Storytelling with Question-Answer Plans
Danyang Liu | Mirella Lapata | Frank Keller
Findings of the Association for Computational Linguistics: EMNLP 2023

Visual storytelling aims to generate compelling narratives from image sequences. Existing models often focus on enhancing the representation of the image sequence, e.g., with external knowledge sources or advanced graph structures. Despite recent progress, the stories are often repetitive, illogical, and lacking in detail. To mitigate these issues, we present a novel framework which integrates visual representations with pretrained language models and planning. Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret. It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative. Automatic and human evaluation on the VIST benchmark demonstrates that blueprint-based models generate stories that are more coherent, interesting, and natural compared to competitive baselines and state-of-the-art systems.

Semi-supervised multimodal coreference resolution in image narrations
Arushi Goel | Basura Fernando | Frank Keller | Hakan Bilen
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

In this paper, we study multimodal coreference resolution, specifically where a longer descriptive text, i.e., a narration, is paired with an image. This poses significant challenges due to fine-grained image-text alignment, the inherent ambiguity of narrative language, and the unavailability of large annotated training sets. To tackle these challenges, we present a data-efficient semi-supervised approach that utilizes image-narration pairs to resolve coreferences and perform narrative grounding in a multimodal context. Our approach incorporates losses for both labeled and unlabeled data within a cross-modal framework. Our evaluation shows that the proposed approach outperforms strong baselines both quantitatively and qualitatively on the tasks of coreference resolution and narrative grounding.

Meta-learning For Vision-and-language Cross-lingual Transfer
Hanxu Hu | Frank Keller
Proceedings of the 3rd Workshop on Multi-lingual Representation Learning (MRL)

2021

Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories
David Wilmot | Frank Keller
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Measuring event salience is essential for understanding stories. This paper takes a recent unsupervised method for salience detection, derived from Barthes’ Cardinal Functions and theories of surprise, and applies it to longer narrative forms. We improve the standard transformer language model by incorporating an external knowledgebase (derived from Retrieval Augmented Generation) and adding a memory mechanism to enhance performance on longer works. We use a novel approach to derive salience annotations from chapter-aligned summaries in the Shmoop corpus of classic literary works. Our evaluation against this data demonstrates that our salience detection model improves performance over a language model without knowledgebase and memory augmentation, and that both components are crucial to this improvement.

Investigating Negation in Pre-trained Vision-and-language Models
Radina Dobreva | Frank Keller
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Pre-trained vision-and-language models have achieved impressive results on a variety of tasks, including ones that require complex reasoning beyond object recognition. However, little is known about how they achieve these results or what their limitations are. In this paper, we focus on a particular linguistic capability, namely the understanding of negation. We borrow techniques from the analysis of language models to investigate the ability of pre-trained vision-and-language models to handle negation. We find that these models severely underperform in the presence of negation.

2020

Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads
Bowen Li | Taeuk Kim | Reinald Kim Amplayo | Frank Keller
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperforms existing approaches when no development set is available. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.

Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation
David Wilmot | Frank Keller
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Suspense is a crucial ingredient of narrative fiction, engaging readers and making stories compelling. While there is a vast theoretical literature on suspense, it is computationally not well understood. We compare two ways for modelling suspense: surprise, a backward-looking measure of how unexpected the current state is given the story so far; and uncertainty reduction, a forward-looking measure of how unexpected the continuation of the story is. Both can be computed either directly over story representations or over their probability distributions. We propose a hierarchical language model that encodes stories and computes surprise and uncertainty reduction. Evaluating against short stories annotated with human suspense judgements, we find that uncertainty reduction over representations is the best predictor, resulting in near human accuracy. We also show that uncertainty reduction can be used to predict suspenseful events in movie synopses.

Screenplay Summarization Using Latent Narrative Structure
Pinelopi Papalampidi | Frank Keller | Lea Frermann | Mirella Lapata
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most general-purpose extractive summarization models are trained on news articles, which are short and present all important information upfront. As a result, such models are biased towards position and often perform a smart selection of sentences from the beginning of the document. When summarizing long narratives, which have complex structure and present information piecemeal, simple position heuristics are not sufficient. In this paper, we propose to explicitly incorporate the underlying structure of narratives into general unsupervised and supervised extractive summarization models. We formalize narrative structure in terms of key narrative events (turning points) and treat it as latent in order to summarize screenplays (i.e., extract an optimal sequence of scenes). Experimental results on the CSI corpus of TV screenplays, which we augment with scene-level summarization labels, show that latent turning points correlate with important aspects of a CSI episode and improve summarization performance over general extractive algorithms, leading to more complete and diverse summaries.

2019

An Imitation Learning Approach to Unsupervised Parsing
Bowen Li | Lili Mou | Frank Keller
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Recently, there has been an increasing interest in unsupervised parsers that optimize semantically oriented objectives, typically using reinforcement learning. Unfortunately, the learned trees often do not match actual syntax trees well. Shen et al. (2018) propose a structured attention mechanism for language modeling (PRPN), which induces better syntactic structures but relies on ad hoc heuristics. Also, their model lacks interpretability as it is not grounded in parsing actions. In our work, we propose an imitation learning approach to unsupervised parsing, where we transfer the syntactic knowledge induced by PRPN to a Tree-LSTM model with discrete parsing actions. Its policy is then refined by Gumbel-Softmax training towards a semantically oriented objective. We evaluate our approach on the All Natural Language Inference dataset and show that it achieves a new state of the art in terms of parsing F-score, outperforming our base models, including PRPN.

Cross-lingual Visual Verb Sense Disambiguation
Spandana Gella | Desmond Elliott | Frank Keller
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent work has shown that visual context improves cross-lingual sense disambiguation for nouns. We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs. Each image in MultiSense is annotated with an English verb and its translation in German or Spanish. We show that cross-lingual verb sense disambiguation models benefit from visual context, compared to unimodal baselines. We also show that the verb sense predicted by our best disambiguation model can improve the results of a text-only machine translation system when used for a multimodal translation task.

Movie Plot Analysis via Turning Point Identification
Pinelopi Papalampidi | Frank Keller | Mirella Lapata
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

According to screenwriting theory, turning points (e.g., change of plans, major setback, climax) are crucial narrative moments within a screenplay: they define the plot structure, determine its progression and segment the screenplay into thematic units (e.g., setup, complications, aftermath). We propose the task of turning point identification in movies as a means of analyzing their narrative structure. We argue that turning points and the segmentation they provide can facilitate processing long, complex narratives, such as screenplays, for summarization and question answering. We introduce a dataset consisting of screenplays and plot synopses annotated with turning points and present an end-to-end neural network model that identifies turning points in plot synopses and projects them onto scenes in screenplays. Our model outperforms strong baselines based on state-of-the-art sentence representations and the expected position of turning points.

2018

An Evaluation of Image-Based Verb Prediction Models against Human Eye-Tracking Data
Spandana Gella | Frank Keller
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Recent research in language and vision has developed models for predicting and disambiguating verbs from images. Here, we ask whether the predictions made by such models correspond to human intuitions about visual verbs. We show that the image regions a verb prediction model identifies as salient for a given verb correlate with the regions fixated by human observers performing a verb classification task.

2017

An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella | Frank Keller
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

A large amount of recent research has focused on tasks that combine language and vision, resulting in a proliferation of datasets and methods. One such task is action recognition, whose applications include image annotation, scene understanding and image retrieval. In this survey, we categorize the existing approaches based on how they conceptualize this problem and provide a detailed review of existing datasets, highlighting their diversity as well as advantages and disadvantages. We focus on recently developed datasets which link visual information with linguistic resources and provide a fine-grained syntactic and semantic analysis of actions in images.

Image Pivoting for Learning Multilingual Multimodal Representations
Spandana Gella | Rico Sennrich | Frank Keller | Mirella Lapata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper we propose a model to learn multimodal multilingual representations for matching images and sentences in different languages, with the aim of advancing multilingual versions of image search and image understanding. Our model learns a common representation for images and their descriptions in two different languages (which need not be parallel) by considering the image as a pivot between two languages. We introduce a new pairwise ranking loss function which can handle both symmetric and asymmetric similarity between the two modalities. We evaluate our models on image-description ranking for German and English, and on semantic textual similarity of image descriptions in English. In both cases we achieve state-of-the-art performance.

2016

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings
Spandana Gella | Mirella Lapata | Frank Keller
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Weakly Supervised Part-of-speech Tagging Using Eye-tracking Data
Maria Barrett | Joachim Bingel | Frank Keller | Anders Søgaard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Modeling Human Reading with Neural Attention
Michael Hahn | Frank Keller
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Cross-lingual Transfer of Correlations between Parts of Speech and Gaze Features
Maria Barrett | Frank Keller | Anders Søgaard
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Several recent studies have shown that eye movements during reading provide information about grammatical and syntactic processing, which can assist the induction of NLP models. All these studies have been limited to English, however. This study shows that gaze and part of speech (PoS) correlations largely transfer across English and French. This means that we can replicate previous studies on gaze-based PoS tagging for French, but also that we can use English gaze data to assist the induction of French NLP models.

2015

Semantic Role Labeling Improves Incremental Parsing
Ioannis Konstas | Frank Keller
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Proceedings of the Third Workshop on Vision and Language
Anja Belz | Darren Cosker | Frank Keller | William Smith | Kalina Bontcheva | Sien Moens | Alan Smeaton
Proceedings of the Third Workshop on Vision and Language

Comparing Automatic Evaluation Measures for Image Description
Desmond Elliott | Frank Keller
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Query-by-Example Image Retrieval using Visual Dependency Representations
Desmond Elliott | Victor Lavrenko | Frank Keller
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Incremental Semantic Role Labeling with Tree Adjoining Grammar
Ioannis Konstas | Frank Keller | Vera Demberg | Mirella Lapata
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

2013

Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
Stella Frank | Frank Keller | Sharon Goldwater
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Image Description using Visual Dependency Representations
Desmond Elliott | Frank Keller
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Incremental, Predictive Parsing with Psycholinguistically Motivated Tree-Adjoining Grammar
Vera Demberg | Frank Keller | Alexander Koller
Computational Linguistics, Volume 39, Issue 4 - December 2013

Incremental Tree Substitution Grammar for Parsing and Sentence Prediction
Federico Sangati | Frank Keller
Transactions of the Association for Computational Linguistics, Volume 1

In this paper, we present the first incremental parser for Tree Substitution Grammar (TSG). A TSG allows arbitrarily large syntactic fragments to be combined into complete trees; we show how constraints (including lexicalization) can be imposed on the shape of the TSG fragments to enable incremental processing. We propose an efficient Earley-based algorithm for incremental TSG parsing and report an F-score competitive with other incremental parsers. In addition to whole-sentence F-score, we also evaluate the partial trees that the parser constructs for sentence prefixes; partial trees play an important role in incremental interpretation, language modeling, and psycholinguistics. Unlike existing parsers, our incremental TSG parser can generate partial trees that include predictions about the upcoming words in a sentence. We show that it outperforms an n-gram model in predicting more than one upcoming word.

2011

A Model of Discourse Predictions in Human Sentence Processing
Amit Dubey | Frank Keller | Patrick Sturt
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics
Frank Keller | David Reitter
Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics

2010

Using Sentence Type Information for Syntactic Category Acquisition
Stella Frank | Sharon Goldwater | Frank Keller
Proceedings of the 2010 Workshop on Cognitive Modeling and Computational Linguistics

Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure
Jeff Mitchell | Mirella Lapata | Vera Demberg | Frank Keller
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Cognitively Plausible Models of Human Language Processing
Frank Keller
Proceedings of the ACL 2010 Conference Short Papers

2009

The Interaction of Syntactic Theory and Computational Psycholinguistics
Frank Keller
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

2008

Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics
Ron Artstein | Gemma Boleda | Frank Keller | Sabine Schulte im Walde
Coling 2008: Proceedings of the workshop on Human Judgements in Computational Linguistics

A Psycholinguistically Motivated Version of TAG
Vera Demberg | Frank Keller
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

2007

Using Foreign Inclusion Detection to Improve Parsing Performance
Beatrice Alex | Amit Dubey | Frank Keller
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

An Information Retrieval Approach to Sense Ranking
Mirella Lapata | Frank Keller
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

2006

Computational Modelling of Structural Priming in Dialogue
David Reitter | Frank Keller | Johanna D. Moore
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Modelling Semantic Role Plausibility in Human Sentence Processing
Ulrike Padó | Matthew Crocker | Frank Keller
11th Conference of the European Chapter of the Association for Computational Linguistics

Integrating Syntactic Priming into an Incremental Probabilistic Parser, with an Application to Psycholinguistic Modeling
Amit Dubey | Frank Keller | Patrick Sturt
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Priming Effects in Combinatory Categorial Grammar
David Reitter | Julia Hockenmaier | Frank Keller
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

2005

Parallelism in Coordination as an Instance of Syntactic Priming: Evidence from Corpus-based Modeling
Amit Dubey | Patrick Sturt | Frank Keller
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French
Abhishek Arun | Frank Keller
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

The Web as a Baseline: Evaluating the Performance of Unsupervised Web-based Models for a Range of NLP Tasks
Mirella Lapata | Frank Keller
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

Robust models of human parsing
Frank Keller
Proceedings of the 3rd workshop on RObust Methods in Analysis of Natural Language Data (ROMAND 2004)

The Entropy Rate Principle as a Predictor of Processing Effort: An Evaluation against Eye-tracking Data
Frank Keller
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Probabilistic Parsing for German Using Sister-Head Dependencies
Amit Dubey | Frank Keller
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Using the Web to Obtain Frequencies for Unseen Bigrams
Frank Keller | Mirella Lapata
Computational Linguistics, Volume 29, Number 3, September 2003: Special Issue on the Web as Corpus

2002

Using the Web to Overcome Data Sparseness
Frank Keller | Maria Lapata | Olga Ourioupina
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

2001

Evaluating Smoothing Algorithms against Plausibility Judgements
Maria Lapata | Frank Keller | Scott McDonald
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

1999

Determinants of Adjective-Noun Plausibility
Maria Lapata | Scott McDonald | Frank Keller
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1995

Towards an Account of Extraposition in HPSG
Frank Keller
Seventh Conference of the European Chapter of the Association for Computational Linguistics