Phil Bartie


2023

SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation
Bhathiya Hemanthage | Christian Dondrup | Phil Bartie | Oliver Lemon
Proceedings of the 15th International Conference on Computational Semantics

SimpleMTOD is a simple language model that recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pretrained GPT-2. In order to capture the semantics of visual scenes, we introduce both localized and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach to extracting visual (and non-visual) information. In addition, the model does not rely on task-specific architectural changes such as classification heads.
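
The de-localized token idea can be illustrated with a short sketch. This is not the authors' code, and the token format, class names, and helper functions below are illustrative assumptions; it only shows how type-level tokens keep a consistent meaning across scenes, and how a scene plus a user turn can be flattened into one sequence for left-to-right prediction.

```python
# Minimal sketch (assumed token format, not the SimpleMTOD implementation):
# each detected object contributes a token for its *type* (e.g. <JACKET>)
# rather than for the specific instance (e.g. <OBJ_42>), so the token means
# the same thing in every scene in the dataset.

from dataclasses import dataclass

@dataclass
class SceneObject:
    index: int        # scene-local object id
    obj_type: str     # e.g. "jacket", "shelf"

def localized_tokens(objects):
    """Instance-specific tokens: their meaning varies from scene to scene."""
    return [f"<OBJ_{o.index}>" for o in objects]

def delocalized_tokens(objects):
    """Type-level tokens: the same token always denotes the same category."""
    return [f"<{o.obj_type.upper()}>" for o in objects]

scene = [SceneObject(0, "jacket"), SceneObject(1, "jacket"), SceneObject(2, "shelf")]

# Flatten the scene representation and the user turn into a single token
# sequence, so sub-tasks such as coreference resolution or response
# generation become plain sequence prediction for a GPT-2-style model.
user_turn = "I like the jacket on the left."
sequence = " ".join(delocalized_tokens(scene)) + " <USR> " + user_turn
print(sequence)  # <JACKET> <JACKET> <SHELF> <USR> I like the jacket on the left.
```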

2016

The REAL Corpus: A Crowd-Sourced Corpus of Human Generated and Evaluated Spatial References to Real-World Urban Scenes
Phil Bartie | William Mackaness | Dimitra Gkatzia | Verena Rieser
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Our interest is in people’s capacity to efficiently and effectively describe geographic objects in urban scenes. The broader ambition is to develop spatial models with equivalent functionality, capable of constructing such referring expressions. To that end we present a newly crowd-sourced data set of natural language references to objects anchored in complex urban scenes (in short, the REAL Corpus: Referring Expressions Anchored Language). The REAL corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same object based on these descriptions. In total, the corpus contains 32 images, with an average of 27 descriptions per image and 3 verifications for each description. In addition, the corpus is annotated with a variety of linguistically motivated features. The paper highlights issues posed by crowd-sourcing data with an unrestricted input format, as well as by using real-world urban scenes.
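
The corpus structure described above (images, human descriptions, and per-description verification outcomes) can be sketched as a simple data model. The field names and example values below are assumptions for illustration, not the released schema.

```python
# Hypothetical sketch of one REAL Corpus entry: an image, a target object,
# the crowd-sourced referring expressions for that object, and the outcomes
# recorded when other people tried to identify it from each description.

from dataclasses import dataclass, field

@dataclass
class Description:
    text: str                 # crowd-sourced referring expression
    verifications: list       # True where a verifier found the target

    def success_rate(self):
        """Fraction of verifiers who identified the intended object."""
        return sum(self.verifications) / len(self.verifications)

@dataclass
class SceneEntry:
    image_id: str
    target_object: str
    descriptions: list = field(default_factory=list)

# Illustrative values only; the abstract reports 3 verifications per description.
entry = SceneEntry(
    image_id="edinburgh_032",
    target_object="red phone box",
    descriptions=[
        Description("the red phone box by the cafe", [True, True, False]),
    ],
)
print(entry.descriptions[0].success_rate())  # 0.666...
```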

2015

From the Virtual to the Real World: Referring to Objects in Real-World Spatial Scenes
Dimitra Gkatzia | Verena Rieser | Phil Bartie | William Mackaness
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2013

A Multithreaded Conversational Interface for Pedestrian Navigation and Question Answering
Srinivasan Janarthanam | Oliver Lemon | Xingkun Liu | Phil Bartie | William Mackaness | Tiphaine Dalmas
Proceedings of the SIGDIAL 2013 Conference

Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
Srinivasan Janarthanam | Oliver Lemon | Phil Bartie | Tiphaine Dalmas | Anna Dickinson | Xingkun Liu | William Mackaness | Bonnie Webber
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Integrating Location, Visibility, and Question-Answering in a Spoken Dialogue System for Pedestrian City Exploration
Srinivasan Janarthanam | Oliver Lemon | Xingkun Liu | Phil Bartie | William Mackaness | Tiphaine Dalmas | Jana Goetze
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue