Stephen Tratz


2020

Dialogue-AMR: Abstract Meaning Representation for Dialogue
Claire Bonial | Lucia Donatelli | Mitchell Abrams | Stephanie M. Lukin | Stephen Tratz | Matthew Marge | Ron Artstein | David Traum | Clare Voss
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper describes a schema that enriches Abstract Meaning Representation (AMR) in order to provide a semantic representation for facilitating Natural Language Understanding (NLU) in dialogue systems. AMR offers a valuable level of abstraction of the propositional content of an utterance; however, it does not capture the illocutionary force or speaker’s intended contribution in the broader dialogue context (e.g., make a request or ask a question), nor does it capture tense or aspect. We explore dialogue in the domain of human-robot interaction, where a conversational robot is engaged in search and navigation tasks with a human partner. To address the limitations of standard AMR, we develop an inventory of speech acts suitable for our domain, and present “Dialogue-AMR”, an enhanced AMR that represents not only the content of an utterance, but also the illocutionary force behind it, as well as tense and aspect. To showcase the coverage of the schema, we use both manual and automatic methods to construct the “DialAMR” corpus, a corpus of human-robot dialogue annotated with standard AMR and our enriched Dialogue-AMR schema. Our automated methods can be used to incorporate AMR into a larger NLU pipeline supporting human-robot dialogue.

2019

Augmenting Abstract Meaning Representation for Human-Robot Dialogue
Claire Bonial | Lucia Donatelli | Stephanie M. Lukin | Stephen Tratz | Ron Artstein | David Traum | Clare Voss
Proceedings of the First International Workshop on Designing Meaning Representations

We detail refinements made to Abstract Meaning Representation (AMR) that make the representation more suitable for supporting a situated dialogue system, where a human remotely controls a robot for purposes of search and rescue and reconnaissance. We propose 36 augmented AMRs that capture speech acts, tense and aspect, and spatial information. This linguistic information is vital for representing important distinctions, for example whether the robot has moved, is moving, or will move. We evaluate two existing AMR parsers for their performance on dialogue data. We also outline a model for graph-to-graph conversion, in which output from AMR parsers is converted into our refined AMRs. The design scheme presented here, though task-specific, is extendable for broad coverage of speech acts using AMR in future task-independent work.

Dependency Tree Annotation with Mechanical Turk
Stephen Tratz
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP

Crowdsourcing is frequently employed to quickly and inexpensively obtain valuable linguistic annotations but is rarely used for parsing, likely due to the perceived difficulty of the task and the limited training of the available workers. This paper presents what is, to the best of our knowledge, the first published use of Mechanical Turk (or similar platform) to crowdsource parse trees. We pay Turkers to construct unlabeled dependency trees for 500 English sentences using an interactive graphical dependency tree editor, collecting 10 annotations per sentence. Despite not requiring any training, several of the more prolific workers meet or exceed 90% attachment agreement with the Penn Treebank (PTB) portion of our data, and, furthermore, for 72% of these PTB sentences, at least one Turker produces a perfect parse. Thus, we find that, supported with a simple graphical interface, people with presumably no prior experience can achieve surprisingly high degrees of accuracy on this task. To facilitate research into aggregation techniques for complex crowdsourced annotations, we publicly release our annotated corpus.

2018

Towards a Computational Lexicon for Moroccan Darija: Words, Idioms, and Constructions
Jamal Laoudi | Claire Bonial | Lucia Donatelli | Stephen Tratz | Clare Voss
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)

In this paper, we explore the challenges of building a computational lexicon for Moroccan Darija (MD), an Arabic dialect spoken by over 32 million people worldwide but which only recently has begun appearing frequently in written form in social media. We raise the question of what belongs in such a lexicon and start by describing our work building traditional word-level lexicon entries with their English translations. We then discuss challenges in translating idiomatic MD text that led to creating multi-word expression lexicon entries whose meanings could not be fully derived from the individual words. Finally, we provide a preliminary exploration of constructions to be considered for inclusion in an MD constructicon by translating examples of English constructions and examining their MD counterparts.

A Web-based System for Crowd-in-the-Loop Dependency Treebanking
Stephen Tratz | Nhien Phan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

EasyTree: A Graphical Tool for Dependency Tree Annotation
Alexa Little | Stephen Tratz
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces EasyTree, a dynamic graphical tool for dependency tree annotation. Built in JavaScript using the popular D3 data visualization library, EasyTree allows annotators to construct and label trees entirely by manipulating graphics, and then export the corresponding data in JSON format. Human users are thus able to annotate in an intuitive way without compromising the machine-compatibility of the output. EasyTree has a number of features to assist annotators, including color-coded part-of-speech indicators and optional translation displays. It can also be customized to suit a wide range of projects; part-of-speech categories, edge labels, and many other settings can be edited from within the GUI. The system also utilizes UTF-8 encoding and properly handles both left-to-right and right-to-left scripts. By providing a user-friendly annotation tool, we aim to reduce time spent transforming data or learning to use the software, to improve the user experience for annotators, and to make annotation approachable even for inexperienced users. Unlike existing solutions, EasyTree is built entirely with standard web technologies (JavaScript, HTML, and CSS), making it ideal for web-based annotation efforts, including crowdsourcing.

2014

Resumptive Pronoun Detection for Modern Standard Arabic to English MT
Stephen Tratz | Clare Voss | Jamal Laoudi
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation (HyTra)

Finding Romanized Arabic Dialect in Code-Mixed Tweets
Clare Voss | Stephen Tratz | Jamal Laoudi | Douglas Briesch
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent computational work on Arabic dialect identification has focused primarily on building and annotating corpora written in Arabic script. Arabic dialects, however, also appear written in Roman script, especially in social media. This paper describes our recent work developing tweet corpora and a token-level classifier that identifies a Romanized Arabic dialect and distinguishes it from French and English in tweets. We focus on Moroccan Darija, one of several spoken vernaculars in the family of Maghrebi Arabic dialects. Even given noisy, code-mixed tweets, the classifier achieved token-level recall of 93.2% on Romanized Arabic dialect, 83.2% on English, and 90.1% on French. The classifier, now integrated into our tweet conversation annotation tool (Tratz et al. 2013), has semi-automated the construction of a Romanized Arabic-dialect lexicon. Two datasets, a full list of Moroccan Darija surface token forms and a table of lexical entries derived from this list with spelling variants, as extracted from our tweet corpus collection, will be made available in the LRE MAP.

2013

Automatic Interpretation of the English Possessive
Stephen Tratz | Eduard Hovy
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Tweet Conversation Annotation Tool with a Focus on an Arabic Dialect, Moroccan Darija
Stephen Tratz | Douglas Briesch | Jamal Laoudi | Clare Voss
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

A Cross-Task Flexible Transition Model for Arabic Tokenization, Affix Detection, Affix Labeling, POS Tagging, and Dependency Parsing
Stephen Tratz
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2011

Models and Training for Unsupervised Preposition Sense Disambiguation
Dirk Hovy | Ashish Vaswani | Stephen Tratz | David Chiang | Eduard Hovy
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Stephen Tratz | Eduard Hovy
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

What’s in a Preposition? Dimensions of Sense Disambiguation for an Interesting Word Class
Dirk Hovy | Stephen Tratz | Eduard Hovy
Coling 2010: Posters

A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
Stephen Tratz | Eduard Hovy
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

ISI: Automatic Classification of Relations Between Nominals Using a Maximum Entropy Classifier
Stephen Tratz | Eduard Hovy
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

Disambiguation of Preposition Sense Using Linguistically Motivated Features
Stephen Tratz | Dirk Hovy
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

2007

PNNL: A Supervised Maximum Entropy Approach to Word Sense Disambiguation
Stephen Tratz | Antonio Sanfilippo | Michelle Gregory | Alan Chappell | Christian Posse | Paul Whitney
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

A High Accuracy Method for Semi-Supervised Information Extraction
Stephen Tratz | Antonio Sanfilippo
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

2006

Integrating Ontological Knowledge and Textual Evidence in Estimating Gene and Gene Product Similarity
Antonio Sanfilippo | Christian Posse | Banu Gopalan | Stephen Tratz | Michelle Gregory
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

Word Domain Disambiguation via Word Sense Disambiguation
Antonio Sanfilippo | Stephen Tratz | Michelle Gregory
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers