Florence Reeder

Also published as: Florence M. Reeder

2010

pdf bib abs
Paralinguist Assessment Decision Factors For Machine Translation Output: A Case Study
Carol Van Ess-Dykema | Jocelyn Phillips | Florence Reeder | Laurie Gerber
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

We describe a case study that presents a framework for examining whether Machine Translation (MT) output enables translation professionals to translate faster while at the same time producing better quality translations than without MT output. We seek to find decision factors that enable a translation professional, known as a Paralinguist, to determine whether MT output is of sufficient quality to serve as a “seed translation” for post-editors. The decision factors, unlike MT developers’ automatic metrics, must function without a reference translation. We also examine the correlation of MT developers’ automatic metrics with error annotators’ assessments of post-edited translations.

2007

pdf bib
The Chinese Room Experiment: The Self-Organizing Feng Shui of MT
John S. White | Florence Reeder
Proceedings of the Workshop on the Chinese room experiment

2006

This paper describes an effort to investigate the incrementally deepening development of an interlingua notation, validated by human annotation of texts in English plus six languages. We begin with deep syntactic annotation, and in this paper present a series of annotation manuals for six different languages at the deep-syntactic level of representation. Many syntactic differences between languages are removed in the proposed syntactic annotation, making them useful resources for multilingual NLP projects with semantic components.

pdf bib abs
Direct Application of a Language Learner Test to MT Evaluation
Florence Reeder
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper shows the applicability of language testing techniques to machine translation (MT) evaluation through one of a set of related experiments. One straightforward experiment is to use language testing exams and scoring on MT output with little or no adaptation. This paper describes one such experiment, the first in a set. After an initial test (Vanni and Reeder, 2000), we expanded the experiment to include multiple raters and a more detailed analysis of the surprising results. Namely that unlike with humans, MT systems perform more poorly at both level zero and one than at level two and three. This paper presents these results as an illustration of both the applicability of language testing techniques and also the caution that needs to be applied.

pdf bib abs
Measuring MT Adequacy Using Latent Semantic Analysis
Florence Reeder
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Translation adequacy is defined as the amount of semantic content from the source language document that is conveyed in the target language document. As such, it is more difficult to measure than intelligibility since semantic content must be measured in two documents and then compared. Latent Semantic Analysis is a content measurement technique used in language learner evaluation that exhibits characteristics attractive for re-use in machine translation evaluation (MTE). This experiment, which is a series of applications of the LSA algorithm in various configurations, demonstrates its usefulness as an MTE metric for adequacy. In addition, this experiment lays the groundwork for using LSA as a method to measure the accuracy of a translation without reliance on reference translations.

bib
Expecting the Unexpected: Using MT Operationally
Florence Reeder
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: User Track Presentations

2004

pdf bib abs
Investigation of intelligibility judgments
Florence Reeder
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes an intelligibility snap-judgment test. In this exercise, participants are shown a series of human translations and machine translations and are asked to determine whether the author was human or machine. The experiment shows that snap judgments on intelligibility are made successfully and that system rankings on snap judgments are consistent with more detailed intelligibility measures. In addition to demonstrating a quick intelligibility judgment, representing on a few minutes time of each participant, it details the types of errors which led to the snap judgments.

MT systems that use only superficial representations, including the current generation of statistical MT systems, have been successful and useful. However, they will experience a plateau in quality, much like other “silver bullet” approaches to MT. We pursue work on the development of interlingual representations for use in symbolic or hybrid MT systems. In this paper, we describe the creation of an interlingua and the development of a corpus of semantically annotated text, to be validated in six languages and evaluated in several ways. We have established a distributed, well-functioning research methodology, designed a preliminary interlingua notation, created annotation manuals and tools, developed a test collection in six languages with associated English translations, annotated some 150 translations, and designed and applied various annotation metrics. We describe the data sets being annotated and the interlingual (IL) representation language which uses two ontologies and a systematic theta-role list. We present the annotation tools built and outline the annotation process. Following this, we describe our evaluation methodology and conclude with a summary of issues that have arisen.

2003

pdf bib abs
Granularity in MT evaluation
Florence Reeder | John White
Workshop on Systemizing MT Evaluation

This paper looks at granularity issues in machine translation evaluation. We start with work by (White, 2001) who examined the correlation between intelligibility and fidelity at the document level. His work showed that intelligibility and fidelity do not correlate well at the document level. These dissimilarities lead to our investigation of evaluation granularity. In particular, we revisit the intelligibility and fidelity relationship at the corpus level. We expect these to support certain assumptions in both evaluations as well as indicate issues germane to future evaluations.

2001

pdf bib
Is That Your Final Answer?
Florence Reeder
Proceedings of the First International Conference on Human Language Technology Research

pdf bib abs
In one hundred words or less
Florence Reeder
Workshop on MT Evaluation

This paper reports on research which aims to test the efficacy of applying automated evaluation techniques, originally designed for human second language learners, to machine translation (MT) system evaluation. We believe that such evaluation techniques will provide insight into MT evaluation, MT development, the human translation process and the human language learning process. The experiment described here looks only at the intelligibility of MT output. The evaluation technique is derived from a second language acquisition experiment that showed that assessors can differentiate native from non-native language essays in less than 100 words. Particularly illuminating for our purposes is the set of factor on which the assessors made their decisions. We duplicated this experiment to see if similar criteria could be elicited from duplicating the test using both human and machine translation outputs in the decision set. The encouraging results of this experiment, along with an analysis of language factors contributing to the successful outcomes, is presented here.

pdf bib abs
The naming of things and the confusion of tongues: an MT metric
Florence Reeder | Keith Miller | Jennifer Doyon | John White
Workshop on MT Evaluation

This paper reports the results of an experiment in machine translation (MT) evaluation, designed to determine whether easily/rapidly collected metrics can predict the human generated quality parameters of MT output. In this experiment we evaluated a system’s ability to translate named entities, and compared this measure with previous evaluation scores of fidelity and intelligibility. There are two significant benefits potentially associated with a correlation between traditional MT measures and named entity scores: the ability to automate named entity scoring and thus MT scoring; and insights into the linguistic aspects of task-based uses of MT, as captured in previous studies.

2000

pdf bib abs
How are you doing? A look at MT evaluation
Michelle Vanni | Florence Reeder
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

Machine Translation evaluation has been more magic and opinion than science. The history of MT evaluation is long and checkered - the search for objective, measurable, resource-reduced methods of evaluation continues. A recent trend towards task-based evaluation inspires the question - can we use methods of evaluation of language competence in language learners and apply them reasonably to MT evaluation? This paper is the first in a series of steps to look at this question. In this paper, we will present the theoretical framework for our ideas, the notions we ultimately aim towards and some very preliminary results of a small experiment along these lines.

pdf bib
At Your Service: Embedded MT As a Service
Florence M. Reeder
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems