Beate Dorow

2010

Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
Lukas Michelbacher | Florian Laws | Beate Dorow | Ulrich Heid | Hinrich Schütze
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Internet is an ever-growing source of information stored in documents in different languages. Hence, cross-lingual resources are needed for a growing number of NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests semantically related words in a second language. The method requires two monolingual corpora and a basic bilingual dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations relating nodes from both graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically related to the target words. We publish two resources in conjunction with this paper: first, noun coordinations extracted from the German and English Wikipedias; second, the cross-lingual relatedness thesaurus, which can be used in experiments involving interactive cross-lingual query expansion.
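The abstract does not spell out the similarity algorithm, so the following is only a minimal sketch, assuming a SimRank-style update between two graphs: the similarity of an English word and a German word is the decayed average similarity of their neighbours, and dictionary seed pairs are pinned to 1.0. The function names `inter_graph_similarity` and `related_words`, the `decay` and `iterations` parameters, and the toy graphs are all hypothetical and illustrative, not taken from the paper or its resources.

```python
from itertools import product


def inter_graph_similarity(graph_a, graph_b, seeds, decay=0.8, iterations=5):
    """SimRank-style similarity between nodes of two separate word graphs.

    graph_a, graph_b: dict mapping each word to the set of its neighbours
    (every neighbour must also be a key). seeds: set of (word_a, word_b)
    translation pairs from a basic dictionary; held at 1.0 throughout.
    """
    sim = {(a, b): (1.0 if (a, b) in seeds else 0.0)
           for a, b in product(graph_a, graph_b)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(graph_a, graph_b):
            if (a, b) in seeds:
                new_sim[(a, b)] = 1.0  # seed translations stay fixed
                continue
            na, nb = graph_a[a], graph_b[b]
            if not na or not nb:
                new_sim[(a, b)] = 0.0  # isolated nodes get no evidence
                continue
            # Average similarity over all neighbour pairs, then decay.
            total = sum(sim[(x, y)] for x in na for y in nb)
            new_sim[(a, b)] = decay * total / (len(na) * len(nb))
        sim = new_sim
    return sim


def related_words(sim, word, top_n=3):
    """Rank second-language words by similarity to `word` (score > 0 only)."""
    scored = [(b, s) for (a, b), s in sim.items() if a == word and s > 0]
    return sorted(scored, key=lambda pair: -pair[1])[:top_n]


# Hypothetical toy coordination graphs (illustrative only).
english = {"apple": {"pear", "banana"}, "pear": {"apple"}, "banana": {"apple"}}
german = {"Apfel": {"Birne", "Banane"}, "Birne": {"Apfel"}, "Banane": {"Apfel"}}
seeds = {("pear", "Birne"), ("banana", "Banane")}

sim = inter_graph_similarity(english, german, seeds)
print(related_words(sim, "apple")[0][0])  # prints Apfel
```

Here "apple" has no seed translation, but because its neighbours "pear" and "banana" are seeded to the neighbours of "Apfel", the propagation ranks "Apfel" first. Pinning the seeds rather than letting them decay is one plausible design choice for anchoring the two graphs to each other.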

2009

A Graph-Theoretic Algorithm for Automatic Extension of Translation Lexicons
Beate Dorow | Florian Laws | Lukas Michelbacher | Christian Scheible | Jason Utt
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics

2006

Ongoing Developments in Automatically Adapting Lexical Resources to the Biomedical Domain
Dominic Widdows | Adil Toumouh | Beate Dorow | Ahmed Lehireche
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

This paper describes a range of experiments using empirical methods to adapt the WordNet noun ontology for specific use in the biomedical domain. Our basic technique is to extract relationships between terms using the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the extracted relationships with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods involve a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms, as well as related groups of adjectives and nouns identified using Markov clustering. In many cases, this enables us to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the paper also highlights evident problems and drawbacks with the method, and outlines suggestions for future work.
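The abstract does not list the exact patterns, so what follows is only a minimal sketch of one common lexicosyntactic pattern for coordinate nouns ("N1 , N2 and N3"), assuming sentences are already POS-tagged with Penn Treebank-style tags. The function `coordinate_noun_pairs`, the tag set, and the example sentence are hypothetical; the Markov clustering step is not sketched here.

```python
from itertools import combinations

NOUN_TAGS = {"NN", "NNS"}        # assumption: Penn Treebank-style noun tags
CONNECTORS = {",", "and", "or"}  # tokens allowed between coordinated nouns


def coordinate_noun_pairs(tagged):
    """Return unordered pairs of nouns coordinated in one POS-tagged sentence.

    tagged: list of (token, tag) tuples. A coordination is a run of single
    nouns separated by ',' / 'and' / 'or', e.g. 'fever , cough and headache'.
    """
    pairs = set()
    group = []                 # nouns in the coordination being built
    pending_connector = False  # True right after a connector token

    def flush():
        # Every two distinct nouns in one coordination form a pair.
        for a, b in combinations(sorted(set(group)), 2):
            pairs.add((a, b))

    for token, tag in tagged:
        word = token.lower()
        if tag in NOUN_TAGS:
            if group and not pending_connector:
                # Adjacent nouns (e.g. a compound): close the old group.
                flush()
                group.clear()
            group.append(word)
            pending_connector = False
        elif word in CONNECTORS and group and not pending_connector:
            pending_connector = True
        else:
            # Any other token ends the current coordination.
            flush()
            group.clear()
            pending_connector = False
    flush()
    return pairs


# Hypothetical tagged sentence (illustrative only).
sentence = [("patients", "NNS"), ("with", "IN"), ("fever", "NN"), (",", ","),
            ("cough", "NN"), ("and", "CC"), ("headache", "NN"),
            ("improved", "VBD")]
print(sorted(coordinate_noun_pairs(sentence)))
# [('cough', 'fever'), ('cough', 'headache'), ('fever', 'headache')]
```

Pairs collected this way over a whole corpus can serve as edges of a word graph, the kind of structure that a graph clustering method such as Markov clustering then operates on.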
