Christiane Fellbaum


2023

What to Make of make? Sense Distinctions for Light Verbs
Julie Kallini | Christiane Fellbaum
Proceedings of the 12th Global Wordnet Conference

Verbs like make, have and get present challenges for applications requiring automatic word sense discrimination. These verbs are both highly frequent and polysemous, with semantically “full” readings, as in make dinner, and “light” readings, as in make a request. Lexical resources like WordNet encode dozens of senses, making discrimination difficult and inviting proposals for reducing the number of entries or grouping them into coarser-grained supersenses. We propose a data-driven, linguistically-based approach to establishing a motivated sense inventory, focusing on make to establish a proof of concept. From several large, syntactically annotated corpora, we extract nouns that are complements of the verb make, and group them into clusters based on their Word2Vec semantic vectors. We manually inspect, for each cluster, the words with vectors closest to the centroid as well as a random sample of words within the cluster. The results show that the clusters reflect an intuitively plausible sense discrimination of make. As an evaluation, we test whether words within a given cluster cooccur in coordination phrases, such as apples and oranges, as prior work has shown that such conjoined nouns are semantically related. Conversely, noun complements from different clusters are less likely to be conjoined. Thus, coordination provides a similarity metric independent of the contextual embeddings used for clustering. Our results pave the way for a WordNet sense inventory that, while not inconsistent with the present one, would reduce it significantly and hold promise for improved automatic word sense discrimination.

2022

MABEL: Attenuating Gender Bias using Textual Entailment Data
Jacqueline He | Mengzhou Xia | Christiane Fellbaum | Danqi Chen
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Pre-trained language models encode undesirable social biases, which are further exacerbated in downstream use. To this end, we propose MABEL (a Method for Attenuating Gender Bias using Entailment Labels), an intermediate pre-training approach for mitigating gender bias in contextualized representations. Key to our approach is the use of a contrastive learning objective on counterfactually augmented, gender-balanced entailment pairs from natural language inference (NLI) datasets. We also introduce an alignment regularizer that pulls identical entailment pairs along opposite gender directions closer. We extensively evaluate our approach on intrinsic and extrinsic metrics, and show that MABEL outperforms previous task-agnostic debiasing approaches in terms of fairness. It also preserves task performance after fine-tuning on downstream tasks. Together, these findings demonstrate the suitability of NLI data as an effective means of bias mitigation, as opposed to only using unlabeled sentences in the literature. Finally, we identify that existing approaches often use evaluation settings that are insufficient or inconsistent. We make an effort to reproduce and compare previous methods, and call for unifying the evaluation settings across gender debiasing methods for better future comparison.

2021

A Corpus-based Syntactic Analysis of Two-termed Unlike Coordination
Julie Kallini | Christiane Fellbaum
Findings of the Association for Computational Linguistics: EMNLP 2021

Coordination is a phenomenon of language that conjoins two or more terms or phrases using a coordinating conjunction. Although coordination has been explored extensively in the linguistics literature, the rules and constraints that govern its structure are still largely elusive and widely debated amongst linguists. This paper presents a study of two-termed unlike coordinations in particular, where the two conjuncts of the coordination phrase form valid constituents but have distinct categories. We conducted a syntactic analysis of the phrasal categories that can be conjoined in such unlike coordinations through a computational corpus-based approach, utilizing the Corpus of Contemporary American English (COCA) as the main data source, as well as the Penn Treebank (PTB). The results show that the two conjuncts within unlike coordinations display different properties based on their position, supporting an antisymmetric view of the structure of coordination. This research provides new data and perspectives through the use of statistical techniques that can help shape future theories and models of coordination.

Proceedings of the 11th Global Wordnet Conference
Piek Vossen | Christiane Fellbaum
Proceedings of the 11th Global Wordnet Conference

Implementing ASLNet V1.0: Progress and Plans
Colin Lualdi | Elaine Wright | Jack Hudson | Naomi Caselli | Christiane Fellbaum
Proceedings of the 11th Global Wordnet Conference

We report on the development of ASLNet, a wordnet for American Sign Language (ASL). ASLNet V1.0 is currently under construction by mapping easy-to-translate ASL lexical nouns to Princeton WordNet synsets. We describe our data model and mapping approach, which can be extended to any sign language. Analysis of the 390 synsets processed to date indicates the success of our procedure yet also highlights the need to supplement our mapping with the “merge” method. We outline our plans for upcoming work to remedy this, which include use of ASL free-association data.

2020

Interdependencies of Gender and Race in Contextualized Word Embeddings
May Jiang | Christiane Fellbaum
Proceedings of the Second Workshop on Gender Bias in Natural Language Processing

Recent years have seen a surge in research on the biases in word embeddings with respect to gender and, to a lesser extent, race. Few of these studies, however, have given attention to the critical intersection of race and gender. In this case study, we analyze the dimensions of gender and race in contextualized word embeddings of given names, taken from BERT, and investigate the nature and nuance of their interaction. We find that these demographic axes, though typically treated as physically and conceptually separate, are in fact interdependent and thus inadvisable to consider in isolation. Further, we show that demographic dimensions predicated on default settings in language, such as in pronouns, may risk rendering groups with multiple marginalized identities invisible. We conclude by discussing the importance and implications of intersectionality for future studies on bias and debiasing in NLP.

2019

Proceedings of the 10th Global Wordnet Conference
Piek Vossen | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

English WordNet 2019 – An Open-Source WordNet for English
John P. McCrae | Alexandre Rademaker | Francis Bond | Ewa Rudnicka | Christiane Fellbaum
Proceedings of the 10th Global Wordnet Conference

We describe the release of a new wordnet for English based on the Princeton WordNet, but now developed under an open-source model. In particular, this version of WordNet, which we call English WordNet 2019 and which has been developed by multiple people around the world through GitHub, fixes many errors in previous wordnets for English. We give some details of the changes that have been made in this version and give some perspectives about likely future changes that will be made as this project continues to evolve.

Building ASLNet, a Wordnet for American Sign Language
Colin Lualdi | Jack Hudson | Christiane Fellbaum | Noah Buchholz
Proceedings of the 10th Global Wordnet Conference

We discuss the creation of ASLNet by aligning the Princeton WordNet (PWN) with SignStudy, an online database of American Sign Language (ASL) signs. This alignment will have many immediate benefits for first and second-sign language learners as well as ASL researchers by highlighting semantic relations among signs. We begin to address the interesting theoretical question of to what extent the wordnet-style organization of the English lexicon (and those of wordnets in other spoken languages) is applicable to ASL, and whether ASL requires positing additional, language or modality-specific relations among signs. Significantly, the mapping of SignStudy and PWN provides a bridge between ASL and the worldwide wordnet community, which comprises speakers of dozens of languages working in academic and language technology settings.

2018

Proceedings of the 9th Global Wordnet Conference
Francis Bond | Piek Vossen | Christiane Fellbaum
Proceedings of the 9th Global Wordnet Conference

Linking WordNet to 3D Shapes
Angel X Chang | Rishi Mago | Pranav Krishna | Manolis Savva | Christiane Fellbaum
Proceedings of the 9th Global Wordnet Conference

We describe a project to link the Princeton WordNet to 3D representations of real objects and scenes. The goal is to establish a dataset that helps us to understand how people categorize everyday common objects via their parts, attributes, and context. This paper describes the annotation and data collection effort so far as well as ideas for future work.

2017

Automated WordNet Construction Using Word Embeddings
Mikhail Khodak | Andrej Risteski | Christiane Fellbaum | Sanjeev Arora
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

We present a fully unsupervised method for automated construction of WordNets based upon recent advances in distributional representations of sentences and word-senses combined with readily available machine translation tools. The approach requires very few linguistic resources and is thus extensible to multiple target languages. To evaluate our method we construct two 600-word testsets for word-to-synset matching in French and Russian using native speakers and evaluate the performance of our method along with several other recent approaches. Our method exceeds the best language-specific and multi-lingual automated WordNets in F-score for both languages. The databases we construct for French and Russian, both languages without large publicly available manually constructed WordNets, will be publicly released along with the testsets.

2016

Encoding Adjective Scales for Fine-grained Resources
Cédric Lopez | Frédérique Segond | Christiane Fellbaum
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We propose an automatic approach towards determining the relative location of adjectives on a common scale based on their strength. We focus on adjectives expressing different degrees of goodness occurring in French product (perfumes) reviews. Using morphosyntactic patterns, we extract from the reviews short phrases consisting of a noun that encodes a particular aspect of the perfume and an adjective modifying that noun. We then associate each such n-gram with the corresponding product aspect and its related star rating. Next, based on the star scores, we generate adjective scales reflecting the relative strength of specific adjectives associated with a shared attribute of the product. An automatic ordering of the adjectives “correct” (correct), “sympa” (nice), “bon” (good) and “excellent” (excellent) according to their score in our resource is consistent with an intuitive scale based on human judgments. Our long-term objective is to generate different adjective scales in an empirical manner, which could allow the enrichment of lexical resources.

Proceedings of the 8th Global WordNet Conference (GWC)
Christiane Fellbaum | Piek Vossen | Verginica Barbu Mititelu | Corina Forascu
Proceedings of the 8th Global WordNet Conference (GWC)

CILI: the Collaborative Interlingual Index
Francis Bond | Piek Vossen | John McCrae | Christiane Fellbaum
Proceedings of the 8th Global WordNet Conference (GWC)

This paper introduces the motivation for and design of the Collaborative InterLingual Index (CILI). It is designed to make possible coordination between multiple loosely coupled wordnet projects. The structure of the CILI is based on the Interlingual index first proposed in the EuroWordNet project with several pragmatic extensions: an explicit open license, definitions in English and links to wordnets in the Global Wordnet Grid.

An Analysis of WordNet’s Coverage of Gender Identity Using Twitter and The National Transgender Discrimination Survey
Amanda Hicks | Michael Rutherford | Christiane Fellbaum | Jiang Bian
Proceedings of the 8th Global WordNet Conference (GWC)

While gender identities in the Western world are typically regarded as binary, our previous work (Hicks et al., 2015) shows that there is more lexical variety of gender identity and the way people identify their gender. There is also a growing need to lexically represent this variety of gender identities. In our previous work, we developed a set of tools and approaches for analyzing Twitter data as a basis for generating hypotheses on language used to identify gender and discuss gender-related issues across geographic regions and population groups in the U.S.A. In this paper we analyze the coverage and relative frequency of the word forms in our Twitter analysis with respect to the National Transgender Discrimination Survey data set, one of the most comprehensive data sets on transgender, gender non-conforming, and gender variant people in the U.S.A. We then analyze the coverage of WordNet, a widely used lexical database, with respect to these identities and discuss some key considerations and next steps for adding gender identity words and their meanings to WordNet.

Tuning Hierarchies in Princeton WordNet
Ahti Lohk | Christiane Fellbaum | Leo Vohandu
Proceedings of the 8th Global WordNet Conference (GWC)

New wordnets are constantly being created around the world, and most take the original Princeton WordNet (PWN) as their starting point. This arguably central position imposes a responsibility on PWN to ensure that its structure is clean and consistent. To validate PWN hierarchical structures, we propose the application of a system of test patterns, and in this paper we report on how to validate the PWN hierarchies using that system. In sum, test patterns provide lexicographers with a very powerful tool, which we hope will be adopted by the global wordnet community.

2014

Proceedings of the Seventh Global Wordnet Conference
Heili Orav | Christiane Fellbaum | Piek Vossen
Proceedings of the Seventh Global Wordnet Conference

Towards Building Lexical Ontology via Cross-Language Matching
Mamoun Abu Helou | Matteo Palmonari | Mustafa Jarrar | Christiane Fellbaum
Proceedings of the Seventh Global Wordnet Conference

The Role of Adverbs in Sentiment Analysis
Eduard Dragut | Christiane Fellbaum
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014)

Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Jorge Baptista | Pushpak Bhattacharyya | Christiane Fellbaum | Mikel Forcada | Chu-Ren Huang | Svetla Koeva | Cvetana Krstev | Eric Laporte
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

2013

Obituary: George A. Miller
Christiane Fellbaum
Computational Linguistics, Volume 39, Issue 1 - March 2013

2012

The MASC Word Sense Corpus
Rebecca J. Passonneau | Collin F. Baker | Christiane Fellbaum | Nancy Ide
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus containing WordNet 3.1 sense tags for 1000 occurrences of each of 100 words produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and then focus on the word sense sentence corpus, describing the characteristics that differentiate it from other word sense corpora and detailing the inter-annotator agreement studies that have been performed on the annotations. Finally, we discuss the potential to grow the word sense sentence corpus through crowdsourcing and the plan to enhance the content and annotations of MASC through a community-based collaborative effort.

Empirical Comparisons of MASC Word Sense Annotations
Gerard de Melo | Collin F. Baker | Nancy Ide | Rebecca J. Passonneau | Christiane Fellbaum
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We analyze how different conceptions of lexical semantics affect sense annotations and how multiple sense inventories can be compared empirically, based on annotated text. Our study focuses on the MASC project, where data has been annotated using WordNet sense identifiers on the one hand, and FrameNet lexical units on the other. This allows us to compare the sense inventories of these lexical resources empirically rather than just theoretically, based on their glosses, leading to new insights. In particular, we compute contingency matrices and develop a novel measure, the Expected Jaccard Index, that quantifies the agreement between annotations of the same data based on two different resources even when they have different sets of categories.

2010

Lexical Resources for Noun Compounds in Czech, English and Zulu
Karel Pala | Christiane Fellbaum | Sonja Bosch
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we discuss noun compounding, a highly generative, productive process, in three distinct languages: Czech, English and Zulu. Derivational morphology presents a large grey area between regular, compositional and idiosyncratic, non-compositional word forms. The structural properties of compounds in each of the languages are reviewed and contrasted. Whereas English compounds are head-final and thus left-branching, Czech and Zulu compounds usually consist of a leftmost governing head and a rightmost dependent element. Semantic properties of compounds are discussed with special reference to semantic relations between compound members which cross-linguistically show universal patterns, but idiosyncratic, language-specific compounds are also identified. The integration of compounds into lexical resources, and WordNets in particular, remains a challenge that needs to be considered in terms of the compounds’ syntactic idiosyncrasy and semantic compositionality. Experiments with processing compounds in Czech, English and Zulu are reported and partly evaluated. The obtained partial lists of the Czech, English and Zulu compounds are also described.

A Multimodal Vocabulary for Augmentative and Alternative Communication from Sound/Image Label Datasets
Xiaojuan Ma | Christiane Fellbaum | Perry Cook
Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies

The Manually Annotated Sub-Corpus: A Community Resource for and by the People
Nancy Ide | Collin Baker | Christiane Fellbaum | Rebecca Passonneau
Proceedings of the ACL 2010 Conference Short Papers

SemEval-2010 Task 17: All-Words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Shu-Kai Hsieh | Maurizio Tesconi | Monica Monachini | Piek Vossen | Roxanne Segers
Proceedings of the 5th International Workshop on Semantic Evaluation

2009

SemEval-2010 Task 17: All-words Word Sense Disambiguation on a Specific Domain
Eneko Agirre | Oier Lopez de Lacalle | Christiane Fellbaum | Andrea Marchetti | Antonio Toral | Piek Vossen
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)

WordNet and FrameNet as Complementary Resources for Annotation
Collin F. Baker | Christiane Fellbaum
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

MASC: the Manually Annotated Sub-Corpus of American English
Nancy Ide | Collin Baker | Christiane Fellbaum | Charles Fillmore | Rebecca Passonneau
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

To answer the critical need for sharable, reusable annotated resources with rich linguistic annotations, we are developing a Manually Annotated Sub-Corpus (MASC) including texts from diverse genres and manual annotations or manually-validated annotations for multiple levels, including WordNet senses and FrameNet frames and frame elements, both of which have become significant resources in the international computational linguistics community. To derive maximal benefit from the semantic information provided by these resources, the MASC will also include manually-validated shallow parses and named entities, which will enable linking WordNet senses and FrameNet frames within the same sentences into more complex semantic structures and, because named entities will often be the role fillers of FrameNet frames, enrich the semantic and pragmatic information derivable from the sub-corpus. All MASC annotations will be published with detailed inter-annotator agreement measures. The MASC and its annotations will be freely downloadable from the ANC website, thus providing maximum accessibility for researchers from around the globe.

KYOTO: a System for Mining, Structuring and Distributing Knowledge across Languages and Cultures
Piek Vossen | Eneko Agirre | Nicoletta Calzolari | Christiane Fellbaum | Shu-kai Hsieh | Chu-Ren Huang | Hitoshi Isahara | Kyoko Kanzaki | Andrea Marchetti | Monica Monachini | Federico Neri | Remo Raffaelli | German Rigau | Maurizio Tesconi | Joop VanGent
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We outline work performed within the framework of a current EC project. The goal is to construct a language-independent information system for a specific domain (environment/ecology/biodiversity) anchored in a language-independent ontology that is linked to wordnets in seven languages. For each language, information extraction and identification of lexicalized concepts with ontological entries is carried out by text miners (“Kybots”). The mapping of language-specific lexemes to the ontology allows for crosslinguistic identification and translation of equivalent terms. The infrastructure developed within this project enables long-range knowledge sharing and transfer across many languages and cultures, addressing the need for global and uniform transition of knowledge beyond the specific domains addressed here.

Augmenting WordNet for Deep Understanding of Text
Peter Clark | Christiane Fellbaum | Jerry R. Hobbs | Phil Harrison | William R. Murray | John Thompson
Semantics in Text Processing. STEP 2008 Conference Proceedings

2007

Report on the NSF-sponsored Human Language Technology Workshop on Industrial Centers
Mary Harper | Alex Acero | Srinivas Bangalore | Jaime Carbonell | Jordan Cohen | Barbara Cuthill | Carol Espy-Wilson | Christiane Fellbaum | John Garofolo | Chin-Hui Lee | Jim Lester | Andrew McCallum | Nelson Morgan | Michael Picheney | Joe Picone | Lance Ramshaw | Jeff Reynar | Hadar Shemtov | Clare Voss
Proceedings of Machine Translation Summit XI: Papers

On the Role of Lexical and World Knowledge in RTE3
Peter Clark | Phil Harrison | John Thompson | William Murray | Jerry Hobbs | Christiane Fellbaum
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

SemEval-2007 Task 18: Arabic Semantic Labeling
Mona Diab | Musa Alkhalifa | Sabry ElKateb | Christiane Fellbaum | Aous Mansouri | Martha Palmer
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

Arabic WordNet and the Challenges of Arabic
Sabri Elkateb | William Black | Piek Vossen | David Farwell | Horacio Rodríguez | Adam Pease | Musa Alkhalifa | Christiane Fellbaum
Proceedings of the International Conference on the Challenge of Arabic for NLP/MT

Arabic WordNet is a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Arabic WordNet (AWN) is based on the design and contents of the universally accepted Princeton WordNet (PWN) and will be mappable straightforwardly onto PWN 2.0 and EuroWordNet (EWN), enabling translation on the lexical level to English and dozens of other languages. We have developed and linked the AWN with the Suggested Upper Merged Ontology (SUMO), where concepts are defined with machine interpretable semantics in first order logic (Niles and Pease, 2001). We have greatly extended the ontology and its set of mappings to provide formal terms and definitions for each synset. The end product would be a linguistic resource with a deep formal semantic foundation that is able to capture the richness of Arabic as described in Elkateb (2005). Tools we have developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004). In this paper we describe our methodology for building a lexical resource in Arabic and the challenge of Arabic for lexical resources.

Building a WordNet for Arabic
Sabri Elkateb | William Black | Horacio Rodríguez | Musa Alkhalifa | Piek Vossen | Adam Pease | Christiane Fellbaum
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

This paper introduces a recently initiated project that focuses on building a lexical resource for Modern Standard Arabic based on the widely used Princeton WordNet for English (Fellbaum, 1998). Our aim is to develop a linguistic resource with a deep formal semantic foundation in order to capture the richness of Arabic as described in Elkateb (2005). Arabic WordNet is being constructed following methods developed for EuroWordNet (Vossen, 1998). In addition to the standard wordnet representation of senses, word meanings are also being defined with a machine understandable semantics in first order logic. The basis for this semantics is the Suggested Upper Merged Ontology and its associated domain ontologies (Niles and Pease, 2001). We will greatly extend the ontology and its set of mappings to provide formal terms and definitions for each synset. Tools to be developed as part of this effort include a lexicographer's interface modeled on that used for EuroWordNet, with added facilities for Arabic script, following Black and Elkateb's earlier work (2004).

2004

Medical WordNet: A New Methodology for the Construction and Validation of Information Resources for Consumer Health
Barry Smith | Christiane Fellbaum
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

A Corpus-based Lexical Resource of German Idioms
Gerald Neumann | Christiane Fellbaum | Alexander Geyken | Axel Herold | Christiane Hümmer | Fabian Körner | Undine Kramer | Kerstin Krell | Alexey Sokirko | Diana Stantcheva | Ekatherini Stathi
Proceedings of the Workshop on Enhancing and Using Electronic Dictionaries

2002

From Resources to Applications. Designing the Multilingual ISLE Lexical Entry
Sue Atkins | Nuria Bel | Francesca Bertagna | Pierrette Bouillon | Nicoletta Calzolari | Christiane Fellbaum | Ralph Grishman | Alessandro Lenci | Catherine MacLeod | Martha Palmer | Gregor Thurmair | Marta Villegas | Antonio Zampolli
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC'02)

2001

English Tasks: All-Words and Verb Lexical Sample
Martha Palmer | Christiane Fellbaum | Scott Cotton | Lauren Delfs | Hoa Trang Dang
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

1998

Towards a Representation of Idioms in WordNet
Christiane Fellbaum
Usage of WordNet in Natural Language Processing Systems

1997

Analysis of a Hand-Tagging Task
Christiane Fellbaum | Joachim Grabowski | Shari Land
Tagging Text with Lexical Semantics: Why, What, and How?
