Kim Gerdes


2023

Annotating Discursive Roles of Sentences in Patent Descriptions
Lufei Liu | Xu Sun | François Veltz | Kim Gerdes
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

Patent descriptions are a crucial component of patent applications, as they are key to understanding the invention and play a significant role in securing patent grants. While discursive analyses have been undertaken for scientific articles, patent descriptions have not been explored as thoroughly, despite the growing importance of Intellectual Property and the constant rise in the number of patent applications. In this study, we propose an annotation scheme of 16 classes for categorizing each sentence in a patent description according to its discursive role. We publish an experimental human-annotated corpus of 16 patent descriptions and analyze the challenges that may arise in such work. This work can serve as a basis for automated annotation and thus help enrich linguistic resources in the patent domain.

Word order flexibility: a typometric study
Sylvain Kahane | Ziqian Peng | Kim Gerdes
Proceedings of the Seventh International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2023)

This paper introduces a typometric measure of flexibility, which quantifies the variability of head-dependent word order on the whole set of treebanks of a language or on specific constructions. The measure is based on the notion of head-initiality, and we show that it can be computed for all languages of the Universal Dependencies treebank set, that it does not require ad hoc thresholds to categorize languages or constructions, and that it can be applied at any granularity of constructions and languages. We compare our results with Bakker’s (1998) categorical flexibility index. Typometric flexibility is shown to be a good measure for characterizing the distribution of languages with respect to word order for a given construction, and for estimating whether a construction predicts the global word-order behavior of a language.
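
As an illustration of how a head-initiality-based measure can be computed from dependency annotations, the sketch below counts, for one relation, the share of dependencies whose head precedes the dependent. The toy sentences are invented, and the `flexibility` scaling (0 for a fully fixed order, 1 for a 50/50 split) is an assumption chosen for illustration; it is not necessarily the exact definition used in the paper.

```python
# Toy dependency data: each sentence is a list of (position, head, relation);
# head 0 marks the root. These sentences are invented for illustration.
toy = [
    [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")],  # verb before object
    [(1, 3, "obj"), (2, 3, "nsubj"), (3, 0, "root")],  # object before verb
    [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")],
    [(1, 2, "nsubj"), (2, 0, "root"), (3, 2, "obj")],
]


def head_initiality(sentences, relation):
    """Share of `relation` dependencies whose head precedes the dependent."""
    initial = total = 0
    for sentence in sentences:
        for position, head, rel in sentence:
            if rel != relation or head == 0:
                continue  # skip other relations and the root
            total += 1
            if head < position:
                initial += 1
    return initial / total if total else None  # None: relation never observed


def flexibility(hi):
    """Assumed scaling: 0 for a fixed order (hi = 0 or 1), 1 for hi = 0.5."""
    return 1 - abs(2 * hi - 1)


hi = head_initiality(toy, "obj")
print(hi, flexibility(hi))
```

Because the measure is a continuous proportion, no threshold is needed to decide whether a construction is "flexible"; the same function can be run per treebank, per language, or per relation.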

Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons
You Zuo | Benoît Sagot | Kim Gerdes | Houda Mouzoun | Samir Ghamri Doudane
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : travaux de recherche originaux -- articles longs

This paper proposes a novel approach to French patent classification that leverages data-centric strategies. We compare different approaches for the two deepest levels of the IPC hierarchy: IPC groups and subgroups. Our experiments show that while simple ensemble strategies work for shallower levels, deeper levels require more sophisticated techniques such as data augmentation, clustering, and negative sampling. Our research highlights the importance of language-specific features and data-centric strategies for accurate and reliable French patent classification, and it provides insights and solutions for researchers and practitioners in patent classification.

Autogramm : développement simultané de treebanks et de grammaires à partir de corpus
Sylvain Kahane | Santiago Herrera | Bruno Guillaume | Kim Gerdes
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets

This research project aims to create new dependency treebanks for under-resourced languages, unifying their development as far as possible with that of quantitative descriptive grammars. We will present our pipeline for processing and developing treebanks and discuss the kind of grammar we aim to extract. Finally, we will examine the use of these resources in quantitative typology.

2021

Annotation guidelines of UD and SUD treebanks for spoken corpora: A proposal
Sylvain Kahane | Bernard Caron | Emmett Strickland | Kim Gerdes
Proceedings of the 20th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2021)

Starting a new treebank? Go SUD!
Kim Gerdes | Bruno Guillaume | Sylvain Kahane | Guy Perrier
Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021)

2020

When Collaborative Treebank Curation Meets Graph Grammars
Gaël Guibon | Marine Courtin | Kim Gerdes | Bruno Guillaume
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper we present Arborator-Grew, a collaborative annotation tool for treebank development. Arborator-Grew combines the features of two preexisting tools: Arborator and Grew. Arborator is a widely used collaborative graphical online tool for dependency treebank annotation. Grew is a tool for graph querying and rewriting specialized in the structures needed in NLP, i.e. syntactic and semantic dependency trees and graphs. Grew also has an online version, Grew-match, where all Universal Dependencies treebanks in their classical, deep, and surface-syntactic flavors can be queried. Arborator-Grew is a complete redevelopment and modernization of Arborator, replacing its own internal database storage with a new Grew API, which adds a powerful query tool to Arborator’s existing treebank creation and correction features. This includes complex access control for parallel expert and crowd-sourced annotation, tree comparison visualization, and various exercise modes for teaching and training annotators. Arborator-Grew opens up new paths for collectively creating, updating, maintaining, and curating syntactic treebanks and semantic graph banks.

2019

Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)
Kim Gerdes | Sylvain Kahane
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

A Surface-Syntactic UD Treebank for Naija
Bernard Caron | Marine Courtin | Kim Gerdes | Sylvain Kahane
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

Improving Surface-syntactic Universal Dependencies (SUD): MWEs and deep syntactic features
Kim Gerdes | Bruno Guillaume | Sylvain Kahane | Guy Perrier
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

The relation between dependency distance and frequency
Xinying Chen | Kim Gerdes
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

Rediscovering Greenberg’s Word Order Universals in UD
Kim Gerdes | Sylvain Kahane | Xinying Chen
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2018

SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD
Kim Gerdes | Bruno Guillaume | Sylvain Kahane | Guy Perrier
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This article proposes a surface-syntactic annotation scheme called SUD that is near-isomorphic to the Universal Dependencies (UD) annotation scheme while following distributional criteria for defining the dependency tree structure and the naming of the syntactic functions. Rule-based graph transformation grammars allow for a bidirectional transformation between UD and SUD. The back-and-forth transformation can serve as an error-mining tool to ensure the intra-language and inter-language coherence of the UD treebanks.
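
To make the error-mining idea concrete, here is a deliberately tiny sketch of a round trip covering only auxiliaries. It is a hand-written Python toy, not the Grew graph-rewriting grammars the paper describes: it assumes at most one auxiliary per verb, keeps UD relation names except for the `aux`/`comp:aux` swap (real SUD also renames relations such as `nsubj`), and only moves subjects between the verb and the auxiliary. Tokens whose attachment changes after UD→SUD→UD are flagged as suspicious.

```python
# A tree maps token id -> (head id, relation); head 0 is the root.
Tree = dict[int, tuple[int, str]]


def ud_to_sud(tree: Tree) -> Tree:
    """Toy UD->SUD: promote each auxiliary to head of its lexical verb."""
    out = dict(tree)
    for aux, (verb, rel) in tree.items():
        if rel != "aux":
            continue
        out[aux] = tree[verb]           # auxiliary takes over the verb's attachment
        out[verb] = (aux, "comp:aux")   # lexical verb becomes its complement
        for tok, (head, r) in tree.items():
            if head == verb and r == "nsubj":
                out[tok] = (aux, r)     # subject climbs to the auxiliary
    return out


def sud_to_ud(tree: Tree) -> Tree:
    """Toy SUD->UD: the exact inverse of ud_to_sud on well-formed input."""
    out = dict(tree)
    for verb, (aux, rel) in tree.items():
        if rel != "comp:aux":
            continue
        out[verb] = tree[aux]
        out[aux] = (verb, "aux")
        for tok, (head, r) in tree.items():
            if head == aux and r == "nsubj":
                out[tok] = (verb, r)    # subject goes back to the lexical verb
    return out


def round_trip_mismatches(tree: Tree) -> list[int]:
    """Token ids whose attachment is not preserved by UD->SUD->UD."""
    back = sud_to_ud(ud_to_sud(tree))
    return sorted(tok for tok in tree if back[tok] != tree[tok])


# "she has eaten": correct UD, and an erroneous variant with the subject on "has".
good = {1: (3, "nsubj"), 2: (3, "aux"), 3: (0, "root")}
bad = {1: (2, "nsubj"), 2: (3, "aux"), 3: (0, "root")}
print(round_trip_mismatches(good), round_trip_mismatches(bad))
```

The erroneous tree is flagged because UD forbids dependents on an auxiliary, so the conversion cannot restore that attachment; this mirrors, in miniature, how a non-identity round trip can point annotators to inconsistencies.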

2017

Classifying Languages by Dependency Structure. Typologies of Delexicalized Universal Dependency Treebanks
Xinying Chen | Kim Gerdes
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Quantitative Comparative Syntax on the Cantonese-Mandarin Parallel Dependency Treebank
Tak-sum Wong | Kim Gerdes | Herman Leung | John Lee
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

Multi-word annotation in syntactic treebanks - Propositions for Universal Dependencies
Sylvain Kahane | Marine Courtin | Kim Gerdes
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

Dependency Annotation Choices: Assessing Theoretical and Practical Issues of Universal Dependencies
Kim Gerdes | Sylvain Kahane
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

Developing Universal Dependencies for Mandarin Chinese
Herman Leung | Rafaël Poiret | Tak-sum Wong | Xinying Chen | Kim Gerdes | John Lee
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This article proposes a Universal Dependencies annotation scheme for Mandarin Chinese, including POS tags and dependency analysis. We identify idiosyncrasies of Mandarin Chinese that are difficult to fit into the current scheme, which has been based mainly on descriptions of various Indo-European languages. We discuss differences between our scheme and those of the Stanford Chinese Dependencies and the Chinese Dependency Treebank.

2015

Analyse syntaxique de l’ancien français : quelles propriétés de la langue influent le plus sur la qualité de l’apprentissage ?
Gaël Guibon | Isabelle Tellier | Sophie Prévost | Matthieu Constant | Kim Gerdes
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The article presents the results of machine learning experiments on part-of-speech tagging and dependency parsing of Old French. These experiments are meant to support a corpus exploration in which the SRCMF treebank serves as reference data. Because the language used in it is poorly standardized, the training data are heterogeneous and quantitatively limited. We therefore explore various strategies, based on different criteria (lexical variability, verse/prose form of the texts, dates of the texts), for building training corpora that yield the best possible results.

Classifying Syntactic Categories in the Chinese Dependency Network
Xinying Chen | Haitao Liu | Kim Gerdes
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Non-constituent coordination and other coordinative constructions as Dependency Graphs
Kim Gerdes | Sylvain Kahane
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

2014

Rhapsodie: a Prosodic-Syntactic Treebank for Spoken French
Anne Lacheret | Sylvain Kahane | Julie Beliao | Anne Dister | Kim Gerdes | Jean-Philippe Goldman | Nicolas Obin | Paola Pietrandrea | Atanas Tchobanov
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The main objective of the Rhapsodie project (ANR Rhapsodie 07 Corp-030-01) was to define rich, explicit, and reproducible schemes for the annotation of prosody and syntax in different genres (± spontaneous, ± planned, face-to-face interviews vs. broadcast, etc.), in order to study the prosody/syntax/discourse interface in spoken French and its role in the segmentation of speech into discourse units (Lacheret, Kahane, & Pietrandrea forthcoming). Here we describe the deliverable, a syntactic and prosodic treebank of spoken French composed of 57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and 33,000 words), orthographically and phonetically transcribed. The transcriptions and the annotations are all aligned on the speech signal: phonemes, syllables, words, speakers, overlaps. This resource is freely available at www.projet-rhapsodie.fr. The sound samples (wav/mp3), the acoustic analysis (original F0 curve manually corrected and automatic stylized F0, pitch format), the orthographic transcriptions (txt), the microsyntactic annotations (tabular format), the macrosyntactic annotations (txt, tabular format), the prosodic annotations (xml, textgrid, tabular format), and the metadata (xml and html) can be freely downloaded under the terms of the Creative Commons licence Attribution - Noncommercial - Share Alike 3.0 France. The metadata are encoded in the IMDI-CMFI format and can be parsed online.

Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie
Rachel Bawden | Marie-Amélie Botalla | Kim Gerdes | Sylvain Kahane
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article presents the methods, results, and precision of the syntactic annotation process of the Rhapsodie Treebank of spoken French. The Rhapsodie Treebank is a 33,000-word corpus annotated for prosody and syntax, licensed in its entirety under Creative Commons. The syntactic annotation contains two levels: a macro-syntactic level, containing a segmentation into illocutionary units (including discourse markers, parentheses, etc.), and a micro-syntactic level including dependency relations and various paradigmatic structures, called pile constructions, the latter being particularly frequent and diverse in spoken language. The micro-syntactic annotation process, presented in this paper, includes a semi-automatic preparation of the transcription, the application of a syntactic dependency parser, transcoding of the parsing results into the Rhapsodie annotation scheme, manual correction by multiple annotators followed by a validation process, and finally the application of coherence rules that check for common errors. The inter-annotator agreement scores, which are good, are presented and analyzed in detail. The article also lists the functions used in the dependency annotation and for distinguishing the various pile constructions, and presents the ideas underlying these choices.
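
One common way to quantify agreement between two dependency annotations of the same tokens is to count how often the annotators chose the same head (unlabeled) and the same head plus relation (labeled). The sketch below computes these raw proportions; it is an illustration only, the relation labels are placeholders rather than Rhapsodie's function inventory, and the paper's own agreement figures may rely on a different measure.

```python
def attachment_agreement(ann_a, ann_b):
    """Return (unlabeled, labeled) agreement between two annotators.

    Each annotation is a list of (head, relation) pairs, one per token,
    aligned on the same tokenization.
    """
    if len(ann_a) != len(ann_b):
        raise ValueError("annotations must cover the same tokens")
    if not ann_a:
        raise ValueError("no tokens to compare")
    n = len(ann_a)
    same_head = sum(a[0] == b[0] for a, b in zip(ann_a, ann_b))
    same_both = sum(a == b for a, b in zip(ann_a, ann_b))
    return same_head / n, same_both / n


# Hypothetical annotations of a four-token utterance.
annotator_1 = [(2, "subj"), (0, "root"), (2, "obj"), (3, "dep")]
annotator_2 = [(2, "subj"), (0, "root"), (2, "mod"), (2, "dep")]
print(attachment_agreement(annotator_1, annotator_2))
```

Raw agreement does not correct for chance; a chance-corrected statistic such as Cohen's kappa over relation labels is a common complement.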

2013

Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)
Eva Hajičová | Kim Gerdes | Leo Wanner
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

Collaborative Dependency Annotation
Kim Gerdes
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

Intonosyntactic Data Structures: The Rhapsodie Treebank of Spoken French
Kim Gerdes | Sylvain Kahane | Anne Lacheret | Paola Pietrandrea | Arthur Truong
Proceedings of the Sixth Linguistic Annotation Workshop

2010

Depends on What the French Say - Spoken Corpus Annotation with and beyond Syntactic Functions
José Deulofeu | Lucie Duffort | Kim Gerdes | Sylvain Kahane | Paola Pietrandrea
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Grammaires d’erreur – correction grammaticale avec analyse profonde et proposition de corrections minimales
Lionel Clément | Kim Gerdes | Renaud Marlet
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

We present an open grammar-checking system based on deep syntactic analysis. The grammar specification is a context-free grammar equipped with flat feature structures. After parsing into a shared forest in which feature-agreement constraints are relaxed, error detection globally minimizes the corrections to be made, and correct alternative sentences are proposed automatically.
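
The idea of relaxing agreement and then choosing the globally cheapest repair can be illustrated on a single flat phrase. The sketch below is a strong simplification of the system described above: there is no context-free grammar or shared forest, the feature names are hypothetical, and a word counts as one correction if any of its features must change. It tries every combination of observed feature values and keeps the one requiring the fewest corrected words.

```python
from itertools import product

# Hypothetical agreement features checked inside one phrase.
FEATURES = ("gender", "number")


def minimal_agreement_fix(words):
    """Return (target features, words to correct) with the fewest corrections.

    `words` is a list of (form, {feature: value}) pairs that should agree.
    Ties are broken by the first candidate in sorted value order.
    """
    candidates = [sorted({feats[f] for _, feats in words}) for f in FEATURES]
    best = None
    for values in product(*candidates):
        target = dict(zip(FEATURES, values))
        to_change = [form for form, feats in words
                     if any(feats[f] != target[f] for f in FEATURES)]
        if best is None or len(to_change) < len(best[1]):
            best = (target, to_change)
    return best


# French "la petit maison": the adjective disagrees in gender.
phrase = [
    ("la", {"gender": "fem", "number": "sg"}),
    ("petit", {"gender": "masc", "number": "sg"}),
    ("maison", {"gender": "fem", "number": "sg"}),
]
print(minimal_agreement_fix(phrase))
```

Enumerating value combinations is exponential in the number of features, which is acceptable for a toy; a real system would instead propagate costs through the parse forest.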

2006

A Polynomial Parsing Algorithm for the Topological Model: Synchronizing Constituent and Dependency Grammars, Illustrated by German Word Order Phenomena
Kim Gerdes | Sylvain Kahane
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2003

La topologie comme interface entre syntaxe et prosodie : un système de génération appliqué au grec moderne
Kim Gerdes | Hi-Yon Yoo
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we develop the syntactic and topological modules of the Meaning-Text (Sens-Texte) model and show the usefulness of topology as an intermediate representation between the syntactic and phonological representations. The model is implemented in a generator, and we present the grammar of Modern Greek within this approach.

2002

DTAG?
Kim Gerdes
Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6)

2001

Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy
Kim Gerdes | Sylvain Kahane
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics