Pierre Boullier

This paper reports a large-scale non-probabilistic parsing experiment with a deep LFG parser. We briefly introduce the parser we used, named SXLFG, and the resources that were used together with it. Then we report quantitative results about the parsing of a multi-million word journalistic corpus. We show that we can parse more than 6 million words in less than 12 hours, only 6.7% of all sentences reaching the 1s timeout. This shows that deep large-coverage non-probabilistic parsers can be efficient enough to parse very large corpora in a reasonable amount of time.

The Lefff 2 syntactic lexicon for French: architecture, acquisition, use
Benoît Sagot | Lionel Clément | Éric Villemonte de La Clergerie | Pierre Boullier
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we introduce a new lexical resource for French which is freely available as the second version of the Lefff (Lexique des formes fléchies du français - Lexicon of French inflected forms). It is a wide-coverage morphosyntactic and syntactic lexicon whose architecture relies on property inheritance, which makes it more compact and easier to maintain, and allows lexical entries to be described independently of the formalisms they are used for. For these two reasons, we define it as a meta-lexicon. We describe its architecture, several automatic or semi-automatic approaches we use to acquire, correct and/or enrich such a lexicon, and the way it is used both with an LFG parser and with a TAG parser based on a meta-grammar, so as to build two large-coverage parsers for French. The web site of the Lefff is http://www.lefff.net/.

2005

Efficient and Robust LFG Parsing: SxLFG
Pierre Boullier | Benoît Sagot
Proceedings of the Ninth International Workshop on Parsing Technology

Chaînes de traitement syntaxique
Pierre Boullier | Lionel Clément | Benoît Sagot | Éric Villemonte De La Clergerie
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents the full set of tools we deployed for the EASy parsing evaluation campaign. We begin with an overview of the morphological and syntactic lexicon used. We then briefly describe the properties of our pre-syntactic processing chain, which makes it possible to handle unrestricted corpora. Next we present the two parsing systems we used: a TAG parser derived from a meta-grammar and an LFG parser. We compare these two systems, pointing out what they share, such as intensive use of computation sharing and compact representations of information, as well as how they differ in formalisms, grammars and parsers. We then describe the post-processing step, which allowed us to extract from our analyses the information required by the EASy campaign. We conclude with a quantitative evaluation of our architectures.

Un analyseur LFG efficace pour le français : SXLFG
Pierre Boullier | Benoît Sagot | Lionel Clément
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this article, we propose a new syntactic parser based on a variant of the Lexical-Functional Grammars (LFG) model. This LFG parser takes a word lattice as input and computes its functional structures on a shared forest. We also present the various error-recovery techniques we implemented. We then evaluate this parser with a large-coverage grammar of French in large-scale use on varied corpora. We show that this parser is both efficient and robust.

2004

Les Grammaires à Concaténation d’Intervalles (RCG) comme formalisme grammatical pour la linguistique
Benoît Sagot | Pierre Boullier
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The aim of this article is to show why Range Concatenation Grammars (RCG; in French, Grammaires à Concaténation d’Intervalles) are a formalism particularly well suited to describing natural language. We first explain that the power needed to describe natural language is that of PTIME. Then, among the grammatical formalisms with this expressive power, we justify the choice of RCGs. Finally, after an overview of their definition and properties, we show how using them as linguistic grammars makes it possible to handle complex syntagmatic phenomena, to perform parsing and the checking of various constraints (morphosyntactic, lexical-semantic) simultaneously, and to dynamically build modular linguistic grammars.

2003

Guided Earley Parsing
Pierre Boullier
Proceedings of the Eighth International Conference on Parsing Technologies

In this paper, we present a method which may speed up Earley parsers in practice. A first pass, called a guiding parser, builds an intermediate structure called a guide. The guide is used by a second pass, an Earley parser called a guided parser, whose Predictor phase is slightly modified so that it selects an initial item only if this item is in the guide. This approach is validated by practical experiments performed on a large test set with an English context-free grammar.
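The two-pass idea in this abstract can be sketched on a toy grammar. The following is only an illustration under stated assumptions, not the paper's implementation: the names `GRAMMAR`, `build_guide` and `earley_recognize` are hypothetical, the grammar is assumed to have no empty rules, and the guide is built with a deliberately crude filter (a production is allowed at a position only if all of its terminals occur in the remaining input), which stands in for the guiding parser described in the paper.

```python
# Toy grammar: nonterminal -> list of right-hand sides (tuples of symbols).
# Assumption: no empty right-hand sides, which keeps the Completer simple.
GRAMMAR = {
    "S": [("S", "+", "P"), ("P",)],
    "P": [("P", "*", "T"), ("T",)],
    "T": [("n",)],
}


def build_guide(tokens, grammar):
    """First pass (stand-in): collect the initial items (lhs, rhs, pos) that pass
    a cheap necessary condition, namely that every terminal of rhs occurs in
    tokens[pos:]. This over-approximates the useful items, so it never removes
    an item needed by a valid parse."""
    guide = set()
    for pos in range(len(tokens) + 1):
        remaining = set(tokens[pos:])
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                if all(sym in grammar or sym in remaining for sym in rhs):
                    guide.add((lhs, rhs, pos))
    return guide


def earley_recognize(tokens, grammar, start, guide=None):
    """Second pass: Earley recognizer. Returns (accepted, total number of items).
    When a guide is given, the Predictor only creates initial items found in it."""
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]  # items are (lhs, rhs, dot, origin)

    def predict(nonterminal, pos, agenda):
        for rhs in grammar[nonterminal]:
            if guide is not None and (nonterminal, rhs, pos) not in guide:
                continue  # guided Predictor: skip items absent from the guide
            item = (nonterminal, rhs, 0, pos)
            if item not in chart[pos]:
                chart[pos].add(item)
                agenda.append(item)

    for pos in range(n + 1):
        agenda = list(chart[pos])
        if pos == 0:
            predict(start, 0, agenda)
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                       # Predictor
                    predict(sym, pos, agenda)
                elif pos < n and tokens[pos] == sym:     # Scanner
                    chart[pos + 1].add((lhs, rhs, dot + 1, origin))
            else:                                        # Completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new_item = (l2, r2, d2 + 1, o2)
                        if new_item not in chart[pos]:
                            chart[pos].add(new_item)
                            agenda.append(new_item)

    accepted = any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[n]
    )
    return accepted, sum(len(items) for items in chart)
```

Calling `earley_recognize(tokens, GRAMMAR, "S")` and then `earley_recognize(tokens, GRAMMAR, "S", build_guide(tokens, GRAMMAR))` on `["n", "+", "n"]` accepts in both cases, but the guided run creates fewer items because the productions mentioning `*` are never predicted. Whether such savings outweigh the cost of the first pass depends on how the guide is computed, which is the question the paper studies experimentally.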

Supertagging: A Non-Statistical Parsing-Based Approach
Pierre Boullier
Proceedings of the Eighth International Conference on Parsing Technologies

We present a novel approach to supertagging w.r.t. some lexicalized grammar G. It differs from previous approaches in several ways:
- These supertaggers rely only on structural information: they do not need any training phase;
- These supertaggers do not compute the “best” supertag for each word, but rather a set of supertags. These sets of supertags do not exclude any supertag that will eventually be used in a valid complete derivation (i.e., we have a recall score of 100%);
- These supertaggers are in fact true parsers which accept supersets of L(G) that can be more efficiently parsed than the sentences of L(G).

2001

Atelier ATOLL pour les grammaires d’arbres adjoints
François Barthélemy | Pierre Boullier | Philippe Deschamp | Linda Kaouane | Éric Villemonte De La Clergerie
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents the working environment we are developing within the ATOLL team for tree adjoining grammars. This environment comprises several tools and resources based on the XML markup language, which makes it easier to format and exchange linguistic resources.

Guided Parsing of Range Concatenation Languages
François Barthélemy | Pierre Boullier | Philippe Deschamp | Éric Villemonte de la Clergerie
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Range Concatenation Grammars
Pierre Boullier
Proceedings of the Sixth International Workshop on Parsing Technologies

In this paper we present Range Concatenation Grammars, a syntactic formalism with many attractive features, among which we emphasize here its power and closure properties. For example, Range Concatenation Grammars are more powerful than Linear Context-Free Rewriting Systems, yet this power does not come at the expense of efficiency, since their sentences can always be parsed in polynomial time. Range Concatenation Languages are closed under both intersection and complementation, and these closure properties may allow us to consider novel ways of describing some linguistic processing. We also present a parsing algorithm which is the basis of our current prototype implementation.
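To give a feel for how clauses over ranges of the input work, here is a minimal sketch that hand-codes a small three-clause grammar for the non-context-free language aⁿbⁿcⁿ: S(XYZ) → A(X, Y, Z), A(aX, bY, cZ) → A(X, Y, Z), A(ε, ε, ε) → ε. This is an illustration under stated assumptions, not the parsing algorithm of the paper: the function name `recognize_anbncn` is hypothetical, and each predicate is written directly as a memoized Python function over index pairs rather than being interpreted from a grammar.

```python
from functools import lru_cache


def recognize_anbncn(w: str) -> bool:
    """Recognize a^n b^n c^n with three hand-coded range clauses.
    A range is a pair (i, j) denoting the substring w[i:j]."""
    n = len(w)

    @lru_cache(maxsize=None)
    def A(x, y, z):
        (xi, xj), (yi, yj), (zi, zj) = x, y, z
        # Clause A(ε, ε, ε) → ε: all three ranges are empty.
        if xi == xj and yi == yj and zi == zj:
            return True
        # Clause A(aX, bY, cZ) → A(X, Y, Z): strip one letter from each range.
        if (xi < xj and yi < yj and zi < zj
                and w[xi] == "a" and w[yi] == "b" and w[zi] == "c"):
            return A((xi + 1, xj), (yi + 1, yj), (zi + 1, zj))
        return False

    # Clause S(XYZ) → A(X, Y, Z): try every way to cut the input into three ranges.
    return any(
        A((0, i), (i, j), (j, n))
        for i in range(n + 1)
        for j in range(i, n + 1)
    )
```

Because predicate arguments are ranges of the input rather than arbitrary strings, the number of distinct calls is bounded by a polynomial in the input length, and memoization (here `lru_cache`) keeps the total work polynomial for this grammar. For instance, `recognize_anbncn("aabbcc")` holds while `recognize_anbncn("aabbc")` does not.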

1999

Chinese Numbers, MIX, Scrambling, and Range Concatenation Grammars
Pierre Boullier
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

A generalization of mildly context-sensitive formalisms
Pierre Boullier
Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+4)

1996

Another Facet of LIG Parsing
Pierre Boullier
34th Annual Meeting of the Association for Computational Linguistics

1995

Yet Another O(n⁶) Recognition Algorithm for Mildly Context-Sensitive Languages
Pierre Boullier
Proceedings of the Fourth International Workshop on Parsing Technologies

Vijay-Shanker and Weir have shown in [17] that Tree Adjoining Grammars and Combinatory Categorial Grammars can be transformed into equivalent Linear Indexed Grammars (LIGs) which can be recognized in O(n⁶) time using a Cocke-Kasami-Younger style algorithm. This paper exhibits another recognition algorithm for LIGs, with the same upper-bound complexity, but whose average case behaves much better. This algorithm works in two steps: first a general context-free parsing algorithm (using the underlying context-free grammar) builds a shared parse forest, and second, the LIG properties are checked on this forest. This check is based upon the composition of simple relations and does not require any computation of symbol stacks.