Claire Grover


2020

Geoparsing the historical Gazetteers of Scotland: accurately computing location in mass digitised texts
Rosa Filgueira | Claire Grover | Melissa Terras | Beatrice Alex
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora

This paper describes work in progress on devising automatic and parallel methods for geoparsing large digital historical textual data by combining the strengths of three natural language processing (NLP) tools, the Edinburgh Geoparser, spaCy and defoe, and employing different tokenisation and named entity recognition (NER) techniques. We apply these tools to a large collection of nineteenth-century Scottish geographical dictionaries, and describe preliminary results obtained when processing this data.

Not a cute stroke: Analysis of Rule- and Neural Network-based Information Extraction Systems for Brain Radiology Reports
Andreas Grivas | Beatrice Alex | Claire Grover | Richard Tobin | William Whiteley
Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis

We present an in-depth comparison of three clinical information extraction (IE) systems designed to perform entity recognition and negation detection on brain imaging reports: EdIE-R, a bespoke rule-based system, and two neural network models, EdIE-BiLSTM and EdIE-BERT, both multi-task learning models with a BiLSTM and BERT encoder respectively. We compare our models both on an in-sample and an out-of-sample dataset containing mentions of stroke findings and draw on our error analysis to suggest improvements for effective annotation when building clinical NLP models for a new domain. Our analysis finds that our rule-based system outperforms the neural models on both datasets and seems to generalise to the out-of-sample dataset. On the other hand, the neural models do not generalise negation to the out-of-sample dataset, despite metrics on the in-sample dataset suggesting otherwise.

2018

Up-cycling Data for Natural Language Generation
Amy Isard | Jon Oberlander | Claire Grover
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
Beatrice Alex | Clare Llewellyn | Claire Grover | Jon Oberlander | Richard Tobin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Twitter-related studies often need to geolocate tweets or Twitter users, identifying their real-world geographic locations. As tweet-level geotagging remains rare, most prior work exploited tweet content, timezone and network information to inform geolocation, or else relied on off-the-shelf tools to geolocate users from location information in their user profiles. However, such user location metadata is not consistently structured, causing such tools to fail regularly, especially if a string contains multiple locations, or if locations are very fine-grained. We argue that user profile location (UPL) and tweet location need to be treated as distinct types of information from which differing inferences can be drawn. Here, we apply geoparsing to UPLs, and demonstrate how task performance can be improved by adapting our Edinburgh Geoparser, which was originally developed for processing English text. We present a detailed evaluation method and results, including inter-coder agreement. We demonstrate that the optimised geoparser can effectively extract and geo-reference multiple locations at different levels of granularity with an F1-score of around 0.90. We also illustrate how geoparsed UPLs can be exploited for international information trade studies and country-level sentiment analysis.

Improving Topic Model Clustering of Newspaper Comments for Summarisation
Clare Llewellyn | Claire Grover | Jon Oberlander
Proceedings of the ACL 2016 Student Research Workshop

2014

Re-using an Argument Corpus to Aid in the Curation of Social Media Collections
Clare Llewellyn | Claire Grover | Jon Oberlander | Ewan Klein
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This work investigates how automated methods can be used to classify social media text into argumentation types. In particular, it shows how supervised machine learning was used to annotate a Twitter dataset (London Riots) with argumentation classes. An investigation of issues arising from a natural inconsistency within social media data found that machine learning algorithms tend to overfit to the data because Twitter contains a lot of repetition in the form of retweets. It is also noted that, when learning argumentation classes, the classes will most likely be of very different sizes, and this must be kept in mind when analysing the results. Encouraging results were found in adapting a model from one domain of Twitter data (London Riots) to another (OR2012). When adapting a model to another dataset, the most useful feature was punctuation. It is probable that the nature of punctuation in Twitter language, in particular its very specific use in links, indicates argumentation class.

A Gazetteer and Georeferencing for Historical English Documents
Claire Grover | Richard Tobin
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

A Web-based Geo-resolution Annotation and Evaluation Tool
Beatrice Alex | Kate Byrne | Claire Grover | Richard Tobin
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2010

Labelling and Spatio-Temporal Grounding of News Events
Bea Alex | Claire Grover
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

Agile Corpus Annotation in Practice: An Overview of Manual and Automatic Annotation of CVs
Bea Alex | Claire Grover | Rongzhou Shen | Mijail Kabadjov
Proceedings of the Fourth Linguistic Annotation Workshop

Space characters in Chinese semi-structured texts
Rongzhou Shen | Claire Grover | Ewan Klein
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Edinburgh-LTG: TempEval-2 System Description
Claire Grover | Richard Tobin | Beatrice Alex | Kate Byrne
Proceedings of the 5th International Workshop on Semantic Evaluation

2008

Learning the Species of Biomedical Named Entities from Annotated Corpora
Xinglong Wang | Claire Grover
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In biomedical articles, terms with the same surface forms are often used to refer to different entities across a number of model organisms, in which case determining the species becomes crucial to term identification systems that ground terms to specific database identifiers. This paper describes a rule-based system that extracts “species indicating words”, such as human or murine, which can be used to decide the species of nearby entity terms, and a machine-learning species disambiguation system that was developed on manually species-annotated corpora. The performance of both systems was evaluated on gold-standard datasets, where the machine-learning system yielded better overall results.

Named Entity Recognition for Digitised Historical Texts
Claire Grover | Sharon Givon | Richard Tobin | Julian Ball
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We describe and evaluate a prototype system for recognising person and place names in digitised records of British parliamentary proceedings from the late 17th and early 19th centuries. The output of an OCR engine is the input for our system and we describe certain issues and errors in this data and discuss the methods we have used to overcome the problems. We describe our rule-based named entity recognition system for person and place names which is implemented using the LT-XML2 and LT-TTT2 text processing tools. We discuss the annotation of a development and testing corpus and provide results of an evaluation of our system on the test corpus.

2007

Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).
Caroline Sporleder | Antal van den Bosch | Claire Grover
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007).

Recognising Nested Named Entities in Biomedical Text
Beatrice Alex | Barry Haddow | Claire Grover
Biological, translational, and clinical language processing

2006

The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text
Beatrice Alex | Malvina Nissim | Claire Grover
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.

Rule-Based Chunking and Reusability
Claire Grover | Richard Tobin
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we discuss a rule-based approach to chunking implemented using the LT-XML2 and LT-TTT2 tools. We describe the tools and the pipeline and grammars that have been developed for the task of chunking. We show that our rule-based approach is easy to adapt to different chunking styles and that the mark-up of further linguistic information such as nominal and verbal heads can be added to the rules at little extra cost. We evaluate our chunker against the CoNLL 2000 data and discuss discrepancies between our output and the CoNLL mark-up as well as discrepancies within the CoNLL data itself. We contrast our results with the higher scores obtained using machine learning and argue that the portability and flexibility of our approach still make it a more practical solution.

Tools to Address the Interdependence between Tokenisation and Standoff Annotation
Claire Grover | Michael Matthews | Richard Tobin
Proceedings of the 5th Workshop on NLP and XML (NLPXML-2006): Multi-Dimensional Markup in Natural Language Processing

2004

A Rhetorical Status Classifier for Legal Text Summarisation
Ben Hachey | Claire Grover
Text Summarization Branches Out

The HOLJ Corpus. Supporting Summarisation of Legal Texts
Claire Grover | Ben Hachey | Ian Hughson
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

2003

Summarising Legal Texts: Sentential Tense and Argumentative Roles
Claire Grover | Ben Hachey | Chris Korycinski
Proceedings of the HLT-NAACL 03 Text Summarization Workshop

Automatic Multi-Layer Corpus Annotation for Evaluation Question Answering Methods: CBC4Kids
Jochen L. Leidner | Tiphaine Dalmas | Bonnie Webber | Johan Bos | Claire Grover
Proceedings of 4th International Workshop on Linguistically Interpreted Corpora (LINC-03) at EACL 2003

Demonstration of the CROSSMARC System
Vangelis Karkaletsis | Constantine D. Spyropoulos | Dimitris Souflis | Claire Grover | Ben Hachey | Maria Teresa Pazienza | Michele Vindigni | Emmanuel Cartier | Jose Coch
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

2002

Multilingual XML-Based Named Entity Recognition for E-Retail Domains
Claire Grover | Scott McDonald | Donnla Nic Gearailt | Vangelis Karkaletsis | Dimitra Farmakiotou | Georgios Samaritakis | Georgios Petasis | Maria Teresa Pazienza | Michele Vindigni | Frantz Vichot | Francis Wolinski
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

XML-based NLP Tools for Analysing and Annotating Medical Language
Claire Grover | Ewan Klein | Mirella Lapata | Alex Lascarides
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)

2001

XML-Based Data Preparation for Robust Deep Parsing
Claire Grover | Alex Lascarides
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

LT TTT - A Flexible Tokenisation Tool
Claire Grover | Colin Matheson | Andrei Mikheev | Marc Moens
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Named Entity Recognition without Gazetteers
Andrei Mikheev | Marc Moens | Claire Grover
Ninth Conference of the European Chapter of the Association for Computational Linguistics

1998

Description of the LTG System Used for MUC-7
Andrei Mikheev | Claire Grover | Marc Moens
Seventh Message Understanding Conference (MUC-7): Proceedings of a Conference Held in Fairfax, Virginia, April 29 - May 1, 1998

1995

Algorithms for Analysing the Temporal Structure of Discourse
Janet Hitzeman | Marc Moens | Claire Grover
Seventh Conference of the European Chapter of the Association for Computational Linguistics

1994

Priority Union and Generalization in Discourse Grammars
Claire Grover | Chris Brew | Suresh Manandhar | Marc Moens
32nd Annual Meeting of the Association for Computational Linguistics

1989

The Syntactic Regularity of English Noun Phrases
Lita Taylor | Claire Grover | Ted Briscoe
Fourth Conference of the European Chapter of the Association for Computational Linguistics

1988

Software Support for Practical Grammar Development
Bran Boguraev | John Carroll | Ted Briscoe | Claire Grover
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics

1987

The Derivation of a Grammatically Indexed Lexicon from the Longman Dictionary of Contemporary English
Bran Boguraev | Ted Briscoe | John Carroll | David Carter | Claire Grover
25th Annual Meeting of the Association for Computational Linguistics