Andrew Salway


2020

Expert Concept-Modeling Ground Truth Construction for Word Embeddings Evaluation in Concept-Focused Domains
Arianna Betti | Martin Reynaert | Thijs Ossenkoppele | Yvette Oortwijn | Andrew Salway | Jelke Bloem
Proceedings of the 28th International Conference on Computational Linguistics

We present a novel, domain expert-controlled, replicable procedure for the construction of concept-modeling ground truths with the aim of evaluating the application of word embeddings. In particular, our method is designed to evaluate the application of word and paragraph embeddings in concept-focused textual domains, where a generic ontology does not provide enough information. We illustrate the procedure, and validate it by describing the construction of an expert ground truth, QuiNE-GT. QuiNE-GT is built to answer research questions concerning the concept of naturalized epistemology in QUINE, a 2-million-token, single-author, 20th-century English philosophy corpus of outstanding quality, cleaned up and enriched for the purpose. To the best of our ken, expert concept-modeling ground truths are extremely rare in current literature, nor has the theoretical methodology behind their construction ever been explicitly conceptualised and properly systematised. Expert-controlled concept-modeling ground truths are however essential to allow proper evaluation of word embeddings techniques, and increase their trustworthiness in specialised domains in which the detection of concepts through their expression in texts is important. We highlight challenges, requirements, and prospects for future work.

2017

Quote Extraction and Attribution from Norwegian Newspapers
Andrew Salway | Paul Meurer | Knut Hofland | Øystein Reigem
Proceedings of the 21st Nordic Conference on Computational Linguistics

2016

Topically-focused Blog Corpora for Multiple Languages
Andrew Salway | Dag Elgesem | Knut Hofland | Øystein Reigem | Lubos Steskal
Proceedings of the 10th Web as Corpus Workshop

2014

Constructions: a New Unit of Analysis for Corpus-based Discourse Analysis
Samia Touileb | Andrew Salway
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

Inducing Information Structures for Data-driven Text Analysis
Andrew Salway | Samia Touileb | Endre Tvinnereim
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

Applying Grammar Induction to Text Mining
Andrew Salway | Samia Touileb
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)