John Niekrasz


2022

Accelerating Human Authorship of Information Extraction Rules
Dayne Freitag | John Cadigan | John Niekrasz | Robert Sasseen
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning

We consider whether machine models can facilitate the human development of rule sets for information extraction. Arguing that rule-based methods possess a speed advantage in the early development of new extraction capabilities, we ask whether this advantage can be increased further through the machine facilitation of common recurring manual operations in the creation of an extraction rule set from scratch. Using a historical rule set, we reconstruct and describe the putative manual operations required to create it. In experiments targeting one key operation (the enumeration of words occurring in particular contexts), we simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.

SynKB: Semantic Search for Synthetic Procedures
Fan Bai | Alan Ritter | Peter Madrid | Dayne Freitag | John Niekrasz
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.

2016

An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
Eric Yeh | John Niekrasz | Dayne Freitag | Richard Rohwer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists. Such regions often encode information according to ad hoc schemas and avail themselves of visual cues in place of natural language grammar, presenting problems for standard information extraction algorithms. Unlike previous work in table extraction, which assumes a relatively noiseless two-dimensional layout, our aim is to accommodate a wide variety of naturally occurring structure types. Our approach has three main parts. First, we collect and annotate a diverse sample of “naturally” occurring structures from several sources. Second, we use probabilistic text segmentation techniques, featurized by skip bigrams over spatial and token category cues, to automatically identify contiguous regions of structured text that share a common schema. Finally, we identify the records and fields within each structured region using a combination of distributional similarity and sequence alignment methods, guided by minimal supervision in the form of a single annotated record. We evaluate the last two components individually, and conclude with a discussion of further work.

Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
Dayne Freitag | John Niekrasz
Proceedings of the 15th Workshop on Biomedical Natural Language Processing

2010

Annotating Participant Reference in English Spoken Conversation
John Niekrasz | Johanna D. Moore
Proceedings of the Fourth Linguistic Annotation Workshop

2009

Participant Subjectivity and Involvement as a Basis for Discourse Segmentation
John Niekrasz | Johanna Moore
Proceedings of the SIGDIAL 2009 Conference

2007

Detecting and Summarizing Action Items in Multi-Party Dialogue
Matthew Purver | John Dowding | John Niekrasz | Patrick Ehlen | Sharareh Noorbaloochi | Stanley Peters
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

Resolving “You” in Multi-Party Dialog
Surabhi Gupta | John Niekrasz | Matthew Purver | Dan Jurafsky
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue

2006

NOMOS: A Semantic Web Software Framework for Annotation of Multimodal Corpora
John Niekrasz | Alexander Gruenstein
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present NOMOS, an open-source software framework for annotation, processing, and analysis of multimodal corpora. NOMOS is designed for use by annotators, corpus developers, and corpus consumers, emphasizing configurability for a variety of specific annotation tasks. Its features include synchronized multi-channel audio and video playback, compatibility with several corpora, platform independence, and mixed display of capabilities and a well-defined method for layering datasets. Second, we describe how the system is used. For corpus development and annotation we present a typical use scenario involving the creation of a schema and specialization of the user interface. For processing and analysis we describe the GUI- and Java-based methods available, including a GUI for query construction and execution, and an automatically generated schema-conforming Java API for processing of annotations. Additionally, we present some specific annotation and research tasks for which NOMOS has been specialized and used, including topic segmentation and decision-point annotation of meetings.

Shallow Discourse Structure for Action Item Detection
Matthew Purver | Patrick Ehlen | John Niekrasz
Proceedings of the Analyzing Conversations in Text and Speech

2005

Meeting Structure Annotation: Data and Tools
Alexander Gruenstein | John Niekrasz | Matthew Purver
Proceedings of the 6th SIGdial Workshop on Discourse and Dialogue

2004

Multi-Human Dialogue Understanding for Assisting Artifact-Producing Meetings
John Niekrasz | Alexander Gruenstein | Lawrence Cavedon
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics