Marti A. Hearst

Also published as: Marti Hearst


2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents
Kyle Lo | Zejiang Shen | Benjamin Newman | Joseph Chang | Russell Authur | Erin Bransom | Stefan Candra | Yoganand Chandrasekhar | Regan Huff | Bailey Kuehl | Amanpreet Singh | Chris Wilhelm | Angele Zamarron | Marti A. Hearst | Daniel Weld | Doug Downey | Luca Soldaini
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Despite growing interest in applying natural language processing (NLP) and computer vision (CV) models to the scholarly domain, scientific documents remain challenging to work with. They’re often in difficult-to-use PDF formats, and the ecosystem of models to process them is fragmented and incomplete. We introduce PaperMage, an open-source Python toolkit for analyzing and processing visually-rich, structured scientific documents. PaperMage offers clean and intuitive abstractions for seamlessly representing and manipulating both textual and visual document elements. PaperMage achieves this by integrating disparate state-of-the-art NLP and CV models into a unified framework, and provides turn-key recipes for common scientific document processing use-cases. PaperMage has powered multiple research prototypes of AI applications over scientific documents, along with Semantic Scholar’s large-scale production system for processing millions of PDFs. GitHub: https://github.com/allenai/papermage
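For readers who want a concrete picture of the toolkit, the following is a minimal usage sketch assuming the CoreRecipe entry point described in the project's README; exact layer names may differ across versions.

```python
# Minimal usage sketch assuming the CoreRecipe entry point from the project
# README; exact layer names (e.g., `sentences`, `figures`) may vary by version.
from papermage.recipes import CoreRecipe

recipe = CoreRecipe()                  # bundles the default parsers and predictors
doc = recipe.run("path/to/paper.pdf")  # returns a Document with textual and visual layers

# Layers are lists of entities that carry both text and page coordinates.
for sentence in doc.sentences[:5]:
    print(sentence.text)
```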

2022

Semantic Diversity in Dialogue with Natural Language Inference
Katherine Stasaski | Marti Hearst
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Generating diverse, interesting responses to chitchat conversations is a problem for neural conversational agents. This paper makes two substantial contributions to improving diversity in dialogue generation. First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation. We evaluate this metric using an established framework (Tevet and Berant, 2021) and find strong evidence indicating NLI Diversity is correlated with semantic diversity. Specifically, we show that the contradiction relation is more useful than the neutral relation for measuring this diversity and that incorporating the NLI model’s confidence achieves state-of-the-art results. Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation, which results in an average 137% increase in NLI Diversity compared to standard generation procedures.
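As an illustration of the metric's core idea (not the authors' code), the sketch below scores a set of responses by summing pairwise contradiction probabilities from an off-the-shelf NLI model; the model name and label ordering are assumptions to verify against the model card.

```python
# Illustrative reimplementation of a contradiction-based NLI diversity score
# (not the authors' code): sum pairwise contradiction probabilities over a set
# of responses using an off-the-shelf NLI model.
from itertools import permutations

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def nli_diversity(responses):
    """Higher scores mean the responses contradict each other more, i.e. are more diverse."""
    score = 0.0
    for premise, hypothesis in permutations(responses, 2):
        inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
        # Assumed label order for roberta-large-mnli: [contradiction, neutral,
        # entailment]; verify against model.config.id2label.
        score += probs[0].item()
    return score

print(nli_diversity(["I love hiking.", "I hate being outdoors.", "Hiking is great."]))
```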

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization
Philippe Laban | Tobias Schnabel | Paul N. Bennett | Marti A. Hearst
Transactions of the Association for Computational Linguistics, Volume 10

In the summarization domain, a key requirement for summaries is to be factually consistent with the input document. Previous work has found that natural language inference (NLI) models do not perform competitively when applied to inconsistency detection. In this work, we revisit the use of NLI for inconsistency detection, finding that past work suffered from a mismatch in input granularity between NLI datasets (sentence-level) and inconsistency detection (document-level). We provide a highly effective and lightweight method called SummaCConv that enables NLI models to be successfully used for this task by segmenting documents into sentence units and aggregating scores between pairs of sentences. We furthermore introduce a new benchmark called SummaC (Summary Consistency) which consists of six large inconsistency detection datasets. On this benchmark, SummaCConv obtains state-of-the-art results with a balanced accuracy of 74.4%, a 5% improvement compared with prior work.
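The sketch below illustrates the granularity-matching idea in its simplest zero-shot form: segment both texts into sentences, score each summary sentence against every document sentence with an NLI model, and aggregate with max-then-mean. SummaCConv itself learns an aggregation over the per-sentence score distributions, so this is only an approximation of the paper's method.

```python
# Simplified zero-shot sketch of the granularity-matching idea (not the
# SummaCConv model itself): score each summary sentence against every document
# sentence with an NLI model, take the max over document sentences, and average.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "roberta-large-mnli"  # any NLI model; label order below is an assumption
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).eval()

def entailment(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    return probs[2].item()  # index 2 assumed to be "entailment"; check config.id2label

def consistency_score(doc_sentences, summary_sentences) -> float:
    """Mean over summary sentences of the best supporting document sentence."""
    best = [max(entailment(d, s) for d in doc_sentences) for s in summary_sentences]
    return sum(best) / len(best)
```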

2021

News Headline Grouping as a Challenging NLU Task
Philippe Laban | Lucas Bandarkar | Marti A. Hearst
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent progress in Natural Language Understanding (NLU) has seen the latest models outperform human performance on many standard tasks. These impressive results have led the community to introspect on dataset limitations, and iterate on more nuanced challenges. In this paper, we introduce the task of HeadLine Grouping (HLG) and a corresponding dataset (HLGD) consisting of 20,056 pairs of news headlines, each labeled with a binary judgement as to whether the pair belongs within the same group. On HLGD, human annotators achieve high performance of around 0.9 F-1, while current state-of-the-art Transformer models only reach 0.75 F-1, opening the path for further improvements. We further propose a novel unsupervised Headline Generator Swap model for the task of HeadLine Grouping that achieves within 3 F-1 of the best supervised model. Finally, we analyze high-performing models with consistency tests, and find that models are not consistent in their predictions, revealing modeling limits of current architectures.

Modeling Mathematical Notation Semantics in Academic Papers
Hwiyeol Jo | Dongyeop Kang | Andrew Head | Marti A. Hearst
Findings of the Association for Computational Linguistics: EMNLP 2021

Natural language models often fall short when understanding and generating mathematical notation. What is not clear is whether these shortcomings are due to fundamental limitations of the models, or the absence of appropriate tasks. In this paper, we explore the extent to which natural language models can learn the semantics of mathematical notation and its surrounding text. We propose two notation prediction tasks, and train a model that selectively masks notation tokens and encodes left and/or right sentences as context. Compared to baseline models trained by masked language modeling, our method achieved significantly better performance at the two tasks, showing that this approach is a good first step towards modeling mathematical texts. However, the current models rarely predict unseen symbols correctly, and token-level predictions are more accurate than symbol-level predictions, indicating more work is needed to represent structural patterns. Based on the results, we suggest directions for future work on modeling mathematical texts.

Automatically Generating Cause-and-Effect Questions from Passages
Katherine Stasaski | Manav Rathod | Tony Tu | Yunfang Xiao | Marti A. Hearst
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Automated question generation has the potential to greatly aid in education applications, such as online study aids to check understanding of readings. The state-of-the-art in neural question generation has advanced greatly, due in part to the availability of large datasets of question-answer pairs. However, the questions generated are often surface-level and not challenging for a human to answer. To develop more challenging questions, we propose the novel task of cause-and-effect question generation. We build a pipeline that extracts causal relations from passages of input text, and feeds these as input to a state-of-the-art neural question generator. The extractor is based on prior work that classifies causal relations by linguistic category (Cao et al., 2016; Altenberg, 1984). This work results in a new, publicly available collection of cause-and-effect questions. We evaluate via both automatic and manual metrics and find performance improves for both question generation and question answering when we utilize a small auxiliary data source of cause-and-effect questions for fine-tuning. Our approach can be easily applied to generate cause-and-effect questions from other text collections and educational material, allowing for adaptable large-scale generation of cause-and-effect questions.
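As a toy illustration of the pipeline's shape (the paper's extractor classifies causal relations by linguistic category and its generator is a neural model, not a regex), the sketch below extracts cause/effect spans around an explicit connective and phrases a cause-and-effect question.

```python
# Toy sketch of the pipeline's shape: extract cause/effect spans around an
# explicit connective, then phrase a cause-and-effect question. The paper's
# extractor and generator are far more sophisticated; this is only illustrative.
import re

CAUSAL_PATTERN = re.compile(
    r"(?P<effect>[^.]+?)\s+because\s+(?P<cause>[^.]+)\.", re.IGNORECASE
)

def extract_cause_effect(passage: str):
    for match in CAUSAL_PATTERN.finditer(passage):
        yield match.group("cause").strip(), match.group("effect").strip()

def to_question(effect: str) -> str:
    return f"What caused the following? {effect}"

passage = "The river flooded because heavy rain fell for three days."
for cause, effect in extract_cause_effect(passage):
    print(to_question(effect), "| answer:", cause)
```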

Keep It Simple: Unsupervised Simplification of Multi-Paragraph Text
Philippe Laban | Tobias Schnabel | Paul Bennett | Marti A. Hearst
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

This work presents Keep it Simple (KiS), a new approach to unsupervised text simplification which learns to balance a reward across three properties: fluency, salience and simplicity. We train the model with a novel algorithm to optimize the reward (k-SCST), in which the model proposes several candidate simplifications, computes each candidate’s reward, and encourages candidates that outperform the mean reward. Finally, we propose a realistic text comprehension task as an evaluation method for text simplification. When tested on the English news domain, the KiS model outperforms strong supervised baselines by more than 4 SARI points, and can help people complete a comprehension task an average of 18% faster while retaining accuracy, when compared to the original text.
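A minimal sketch of the k-SCST objective described above, assuming per-candidate rewards are already computed (e.g., from fluency, salience, and simplicity scorers); this is an illustration, not the authors' implementation.

```python
# Illustrative sketch of the k-SCST objective (not the authors' implementation):
# sample k candidate simplifications, reward each, and use the mean reward of
# the k candidates as the baseline so above-average candidates are reinforced.
import torch

def k_scst_loss(candidate_logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """candidate_logprobs: (k,) summed token log-probabilities of each sampled candidate.
    rewards: (k,) scalar reward per candidate (e.g., combining fluency, salience, simplicity)."""
    baseline = rewards.mean()
    advantage = rewards - baseline                  # positive for candidates beating the mean
    return -(advantage.detach() * candidate_logprobs).mean()

# Example with k = 4 sampled candidates:
logprobs = torch.tensor([-12.3, -10.8, -11.5, -13.0], requires_grad=True)
rewards = torch.tensor([0.42, 0.61, 0.55, 0.30])
k_scst_loss(logprobs, rewards).backward()           # gradients favor the higher-reward samples
```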

Can Transformer Models Measure Coherence In Text: Re-Thinking the Shuffle Test
Philippe Laban | Luke Dai | Lucas Bandarkar | Marti A. Hearst
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near-perfect accuracy of 97.8%, a state-of-the-art result. We argue that this outstanding performance is unlikely to lead to a good model of text coherence, and suggest that the Shuffle Test should be approached in a Zero-Shot setting: models should be evaluated without being trained on the task itself. We evaluate common models in this setting, such as Generative and Bi-directional Transformers, and find that larger architectures achieve high performance out of the box. Finally, we suggest the k-Block Shuffle Test, a modification of the original that increases the size of the shuffled blocks. Even though human reader performance remains high (around 95% accuracy), model performance drops from 94% to 78% as block size increases, creating a conceptually simple challenge to benchmark NLP models.
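The block-shuffling perturbation itself is simple to reproduce; the sketch below is an illustrative implementation of the k-Block Shuffle Test's document perturbation.

```python
# Illustrative implementation of the k-Block Shuffle perturbation: split the
# document into consecutive blocks of k sentences and shuffle the blocks.
import random

def k_block_shuffle(sentences, k, seed=0):
    blocks = [sentences[i:i + k] for i in range(0, len(sentences), k)]
    random.Random(seed).shuffle(blocks)
    return [s for block in blocks for s in block]

original = ["S1.", "S2.", "S3.", "S4.", "S5.", "S6."]
print(k_block_shuffle(original, k=2))  # e.g. ['S3.', 'S4.', 'S1.', 'S2.', 'S5.', 'S6.']
```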

2020

SciSight: Combining faceted navigation and research group detection for COVID-19 exploratory scientific search
Tom Hope | Jason Portenoy | Kishore Vasan | Jonathan Borchardt | Eric Horvitz | Daniel Weld | Marti Hearst | Jevin West
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The COVID-19 pandemic has sparked unprecedented mobilization of scientists, generating a deluge of papers that makes it hard for researchers to keep track and explore new directions. Search engines are designed for targeted queries, not for discovery of connections across a corpus. In this paper, we present SciSight, a system for exploratory search of COVID-19 research integrating two key capabilities: first, exploring associations between biomedical facets automatically extracted from papers (e.g., genes, drugs, diseases, patient outcomes); second, combining textual and network information to search and visualize groups of researchers and their ties. SciSight has so far served over 15K users with over 42K page views and 13% returns.

More Diverse Dialogue Datasets via Diversity-Informed Data Collection
Katherine Stasaski | Grace Hui Yang | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Automated generation of conversational dialogue using modern neural architectures has made notable advances. However, these models are known to have a drawback of often producing uninteresting, predictable responses; this is known as the diversity problem. We introduce a new strategy to address this problem, called Diversity-Informed Data Collection. Unlike prior approaches, which modify model architectures to solve the problem, this method uses dynamically computed corpus-level statistics to determine which conversational participants to collect data from. Diversity-Informed Data Collection produces significantly more diverse data than baseline data collection methods, and better results on two downstream tasks: emotion classification and dialogue generation. This method is generalizable and can be used with other corpus-level metrics.

The Summary Loop: Learning to Write Abstractive Summaries Without Examples
Philippe Laban | Andrew Hsi | John Canny | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This work presents a new approach to unsupervised abstractive summarization based on maximizing a combination of coverage and fluency for a given length constraint. It introduces a novel method that encourages the inclusion of key terms from the original document into the summary: key terms are masked out of the original document and must be filled in by a coverage model using the current generated summary. A novel unsupervised training procedure leverages this coverage model along with a fluency model to generate and score summaries. When tested on popular news summarization datasets, the method outperforms previous unsupervised methods by more than 2 R-1 points, and approaches results of competitive supervised methods. Our model attains higher levels of abstraction with copied passages roughly two times shorter than prior work, and learns to compress and merge sentences without supervision.
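The sketch below gives a rough, illustrative version of the coverage signal: mask keywords in the source document, prepend the candidate summary as context, and count how many masked words a masked language model recovers. It is a simplification of the paper's coverage model, with the model choice and keyword handling as assumptions.

```python
# Rough, illustrative version of the coverage signal (a simplification of the
# paper's coverage model): mask keywords in the source document, prepend the
# candidate summary as context, and count how many masked words a masked
# language model recovers. Model choice and keyword handling are assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL).eval()

def coverage(summary: str, document: str, keywords: set) -> float:
    doc_words = document.split()
    is_key = [w.lower().strip(".,") in keywords for w in doc_words]
    masked_doc = " ".join(tokenizer.mask_token if k else w
                          for w, k in zip(doc_words, is_key))
    text = summary + " " + tokenizer.sep_token + " " + masked_doc
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        preds = model(**inputs).logits.argmax(dim=-1)[0]
    mask_positions = inputs.input_ids[0] == tokenizer.mask_token_id
    recovered = tokenizer.convert_ids_to_tokens(preds[mask_positions].tolist())
    gold = [w.lower().strip(".,") for w, k in zip(doc_words, is_key) if k]
    hits = sum(p == g for p, g in zip(recovered, gold))
    return hits / max(len(gold), 1)

doc = "Heavy rain caused the river to flood the town."
print(coverage("Rain flooded the town.", doc, keywords={"rain", "flood", "town"}))
```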

What’s The Latest? A Question-driven News Chatbot
Philippe Laban | John Canny | Marti A. Hearst
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

This work describes an automatic news chatbot that draws content from a diverse set of news articles and creates conversations with a user about the news. Key components of the system include the automatic organization of news articles into topical chatrooms, integration of automatically generated questions into the conversation, and a novel method for choosing which questions to present that avoids repetitive suggestions. We describe the algorithmic framework and present the results of a usability study showing that news readers using the system successfully engage in multi-turn conversations about specific news stories.

Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions
Dongyeop Kang | Andrew Head | Risham Sidhu | Kyle Lo | Daniel Weld | Marti A. Hearst
Proceedings of the First Workshop on Scholarly Document Processing

The task of definition detection is important for scholarly papers, because papers often make use of technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from being accurate enough to use in real-world applications. In this paper, we first perform an in-depth error analysis of the current best-performing definition detection system and discover major causes of errors. Based on this analysis, we develop a new definition detection system, HEDDEx, that utilizes syntactic features, transformer encoders, and heuristic filters, and evaluate it on a standard sentence-level benchmark. Because current benchmarks evaluate randomly sampled sentences, we propose an alternative evaluation that assesses every sentence within a document. This allows for evaluating recall in addition to precision. HEDDEx outperforms the leading system on both the sentence-level and the document-level tasks, by 12.7 F1 points and 14.4 F1 points, respectively. We note that performance on the high-recall document-level task is much lower than in the standard evaluation approach, due to the necessity of incorporating document structure as features. We discuss remaining challenges in document-level definition detection, ideas for improvements, and potential issues for the development of reading aid applications.

CIMA: A Large Open Access Dialogue Dataset for Tutoring
Katherine Stasaski | Kimberly Kao | Marti A. Hearst
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

One-to-one tutoring is often an effective means to help students learn, and recent experiments with neural conversation systems are promising. However, large open datasets of tutoring conversations are lacking. To remedy this, we propose a novel asynchronous method for collecting tutoring dialogue via crowdworkers that is both amenable to the needs of deep learning algorithms and reflective of pedagogical concerns. In this approach, extended conversations are obtained between crowdworkers role-playing as both students and tutors. The CIMA collection, which we make publicly available, is novel in that students are exposed to overlapping grounded concepts between exercises, and multiple relevant tutoring responses are collected for the same input. CIMA contains several compelling properties from an educational perspective: student role-players complete exercises in fewer turns during the course of the conversation, and tutor players adopt strategies that conform with some educational conversational norms, such as providing hints versus asking questions in appropriate contexts. The dataset enables a model to be trained to generate the next tutoring utterance in a conversation, conditioned on a provided action strategy.

2019

Towards augmenting crisis counselor training by improving message retrieval
Orianna Demasi | Marti A. Hearst | Benjamin Recht
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and make initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment.
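A minimal sketch of embedding-based message retrieval in this spirit appears below; the embedding model and candidate messages are placeholders (the specific model postdates the paper) and not the authors' setup.

```python
# Minimal sketch of embedding-based message retrieval; the embedding model and
# candidate messages are illustrative placeholders, not the paper's setup.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

candidate_messages = [
    "I'm sorry you're going through this.",
    "Can you tell me more about what happened today?",
    "It sounds like you've been feeling very alone.",
]
candidate_vecs = model.encode(candidate_messages, normalize_embeddings=True)

def retrieve(context: str) -> str:
    """Return the candidate message closest to the conversation context."""
    query = model.encode([context], normalize_embeddings=True)[0]
    scores = candidate_vecs @ query  # cosine similarity, since vectors are normalized
    return candidate_messages[int(np.argmax(scores))]

print(retrieve("Nobody checks on me anymore."))
```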

2017

newsLens: building and visualizing long-ranging news stories
Philippe Laban | Marti Hearst
Proceedings of the Events and Stories in the News Workshop

We propose a method to aggregate and organize a large, multi-source dataset of news articles into a collection of major stories, and automatically name and visualize these stories in a working system. The approach is able to run online, as new articles are added, processing 4 million news articles from 20 news sources, and extracting 80,000 major stories, some of which span several years. The visual interface consists of lanes of timelines, each annotated with information that is deemed important for the story, including extracted quotations. The working system allows a user to search and navigate 8 years of story information.

Multiple Choice Question Generation Utilizing An Ontology
Katherine Stasaski | Marti A. Hearst
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Ontologies provide a structured representation of concepts and the relationships which connect them. This work investigates how a pre-existing educational Biology ontology can be used to generate useful practice questions for students by using the connectivity structure in a novel way. It also introduces a novel way to generate multiple-choice distractors from the ontology, and compares this to a baseline of using embedding representations of nodes. An assessment by an experienced science teacher shows a significant advantage over a baseline when using the ontology for distractor generation. A subsequent study with three science teachers on the results of a modified question generation algorithm finds significant improvements. An in-depth analysis of the teachers’ comments yields useful insights for any researcher working on automated question generation for educational applications.
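As a toy illustration of ontology-based distractor selection (the structure below is invented for illustration, not the Biology ontology used in the paper), siblings of the correct answer under the same parent concept serve as distractors.

```python
# Toy sketch of ontology-based distractor selection; the ontology structure is
# invented for illustration, not the Biology ontology used in the paper.
ontology = {
    "organelle": ["mitochondrion", "ribosome", "golgi apparatus", "lysosome"],
    "macromolecule": ["protein", "lipid", "nucleic acid", "carbohydrate"],
}

def sibling_distractors(answer: str, n: int = 3):
    """Distractors are other children of the concept's parent node."""
    for parent, children in ontology.items():
        if answer in children:
            return [c for c in children if c != answer][:n]
    return []

print(sibling_distractors("ribosome"))  # -> ['mitochondrion', 'golgi apparatus', 'lysosome']
```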

2016

Intersecting Word Vectors to Take Figurative Language to New Heights
Andrea Gagliano | Emily Paul | Kyle Booten | Marti A. Hearst
Proceedings of the Fifth Workshop on Computational Linguistics for Literature

Augmenting Course Material with Open Access Textbooks
Smitha Milli | Marti A. Hearst
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Patterns of Wisdom: Discourse-Level Style in Multi-Sentence Quotations
Kyle Booten | Marti A. Hearst
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Can Natural Language Processing Become Natural Language Coaching?
Marti A. Hearst
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces
Jason Chuang | Spence Green | Marti Hearst | Jeffrey Heer | Philipp Koehn
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces

Improving the Recognizability of Syntactic Relations Using Contextualized Examples
Aditi Muralidharan | Marti A. Hearst
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2009

NLP Support for Faceted Navigation in Scholarly Collection
Marti A. Hearst | Emilia Stoica
Proceedings of the 2009 Workshop on Text and Citation Analysis for Scholarly Digital Libraries (NLPIR4DL)

2008

Solving Relational Similarity Problems Using the Web as a Corpus
Preslav Nakov | Marti A. Hearst
Proceedings of ACL-08: HLT

Improving Search Results Quality by Customizing Summary Lengths
Michael Kaisser | Marti A. Hearst | John B. Lowe
Proceedings of ACL-08: HLT

2007

UCB: System Description for SemEval Task #4
Preslav Nakov | Marti Hearst
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

UCB System Description for the WMT 2007 Shared Task
Preslav Nakov | Marti Hearst
Proceedings of the Second Workshop on Statistical Machine Translation

Exploring the Efficacy of Caption Search for Bioscience Journal Search Interfaces
Marti Hearst | Anna Divoli | Jerry Ye | Michael Wooldridge
Biological, translational, and clinical language processing

Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding
Ariel Schwartz | Anna Divoli | Marti Hearst
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Automating Creation of Hierarchical Faceted Metadata Structures
Emilia Stoica | Marti Hearst | Megan Richardson
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
Marti Hearst | Gina-Anne Levow | James Allan
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2006

Summarizing Key Concepts using Citation Sentences
Ariel S. Schwartz | Marti Hearst
Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology

2005

Supporting Annotation Layers for Natural Language Processing
Preslav Nakov | Ariel Schwartz | Brian Wolf | Marti Hearst
Proceedings of the ACL Interactive Poster and Demonstration Sessions

Multi-way Relation Classification: Application to Protein-Protein Interactions
Barbara Rosario | Marti Hearst
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution
Preslav Nakov | Marti Hearst
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Teaching Applied Natural Language Processing: Triumphs and Tribulations
Marti Hearst
Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching NLP and CL

Search Engine Statistics Beyond the n-Gram: Application to Noun Compound Bracketing
Preslav Nakov | Marti Hearst
Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005)

2004

Classifying Semantic Relations in Bioscience Texts
Barbara Rosario | Marti Hearst
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Nearly-Automated Metadata Hierarchy Creation
Emilia Stoica | Marti A. Hearst
Proceedings of HLT-NAACL 2004: Short Papers

2003

Category-based Pseudowords
Preslav I. Nakov | Marti A. Hearst
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

The Descent of Hierarchy, and Selection in Relational Semantics
Barbara Rosario | Marti Hearst | Charles Fillmore
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

A Critique and Improvement of an Evaluation Metric for Text Segmentation
Lev Pevzner | Marti A. Hearst
Computational Linguistics, Volume 28, Number 1, March 2002

2001

Classifying the Semantic Relations in Noun Compounds via a Domain-Specific Lexical Hierarchy
Barbara Rosario | Marti Hearst
Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing

1999

Untangling Text Data Mining
Marti A. Hearst
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1997

TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
Marti A. Hearst
Computational Linguistics, Volume 23, Number 1, March 1997

Adaptive Multilingual Sentence Boundary Disambiguation
David D. Palmer | Marti A. Hearst
Computational Linguistics, Volume 23, Number 2, June 1997

1994

Multi-Paragraph Segmentation of Expository Text
Marti A. Hearst
32nd Annual Meeting of the Association for Computational Linguistics

Adaptive Sentence Boundary Disambiguation
David D. Palmer | Marti A. Hearst
Fourth Conference on Applied Natural Language Processing

1993

Customizing a Lexicon to Better Suit a Computational Task
Marti Hearst | Hinrich Schuetze
Acquisition of Lexical Knowledge from Text

Structural Ambiguity and Conceptual Relations
Philip Resnik | Marti A. Hearst
Very Large Corpora: Academic and Industrial Perspectives

1992

Automatic Acquisition of Hyponyms from Large Text Corpora
Marti A. Hearst
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics