Jeffrey Heer


2021

pdf bib
Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models
Tongshuang Wu | Marco Tulio Ribeiro | Jeffrey Heer | Daniel Weld
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

While counterfactual examples are useful for analysis and training of NLP models, current generation methods either rely on manual labor to create very few counterfactuals, or only instantiate limited types of perturbations such as paraphrases or word substitutions. We present Polyjuice, a general-purpose counterfactual generator that allows for control over perturbation types and locations, trained by finetuning GPT-2 on multiple datasets of paired sentences. We show that Polyjuice produces diverse sets of realistic counterfactuals, which in turn are useful in various distinct applications: improving training and evaluation on three different tasks (with around 70% less annotation effort than manual generation), augmenting state-of-the-art explanation techniques, and supporting systematic counterfactual error analysis by revealing behaviors easily missed by human experts.

2019

pdf bib
Errudite: Scalable, Reproducible, and Testable Error Analysis
Tongshuang Wu | Marco Tulio Ribeiro | Jeffrey Heer | Daniel Weld
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model and task agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analyses practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.

2015

pdf bib
TopicCheck: Interactive Alignment for Assessing Topic Model Stability
Jason Chuang | Margaret E. Roberts | Brandon M. Stewart | Rebecca Weiss | Dustin Tingley | Justin Grimmer | Jeffrey Heer
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf bib
Human Effort and Machine Learnability in Computer Aided Translation
Spence Green | Sida I. Wang | Jason Chuang | Jeffrey Heer | Sebastian Schuster | Christopher D. Manning
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces
Jason Chuang | Spence Green | Marti Hearst | Jeffrey Heer | Philipp Koehn
Proceedings of the Workshop on Interactive Language Learning, Visualization, and Interfaces