Jill Burstein


2023

Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)
Ekaterina Kochmar | Jill Burstein | Andrea Horbach | Ronja Laarmann-Quante | Nitin Madnani | Anaïs Tack | Victoria Yaneva | Zheng Yuan | Torsten Zesch
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Automated evaluation of written discourse coherence using GPT-4
Ben Naismith | Phoebe Mulcaire | Jill Burstein
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

The popularization of large language models (LLMs) such as OpenAI’s GPT-3 and GPT-4 has led to numerous innovations in the field of AI in education. With respect to automated writing evaluation (AWE), LLMs have reduced challenges associated with assessing writing quality characteristics that are difficult to identify automatically, such as discourse coherence. In addition, LLMs can provide rationales for their evaluations (ratings), which increases score interpretability and transparency. This paper investigates one approach to producing ratings by training GPT-4 to assess discourse coherence in a manner consistent with expert human raters. The findings of the study suggest that GPT-4 has strong potential to produce discourse coherence ratings that are comparable to human ratings, accompanied by clear rationales. Furthermore, the GPT-4 ratings outperform traditional NLP coherence metrics with respect to agreement with human ratings. These results have implications for advancing AWE technology for learning and assessment.

Rating Short L2 Essays on the CEFR Scale with GPT-4
Kevin P. Yancey | Geoffrey Laflair | Anthony Verardi | Jill Burstein
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

Essay scoring is a critical task used to evaluate second-language (L2) writing proficiency on high-stakes language assessments. While automated scoring approaches are mature and have been around for decades, human scoring is still considered the gold standard, despite its high costs and well-known issues such as human rater fatigue and bias. The recent introduction of large language models (LLMs) brings new opportunities for automated scoring. In this paper, we evaluate how well GPT-3.5 and GPT-4 can rate short essay responses written by L2 English learners on a high-stakes language assessment, computing inter-rater agreement with human ratings. Results show that when calibration examples are provided, GPT-4 can perform almost as well as modern Automatic Writing Evaluation (AWE) methods, but agreement with human ratings can vary depending on the test-taker’s first language (L1).

2022

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)
Ekaterina Kochmar | Jill Burstein | Andrea Horbach | Ronja Laarmann-Quante | Nitin Madnani | Anaïs Tack | Victoria Yaneva | Zheng Yuan | Torsten Zesch
Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022)

2021

Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications
Jill Burstein | Andrea Horbach | Ekaterina Kochmar | Ronja Laarmann-Quante | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Helen Yannakoudakis | Torsten Zesch
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

2020

Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications
Jill Burstein | Ekaterina Kochmar | Claudia Leacock | Nitin Madnani | Ildikó Pilán | Helen Yannakoudakis | Torsten Zesch
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

2019

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Jill Burstein | Christy Doran | Thamar Solorio
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

2018

Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Ekaterina Kochmar | Claudia Leacock | Helen Yannakoudakis
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Writing Mentor: Self-Regulated Writing Feedback for Struggling Writers
Nitin Madnani | Jill Burstein | Norbert Elliot | Beata Beigman Klebanov | Diane Napolitano | Slava Andreyev | Maxwell Schwartz
Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations

Writing Mentor is a free Google Docs add-on designed to provide feedback to struggling writers and help them improve their writing in a self-paced and self-regulated fashion. Writing Mentor uses natural language processing (NLP) methods and resources to generate feedback in terms of features that research into post-secondary struggling writers has classified as developmental (Burstein et al., 2016b). These features span many writing sub-constructs (use of sources, claims, and evidence; topic development; coherence; and knowledge of English conventions). Preliminary analysis indicates that users have a largely positive impression of Writing Mentor in terms of usability and potential impact on their writing.

2017

Building Better Open-Source Tools to Support Fairness in Automated Scoring
Nitin Madnani | Anastassia Loukina | Alina von Davier | Jill Burstein | Aoife Cahill
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Automated scoring of written and spoken responses is an NLP application that can significantly impact lives, especially when deployed as part of high-stakes tests such as the GRE® and the TOEFL®. Ethical considerations require that automated scoring algorithms treat all test-takers fairly. The educational measurement community has done significant research on fairness in assessments, and automated scoring systems must incorporate their recommendations. The best way to do that is by making available automated, non-proprietary tools to NLP researchers that directly incorporate these recommendations and generate the analyses needed to help identify and resolve biases in their scoring systems. In this paper, we attempt to provide such a solution.

Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock | Helen Yannakoudakis
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Exploring Relationships Between Writing & Broader Outcomes With Automated Writing Evaluation
Jill Burstein | Dan McCaffrey | Beata Beigman Klebanov | Guangming Ling
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Writing is a challenge, especially for at-risk students who may lack the prerequisite writing skills required to persist in U.S. 4-year postsecondary (college) institutions. Educators teaching postsecondary courses requiring writing could benefit from a better understanding of writing achievement and its role in postsecondary success. In this paper, novel exploratory work examined how automated writing evaluation (AWE) can inform our understanding of the relationship between postsecondary writing skill and broader success outcomes. An exploratory study was conducted using test-taker essays from a standardized writing assessment of postsecondary student learning outcomes. Findings showed that AWE features of the essays were predictors of broader outcomes measures: college success and learning outcomes measures. Study findings illustrate AWE’s potential to support educational analytics – i.e., relationships between writing skill and broader outcomes – taking a step toward moving AWE beyond writing assessment and instructional use cases.

2016

Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock | Helen Yannakoudakis
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Enhancing STEM Motivation through Personal and Communal Values: NLP for Assessment of Utility Value in Student Writing
Beata Beigman Klebanov | Jill Burstein | Judith Harackiewicz | Stacy Priniski | Matthew Mulholland
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Argumentation: Content, Structure, and Relationship with Essay Quality
Beata Beigman Klebanov | Christian Stab | Jill Burstein | Yi Song | Binod Gyawali | Iryna Gurevych
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

Language Muse: Automated Linguistic Activity Generation for English Language Learners
Nitin Madnani | Jill Burstein | John Sabatini | Kietha Biggers | Slava Andreyev
Proceedings of ACL-2016 System Demonstrations

2015

Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Scoring Persuasive Essays Using Opinions and their Targets
Noura Farra | Swapna Somasundaran | Jill Burstein
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

2014

Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

Finding your “Inner-Annotator”: An Experiment in Annotator Independence for Rating Discourse Coherence Quality in Essays
Jill Burstein | Swapna Somasundaran | Martin Chodorow
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

Content Importance Models for Scoring Writing From Sources
Beata Beigman Klebanov | Nitin Madnani | Jill Burstein | Swapna Somasundaran
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Lexical Chaining for Measuring Discourse Coherence Quality in Test-taker Essays
Swapna Somasundaran | Jill Burstein | Martin Chodorow
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

The Far Reach of Multiword Expressions in Educational Technology
Jill Burstein
Proceedings of the 9th Workshop on Multiword Expressions

A User Study: Technology to Increase Teachers’ Linguistic Awareness to Improve Instructional Language Support for English Language Learners
Jill Burstein | John Sabatini | Jane Shore | Brad Moulder | Jennifer Lentini
Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility

Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Automated Scoring of a Summary-Writing Task Designed to Measure Reading Comprehension
Nitin Madnani | Jill Burstein | John Sabatini | Tenaha O’Reilly
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Using Pivot-Based Paraphrasing and Sentiment Profiles to Improve a Subjectivity Lexicon for Essay Data
Beata Beigman Klebanov | Nitin Madnani | Jill Burstein
Transactions of the Association for Computational Linguistics, Volume 1

We demonstrate a method of improving a seed sentiment lexicon developed on essay data by using a pivot-based paraphrasing system for lexical expansion coupled with sentiment profile enrichment using crowdsourcing. Profile enrichment alone yields up to 15% improvement in the accuracy of the seed lexicon on 3-way sentence-level sentiment polarity classification of essay data. Using lexical expansion in addition to sentiment profiles provides a further 7% improvement in performance. Additional experiments show that the proposed method is also effective with other subjectivity lexicons and in a different domain of application (product reviews).

2012

Proceedings of the Seventh Workshop on Building Educational Applications Using NLP
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2011

Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

2010

Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Ron Kaplan | Jill Burstein | Mary Harper | Gerald Penn
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Using Entity-Based Features to Model Coherence in Student Essays
Jill Burstein | Joel Tetreault | Slava Andreyev
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Claudia Leacock
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

2008

Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications
Joel Tetreault | Jill Burstein | Rachele De Felice
Proceedings of the Third Workshop on Innovative Use of NLP for Building Educational Applications

2007

The Automated Text Adaptation Tool
Jill Burstein | Jane Shore | John Sabatini | Yong-Won Lee | Matthew Ventura
Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT)

2005

Proceedings of the Second Workshop on Building Educational Applications Using NLP
Jill Burstein | Claudia Leacock
Proceedings of the Second Workshop on Building Educational Applications Using NLP

Translation Exercise Assistant: Automated Generation of Translation
Jill Burstein | Daniel Marcu
Proceedings of HLT/EMNLP 2005 Interactive Demonstrations

2004

Evaluating Multiple Aspects of Coherence in Student Essays
Derrick Higgins | Jill Burstein | Daniel Marcu | Claudia Gentile
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2003

Toward Evaluation of Writing Style: Overly Repetitious Word Use
Jill Burstein | Magdalena Wolska
10th Conference of the European Chapter of the Association for Computational Linguistics

2001

Towards Automatic Classification of Discourse Elements in Essays
Jill Burstein | Daniel Marcu | Slava Andreyev | Martin Chodorow
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Benefits of Modularity in an Automated Essay Scoring System
Jill Burstein | Daniel Marcu
Proceedings of the COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems

1999

Automated Essay Scoring for Nonnative English Speakers
Jill Burstein | Martin Chodorow
Computer Mediated Language Assessment and Evaluation in Natural Language Processing

1998

Automated Scoring Using A Hybrid Feature Identification Technique
Jill Burstein | Karen Kukich | Susanne Wolff | Chi Lu | Martin Chodorow | Lisa Braden-Harder | Mary Dee Harris
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Automated Scoring Using A Hybrid Feature Identification Technique
Jill Burstein | Karen Kukich | Susanne Wolff | Chi Lu | Martin Chodorow | Lisa Braden-Harder | Mary Dee Harris
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Enriching Automated Essay Scoring Using Discourse Marking
Jill Burstein | Karen Kukich | Susanne Wolff | Chi Lu | Martin Chodorow
Discourse Relations and Discourse Markers

1997

An Automatic Scoring System For Advanced Placement Biology Essays
Jill Burstein | Susanne Wolff | Chi Lu | Randy M. Kaplan
Fifth Conference on Applied Natural Language Processing

1996

Using Lexical Semantic Techniques to Classify Free-Responses
Jill Burstein | Randy Kaplan | Susanne Wolff | Chi Lu
Breadth and Depth of Semantic Lexicons