Burr Settles


2021

Jump-Starting Item Parameters for Adaptive Language Tests
Arya D. McCarthy | Kevin P. Yancey | Geoffrey T. LaFlair | Jesse Egbert | Manqian Liao | Burr Settles
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

A challenge in designing high-stakes language assessments is calibrating the test item difficulties, either a priori or from limited pilot test data. While prior work has addressed ‘cold start’ estimation of item difficulties without piloting, we devise a multi-task generalized linear model with BERT features to jump-start these estimates, rapidly improving their quality with as few as 500 test-takers and a small sample of item exposures (≈6 each) from a large item bank (≈4,000 items). Our joint model provides a principled way to compare test-taker proficiency, item difficulty, and language proficiency frameworks like the Common European Framework of Reference (CEFR). This also enables new item difficulty estimates without piloting them first, which in turn limits item exposure and thus enhances test item security. Finally, using operational data from the Duolingo English Test, a high-stakes English proficiency test, we find that the difficulty estimates derived using this method correlate strongly with lexico-grammatical features that correlate with reading complexity.

2020

Machine Learning–Driven Language Assessment
Burr Settles | Geoffrey T. LaFlair | Masato Hagiwara
Transactions of the Association for Computational Linguistics, Volume 8

We describe a method for rapidly creating language proficiency assessments, and provide experimental evidence that such tests can be valid, reliable, and secure. Our approach is the first to use machine learning and natural language processing to induce proficiency scales based on a given standard, and then use linguistic models to estimate item difficulty directly for computer-adaptive testing. This alleviates the need for expensive pilot testing with human subjects. We used these methods to develop an online proficiency exam called the Duolingo English Test, and demonstrate that its scores align significantly with other high-stakes English assessments. Furthermore, our approach produces test scores that are highly reliable, while generating item banks large enough to satisfy security requirements.

Simultaneous Translation and Paraphrase for Language Education
Stephen Mayhew | Klinton Bicknell | Chris Brust | Bill McDowell | Will Monroe | Burr Settles
Proceedings of the Fourth Workshop on Neural Generation and Translation

We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE). Given a prompt in one language, the goal is to generate a diverse set of correct translations that language learners are likely to produce. This is motivated by the need to create and maintain large, high-quality sets of acceptable translations for exercises in a language-learning application, and synthesizes work spanning machine translation, MT evaluation, automatic paraphrasing, and language education technology. We developed a novel corpus with unique properties for five languages (Hungarian, Japanese, Korean, Portuguese, and Vietnamese), and report on the results of a shared task challenge which attracted 20 teams to solve the task. In our meta-analysis, we focus on three aspects of the resulting systems: external training corpus selection, model architecture and training decisions, and decoding and filtering strategies. We find that strong systems start with a large amount of generic training data, and then fine-tune with in-domain data, sampled according to our provided learner response frequencies.

2018

Second Language Acquisition Modeling
Burr Settles | Chris Brust | Erin Gustafson | Masato Hagiwara | Nitin Madnani
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed at studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.

2016

A Trainable Spaced Repetition Model for Language Learning
Burr Settles | Brendan Meeder
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Behavioral Factors in Interactive Training of Text Classifiers
Burr Settles | Xiaojin Zhu
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
Burr Settles
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing
Burr Settles | Kevin Small | Katrin Tomanek
Proceedings of the NAACL HLT 2010 Workshop on Active Learning for Natural Language Processing

Computational Creativity Tools for Songwriters
Burr Settles
Proceedings of the NAACL HLT 2010 Second Workshop on Computational Approaches to Linguistic Creativity

2009

Active Learning by Labeling Features
Gregory Druck | Burr Settles | Andrew McCallum
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

An Analysis of Active Learning Strategies for Sequence Labeling Tasks
Burr Settles | Mark Craven
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2004

Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets
Burr Settles
Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP)