Yuan Li


2023

Effects of Human Adversarial and Affable Samples on BERT Generalization
Aparna Elangovan | Estrid He | Yuan Li | Karin Verspoor
Findings of the Association for Computational Linguistics: EMNLP 2023

BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data are considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model’s generalizability. We consider two characteristics of training data: the proportion of human-adversarial (h-adversarial) samples, i.e., sample pairs with seemingly minor differences but different ground-truth labels, and the proportion of human-affable (h-affable) samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed training set size, as a rule of thumb, having 10-30% h-adversarial instances improves precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model’s generalizability and may even degrade generalization performance.

2020

Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li | Xiang Gao | Yuan Li | Baolin Peng | Xiujun Li | Yizhe Zhang | Jianfeng Gao
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

When trained effectively, the Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language. In this paper, we propose the first large-scale language VAE model, Optimus (Organizing sentences via Pre-Trained Modeling of a Universal Space). A universal latent embedding space for sentences is first pre-trained on a large text corpus and then fine-tuned for various language generation and understanding tasks. Compared with GPT-2, Optimus enables guided language generation from an abstract level using the latent vectors. Compared with BERT, Optimus can generalize better on low-resource language understanding tasks due to the smooth latent space structure. Extensive experimental results on a wide range of language tasks demonstrate the effectiveness of Optimus. It achieves a new state of the art on VAE language modeling benchmarks.

2019

Massively Multilingual Transfer for NER
Afshin Rahimi | Yuan Li | Trevor Cohn
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In cross-lingual transfer, NLP models trained on one or more source languages are applied to a low-resource target language. While most prior work has used a single source model or a few carefully selected models, here we consider a “massive” setting with many such models. This setting raises the problem of poor transfer, particularly from distant languages. We propose two techniques for modulating the transfer, suitable for zero-shot and few-shot learning, respectively. Evaluating on named entity recognition, we show that our techniques are much more effective than strong baselines, including standard ensembling, and that our unsupervised method rivals oracle selection of the single best individual model.

2017

Learning how to Active Learn: A Deep Reinforcement Learning Approach
Meng Fang | Yuan Li | Trevor Cohn
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Active learning aims to select a small subset of data for annotation such that a classifier learned on the data is highly accurate. This is usually done using heuristic selection methods; however, the effectiveness of such methods is limited and, moreover, their performance varies between datasets. To address these shortcomings, we introduce a novel formulation by reframing active learning as a reinforcement learning problem and explicitly learning a data selection policy, where the policy takes the role of the active learning heuristic. Importantly, our method allows a selection policy learned through simulation on one language to be transferred to other languages. We demonstrate our method using cross-lingual named entity recognition, observing uniform improvements over traditional active learning algorithms.