Slav Petrov


2023

pdf bib
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay | Jason Wei | Hyung Chung | Vinh Tran | David So | Siamak Shakeri | Xavier Garcia | Steven Zheng | Jinfeng Rao | Aakanksha Chowdhery | Denny Zhou | Donald Metzler | Slav Petrov | Neil Houlsby | Quoc Le | Mostafa Dehghani
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model on a few more steps with UL2’s mixture-of-denoiser objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training a baseline language model, PaLM, with ULR2, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving ~4.4 million TPUv4 hours). We further show that this improved scaling curve leads to “emergent abilities” on challenging BIG-Bench tasks—for instance, U-PaLM does much better on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, including reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TydiQA), MMLU and challenging BIG-Bench tasks.

2019

pdf bib
Natural Questions: A Benchmark for Question Answering Research
Tom Kwiatkowski | Jennimaria Palomaki | Olivia Redfield | Michael Collins | Ankur Parikh | Chris Alberti | Danielle Epstein | Illia Polosukhin | Jacob Devlin | Kenton Lee | Kristina Toutanova | Llion Jones | Matthew Kelcey | Ming-Wei Chang | Andrew M. Dai | Jakob Uszkoreit | Quoc Le | Slav Petrov
Transactions of the Association for Computational Linguistics, Volume 7

We present the Natural Questions corpus, a question answering data set. Questions consist of real anonymized, aggregated queries issued to the Google search engine. An annotator is presented with a question along with a Wikipedia page from the top 5 search results, and annotates a long answer (typically a paragraph) and a short answer (one or more entities) if present on the page, or marks null if no long/short answer is present. The public release consists of 307,373 training examples with single annotations; 7,830 examples with 5-way annotations for development data; and a further 7,842 examples with 5-way annotated sequestered as test data. We present experiments validating quality of the data. We also describe analysis of 25-way annotations on 302 examples, giving insights into human variability on the annotation task. We introduce robust metrics for the purposes of evaluating question answering systems; demonstrate high human upper bounds on these metrics; and establish baseline results using competitive methods drawn from related literature.

2018

pdf bib
CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Jan Hajič | Martin Popel | Martin Potthast | Milan Straka | Filip Ginter | Joakim Nivre | Slav Petrov
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2018, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on test input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. This shared task constitutes a 2nd edition—the first one took place in 2017 (Zeman et al., 2017); the main metric from 2017 has been kept, allowing for easy comparison, also in 2018, and two new main metrics have been used. New datasets added to the Universal Dependencies collection between mid-2017 and the spring of 2018 have contributed to increased difficulty of the task this year. In this overview paper, we define the task and the updated evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

2017

pdf bib
Universal Semantic Parsing
Siva Reddy | Oscar Täckström | Slav Petrov | Mark Steedman | Mirella Lapata
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Universal Dependencies (UD) offer a uniform cross-lingual syntactic representation, with the aim of advancing multilingual applications. Recent work shows that semantic parsing can be accomplished by transforming syntactic dependencies to logical forms. However, this work is limited to English, and cannot process dependency graphs, which allow handling complex phenomena such as control. In this work, we introduce UDepLambda, a semantic interface for UD, which maps natural language to logical forms in an almost language-independent fashion and can process dependency graphs. We perform experiments on question answering against Freebase and provide German and Spanish translations of the WebQuestions and GraphQuestions datasets to facilitate multilingual evaluation. Results show that UDepLambda outperforms strong baselines across languages and datasets. For English, it achieves a 4.9 F1 point improvement over the state-of-the-art on GraphQuestions.

pdf bib
Natural Language Processing with Small Feed-Forward Networks
Jan A. Botha | Emily Pitler | Ji Ma | Anton Bakalov | Alex Salcianu | David Weiss | Ryan McDonald | Slav Petrov
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.

pdf bib
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.

2016

pdf bib
Universal Dependencies v1: A Multilingual Treebank Collection
Joakim Nivre | Marie-Catherine de Marneffe | Filip Ginter | Yoav Goldberg | Jan Hajič | Christopher D. Manning | Ryan McDonald | Slav Petrov | Sampo Pyysalo | Natalia Silveira | Reut Tsarfaty | Daniel Zeman
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Cross-linguistically consistent annotation is necessary for sound comparative evaluation and cross-lingual learning experiments. It is also useful for multilingual system development and comparative linguistic studies. Universal Dependencies is an open community effort to create cross-linguistically consistent treebank annotation for many languages within a dependency-based lexicalist framework. In this paper, we describe v1 of the universal guidelines, the underlying design principles, and the currently available treebanks for 33 languages.

pdf bib
Globally Normalized Transition-Based Neural Networks
Daniel Andor | Chris Alberti | David Weiss | Aliaksei Severyn | Alessandro Presta | Kuzman Ganchev | Slav Petrov | Michael Collins
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

pdf bib
Improved Transition-Based Parsing and Tagging with Neural Networks
Chris Alberti | David Weiss | Greg Coppola | Slav Petrov
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Structured Training for Neural Network Transition-Based Parsing
David Weiss | Chris Alberti | Michael Collins | Slav Petrov
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

pdf bib
Learning Compact Lexicons for CCG Semantic Parsing
Yoav Artzi | Dipanjan Das | Slav Petrov
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf bib
Temporal Analysis of Language through Neural Language Models
Yoon Kim | Yi-I Chiu | Kentaro Hanaki | Darshan Hegde | Slav Petrov
Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science

pdf bib
INVITED TALK 2: Towards Universal Syntactic Processing of Natural Language
Slav Petrov
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants

pdf bib
Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
Jason Mann | David Zhang | Lu Yang | Dipanjan Das | Slav Petrov
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2013

pdf bib
Universal Dependency Annotation for Multilingual Parsing
Ryan McDonald | Joakim Nivre | Yvonne Quirmbach-Brundage | Yoav Goldberg | Dipanjan Das | Kuzman Ganchev | Keith Hall | Slav Petrov | Hao Zhang | Oscar Täckström | Claudia Bedini | Núria Bertomeu Castelló | Jungmee Lee
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Täckström | Dipanjan Das | Slav Petrov | Ryan McDonald | Joakim Nivre
Transactions of the Association for Computational Linguistics, Volume 1

We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.

pdf bib
Source-Side Classifier Preordering for Machine Translation
Uri Lerner | Slav Petrov
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2012

pdf bib
Using Search-Logs to Improve Query Tagging
Kuzman Ganchev | Keith Hall | Ryan McDonald | Slav Petrov
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib
Syntactic Annotations for the Google Books NGram Corpus
Yuri Lin | Jean-Baptiste Michel | Erez Aiden Lieberman | Jon Orwant | Will Brockman | Slav Petrov
Proceedings of the ACL 2012 System Demonstrations

pdf bib
A Universal Part-of-Speech Tagset
Slav Petrov | Dipanjan Das | Ryan McDonald
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments, that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.

pdf bib
Vine Pruning for Efficient Multi-Pass Dependency Parsing
Alexander Rush | Slav Petrov
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

pdf bib
Efficient Parallel CKY Parsing on GPUs
Youngmin Yi | Chao-Yue Lai | Slav Petrov | Kurt Keutzer
Proceedings of the 12th International Conference on Parsing Technologies

pdf bib
Multi-Source Transfer of Delexicalized Dependency Parsers
Ryan McDonald | Slav Petrov | Keith Hall
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Training a Parser for Machine Translation Reordering
Jason Katz-Brown | Slav Petrov | Ryan McDonald | Franz Och | David Talbot | Hiroshi Ichikawa | Masakazu Seno | Hideto Kazawa
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
Dipanjan Das | Slav Petrov
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf bib
Self-Training with Products of Latent Variable Grammars
Zhongqiang Huang | Mary Harper | Slav Petrov
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
Amarnag Subramanya | Slav Petrov | Fernando Pereira
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Uptraining for Accurate Deterministic Question Parsing
Slav Petrov | Pi-Chuan Chang | Michael Ringgaard | Hiyan Alshawi
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf bib
Products of Random Latent Variable Grammars
Slav Petrov
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
Learning Better Monolingual Models with Unannotated Bilingual Text
David Burkett | Slav Petrov | John Blitzer | Dan Klein
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2008

pdf bib
Parsing German with Latent Variable Grammars
Slav Petrov | Dan Klein
Proceedings of the Workshop on Parsing German

pdf bib
Coarse-to-Fine Syntactic Machine Translation using Language Projections
Slav Petrov | Aria Haghighi | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

pdf bib
Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing
Slav Petrov | Dan Klein
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

pdf bib
Improved Inference for Unlexicalized Parsing
Slav Petrov | Dan Klein
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

pdf bib
The Infinite PCFG Using Hierarchical Dirichlet Processes
Percy Liang | Slav Petrov | Michael Jordan | Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

pdf bib
Learning Structured Models for Phone Recognition
Slav Petrov | Adam Pauls | Dan Klein
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf bib
Non-Local Modeling with a Mixture of PCFGs
Slav Petrov | Leon Barrett | Dan Klein
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

pdf bib
Learning Accurate, Compact, and Interpretable Tree Annotation
Slav Petrov | Leon Barrett | Romain Thibaux | Dan Klein
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Search
Co-authors