Do Transformers Parse while Predicting the Masked Word?

Haoyu Zhao; Abhishek Panigrahi; Rong Ge; Sanjeev Arora

doi:10.18653/v1/2023.emnlp-main.1029

Abstract

Pre-trained language models have been shown to encode linguistic structures like parse trees in their embeddings while being trained unsupervised. Doubts have been raised about whether the models are doing parsing or only some computation weakly correlated with it. Concretely: (a) Is it possible to explicitly describe transformers with realistic embedding dimensions, number of heads, etc. that are capable of doing parsing, or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG (Marcus et al., 1993). We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data. We conduct probing experiments on models pre-trained on PCFG-generated data to show that this not only allows recovery of an approximate parse tree, but also recovers marginal span probabilities computed by the Inside-Outside algorithm, which suggests an implicit bias of masked language modeling towards this algorithm.
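
For readers unfamiliar with the Inside-Outside algorithm and the "marginal span probabilities" mentioned in the abstract, the following is a minimal sketch of the classical dynamic program on a toy PCFG in Chomsky normal form. The grammar, the names `BINARY`, `LEXICAL`, `ROOT`, `inside_outside`, and `span_marginals` are all hypothetical and chosen for illustration. This is not the transformer construction from the paper, and the toy grammar is far smaller than the English PCFG the abstract refers to.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}   # A -> B C
LEXICAL = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}  # A -> w
ROOT = "S"


def inside_outside(words):
    """Return inside and outside tables indexed by (nonterminal, i, j)."""
    n = len(words)
    inside = defaultdict(float)   # P(A derives words[i:j])
    outside = defaultdict(float)  # P(everything outside words[i:j], with A at (i, j))

    # Inside pass, bottom-up: width-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                inside[(a, i, i + 1)] += p
    # Wider spans combine two adjacent sub-spans through a binary rule.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    inside[(a, i, j)] += p * inside[(b, i, k)] * inside[(c, k, j)]

    # Outside pass, top-down: the root covers the whole sentence.
    outside[(ROOT, 0, n)] = 1.0
    for length in range(n, 1, -1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    out = outside[(a, i, j)]
                    if out == 0.0:
                        continue
                    outside[(b, i, k)] += p * out * inside[(c, k, j)]
                    outside[(c, k, j)] += p * out * inside[(b, i, k)]
    return inside, outside


def span_marginals(words):
    """Return (sentence probability, {(i, j): P(span is a constituent | words)})."""
    inside, outside = inside_outside(words)
    z = inside[(ROOT, 0, len(words))]
    if z == 0.0:
        raise ValueError("sentence has zero probability under the grammar")
    marginals = defaultdict(float)
    for (a, i, j), p_in in list(inside.items()):
        marginals[(i, j)] += p_in * outside[(a, i, j)] / z
    return z, dict(marginals)


if __name__ == "__main__":
    z, marginals = span_marginals(["she", "eats", "fish"])
    print("sentence probability:", z)
    for span in sorted(marginals):
        print(span, marginals[span])
```

Because the toy sentence has a single parse, every span in that parse receives marginal probability 1 and every other span receives 0. With an ambiguous grammar the same tables would yield fractional span marginals, which is the kind of quantity the probing experiments in the abstract aim to recover.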

Anthology ID: 2023.emnlp-main.1029
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 16513–16542
URL: https://aclanthology.org/2023.emnlp-main.1029
DOI: 10.18653/v1/2023.emnlp-main.1029
Cite (ACL): Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. 2023. Do Transformers Parse while Predicting the Masked Word? In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16513–16542, Singapore. Association for Computational Linguistics.
Cite (Informal): Do Transformers Parse while Predicting the Masked Word? (Zhao et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-main.1029.pdf
Video: https://aclanthology.org/2023.emnlp-main.1029.mp4
