Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains

Sanjana Ramprasad, Kundan Krishna, Zachary Lipton, Byron Wallace


Abstract
Recent work has shown that large language models (LLMs) are capable of generating summaries zero-shot (i.e., without explicit supervision) that, under human assessment, are often comparable or even preferred to manually composed reference summaries. However, this prior work has focused almost exclusively on evaluating news article summarization. How do zero-shot summarizers perform in other (potentially more specialized) domains? In this work we evaluate zero-shot generated summaries across specialized domains, including biomedical articles and legal bills (in addition to standard news benchmarks for reference). We focus especially on the factuality of outputs. We acquire annotations from domain experts to identify inconsistencies in summaries and systematically categorize these errors. We analyze whether the prevalence of a given domain in the pretraining corpus affects the extractiveness and faithfulness of generated summaries of articles in this domain. We release all collected annotations to facilitate additional research toward measuring and realizing factually accurate summarization beyond news articles. The dataset can be downloaded from https://anonymous.4open.science/r/zero_shot_faceval_domains-9B83.
Anthology ID: 2024.eacl-short.7
Volume: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
Month: March
Year: 2024
Address: St. Julian’s, Malta
Editors: Yvette Graham, Matthew Purver
Venue: EACL
Publisher: Association for Computational Linguistics
Pages: 50–59
URL: https://aclanthology.org/2024.eacl-short.7
PDF: https://aclanthology.org/2024.eacl-short.7.pdf
Cite (ACL): Sanjana Ramprasad, Kundan Krishna, Zachary Lipton, and Byron Wallace. 2024. Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers), pages 50–59, St. Julian’s, Malta. Association for Computational Linguistics.
Cite (Informal): Evaluating the Factuality of Zero-shot Summarizers Across Varied Domains (Ramprasad et al., EACL 2024)