A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks

Ni Xuanfan, Li Piji

Abstract
Recent efforts have evaluated large language models (LLMs) in areas such as commonsense reasoning, mathematical reasoning, and code generation. However, to the best of our knowledge, no work has specifically investigated the performance of LLMs in natural language generation (NLG) tasks, a pivotal criterion for determining model excellence. Thus, this paper conducts a comprehensive evaluation of well-known and high-performing LLMs, namely ChatGPT, ChatGLM, T5-based models, LLaMA-based models, and Pythia-based models, in the context of NLG tasks. We select English and Chinese datasets encompassing Dialogue Generation and Text Summarization. Moreover, we propose a common evaluation setting that incorporates input templates and post-processing strategies. Our study reports automatic results, accompanied by a detailed analysis.
Anthology ID:
2023.ccl-2.4
Original:
2023.ccl-2.4v1
Version 2:
2023.ccl-2.4v2
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
Month:
August
Year:
2023
Address:
Harbin, China
Editor:
Jiajun Zhang
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
40–56
Language:
English
URL:
https://aclanthology.org/2023.ccl-2.4
Cite (ACL):
Ni Xuanfan and Li Piji. 2023. A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum), pages 40–56, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks (Xuanfan & Piji, CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-2.4.pdf