A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks

Ni Xuanfan, Li Piji

Abstract
Recent efforts have evaluated large language models (LLMs) in areas such as commonsense reasoning, mathematical reasoning, and code generation. However, to the best of our knowledge, no work has specifically investigated the performance of LLMs in natural language generation (NLG) tasks, a pivotal criterion for determining model excellence. Thus, this paper conducts a comprehensive evaluation of well-known and high-performing LLMs, namely ChatGPT, ChatGLM, T5-based models, LLaMA-based models, and Pythia-based models, in the context of NLG tasks. We select English and Chinese datasets encompassing Dialogue Generation and Text Summarization. Moreover, we propose a common evaluation setting that incorporates input templates and post-processing strategies. Our study reports automatic results, accompanied by a detailed analysis.
Anthology ID:
2023.ccl-2.4
Original:
2023.ccl-2.4v1
Version 2:
2023.ccl-2.4v2
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum)
Month:
August
Year:
2023
Address:
Harbin, China
Editor:
Jiajun Zhang
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
40–56
Language:
English
URL:
https://aclanthology.org/2023.ccl-2.4
Cite (ACL):
Ni Xuanfan and Li Piji. 2023. A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks. In Proceedings of the 22nd Chinese National Conference on Computational Linguistics (Volume 2: Frontier Forum), pages 40–56, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks (Xuanfan & Piji, CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-2.4.pdf