h_da@ReproHumn – Reproduction of Human Evaluation and Technical Pipeline

Margot Mieskes, Jacob Georg Benz


Abstract
How reliable are human evaluation results? Is it possible to replicate human evaluation? This work takes a closer look at the evaluation of the output of a Text-to-Speech (TTS) system. Unfortunately, our results indicate that human evaluation is not as straightforward to replicate as expected. We also present results on reproducing the technical background of the TTS system and discuss potential reasons for the reproduction failure.
Anthology ID:
2023.humeval-1.11
Volume:
Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Anya Belz, Maja Popović, Ehud Reiter, Craig Thomson, João Sedoc
Venues:
HumEval | WS
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
130–135
URL:
https://aclanthology.org/2023.humeval-1.11
Cite (ACL):
Margot Mieskes and Jacob Georg Benz. 2023. h_da@ReproHumn – Reproduction of Human Evaluation and Technical Pipeline. In Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, pages 130–135, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
h_da@ReproHumn – Reproduction of Human Evaluation and Technical Pipeline (Mieskes & Benz, HumEval-WS 2023)
PDF:
https://aclanthology.org/2023.humeval-1.11.pdf