Systematic Assessment of Factual Knowledge in Large Language Models

Linhao Luo, Trang Vu, Dinh Phung, Reza Haf


Abstract
Previous studies have relied on existing question-answering benchmarks to evaluate the knowledge stored in large language models (LLMs). However, this approach has limitations regarding factual knowledge coverage, as it mostly focuses on generic domains that may overlap with the pretraining data. This paper proposes a framework to systematically assess the factual knowledge of LLMs by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions. We systematically evaluate state-of-the-art LLMs with KGs in generic and specific domains. The experiments show that ChatGPT is consistently the top performer across all domains. We also find that LLMs' performance depends on instruction finetuning, domain, and question complexity, and that they are prone to adversarial context.
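
To make the framework's core idea concrete, the following is a minimal, hypothetical sketch of generating question-answer pairs from KG facts and scoring an LLM's answers. The (subject, relation, object) triple format, the question template, and the ask_llm callback are illustrative assumptions, not the paper's actual implementation.

# Sketch of the KG-to-QA evaluation idea described in the abstract.
# Assumptions (not from the paper): facts are (subject, relation, object)
# string triples, a simple template turns each fact into a question, and
# `ask_llm` is a hypothetical stand-in for any LLM call.

from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def triple_to_question(triple: Triple) -> Tuple[str, str]:
    """Turn a KG fact into a (question, expected_answer) pair via a template."""
    subj, rel, obj = triple
    question = f"What is the {rel.replace('_', ' ')} of {subj}?"
    return question, obj

def evaluate_llm(triples: Iterable[Triple], ask_llm: Callable[[str], str]) -> float:
    """Ask the model each generated question and report lenient exact-match accuracy."""
    correct = total = 0
    for triple in triples:
        question, expected = triple_to_question(triple)
        prediction = ask_llm(question)
        correct += int(expected.lower() in prediction.lower())
        total += 1
    return correct / total if total else 0.0

if __name__ == "__main__":
    sample_kg = [
        ("Australia", "capital", "Canberra"),
        ("Python", "creator", "Guido van Rossum"),
    ]
    # Placeholder "model" that always answers "Canberra"; accuracy is 0.50 here.
    accuracy = evaluate_llm(sample_kg, ask_llm=lambda q: "Canberra")
    print(f"Accuracy: {accuracy:.2f}")

In practice the answer-matching step would be more careful (e.g., normalizing entity aliases), but the structure above reflects the generate-then-evaluate pipeline the abstract describes.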
Anthology ID:
2023.findings-emnlp.885
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
13272–13286
URL:
https://aclanthology.org/2023.findings-emnlp.885
DOI:
10.18653/v1/2023.findings-emnlp.885
Cite (ACL):
Linhao Luo, Trang Vu, Dinh Phung, and Reza Haf. 2023. Systematic Assessment of Factual Knowledge in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13272–13286, Singapore. Association for Computational Linguistics.
Cite (Informal):
Systematic Assessment of Factual Knowledge in Large Language Models (Luo et al., Findings 2023)
PDF:
https://aclanthology.org/2023.findings-emnlp.885.pdf