Automatically Generating Hindi Wikipedia Pages Using Wikidata as a Knowledge Graph: A Domain-Specific Template Sentences Approach

Aditya Agarwal, Radhika Mamidi


Abstract
This paper presents a method for automatically generating Hindi-language Wikipedia articles, using Wikidata as a knowledge base. Our method extracts structured information from Wikidata, such as the names of entities, their properties, and their relationships, and then uses this information to generate natural language text that conforms to a set of templates designed for the domain of interest. We evaluate our method by generating articles about scientists and comparing them to machine-translated articles. Our results show that more than 70% of the articles generated by our method are better than the machine-translated ones in terms of coherence, structure, and readability. Our approach has the potential to significantly reduce the time and effort required to create Wikipedia articles in Hindi and could be extended to other languages and domains as well.
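The template-filling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fact dictionary is hand-written rather than fetched from Wikidata, and the slot names (`name`, `birth_year`, `birth_place`, `field`, `award`, `award_year`), the three Hindi templates, and the helper functions are all assumptions made for illustration. A template is used only when every slot it needs is present in the facts, so missing properties do not produce broken sentences.

```python
from string import Formatter

# Hypothetical facts standing in for properties extracted from Wikidata.
# The keys are illustrative slot names, not Wikidata property IDs.
FACTS = {
    "name": "सी. वी. रमन",
    "birth_year": "1888",
    "birth_place": "तिरुचिरापल्ली",
    "field": "भौतिकी",
}

# Illustrative templates for the "scientist" domain (assumed, not from the paper).
TEMPLATES = [
    "{name} का जन्म {birth_year} में {birth_place} में हुआ था।",
    "{name} का कार्यक्षेत्र {field} था।",
    "{name} को {award_year} में {award} मिला।",  # skipped below: no award facts
]


def required_slots(template):
    """Return the set of slot names that a template needs."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def generate(facts, templates):
    """Fill every template whose slots are all available and join the results."""
    sentences = []
    for template in templates:
        if required_slots(template).issubset(facts):
            sentences.append(template.format(**facts))
    return " ".join(sentences)


if __name__ == "__main__":
    print(generate(FACTS, TEMPLATES))
```

Checking for missing slots before formatting keeps the output grammatical when an entity lacks a property. A fuller system would also need to handle gender and number agreement, which this sketch sidesteps by choosing templates whose verb agrees with a fixed noun.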
Anthology ID:
2023.ranlp-1.2
Volume:
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
Month:
September
Year:
2023
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd., Shoumen, Bulgaria
Pages:
11–21
URL:
https://aclanthology.org/2023.ranlp-1.2
Cite (ACL):
Aditya Agarwal and Radhika Mamidi. 2023. Automatically Generating Hindi Wikipedia Pages Using Wikidata as a Knowledge Graph: A Domain-Specific Template Sentences Approach. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 11–21, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
Cite (Informal):
Automatically Generating Hindi Wikipedia Pages Using Wikidata as a Knowledge Graph: A Domain-Specific Template Sentences Approach (Agarwal & Mamidi, RANLP 2023)
PDF:
https://aclanthology.org/2023.ranlp-1.2.pdf