N-Grams TextRank A Novel Domain Keyword Extraction Technique

Saransh Rajput, Akshat Gahoi, Manvith Reddy, Dipti Mishra Sharma


Abstract
The rapid growth of the internet has given us a wealth of information and data spread across the web. However, as the data grows, we simultaneously face the grave problem of an information explosion. An abundance of data can lead to large-scale data management problems as well as the loss of the true meaning of the data. In this paper, we present a domain-specific keyword extraction algorithm to tackle this problem. Our algorithm is based on a modified version of TextRank, an algorithm that is itself based on PageRank, and determines the keywords of a domain-specific document. The proposed modification to the traditional TextRank algorithm takes bigrams and trigrams into account and returns results with extremely high precision. We observe that the precision and F1-score of this model outperform those of other models in many domains, and that the recall can be increased by returning more results without affecting the precision. We also discuss future work on extending the same algorithm to Indian languages.
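To make the idea of an n-gram-aware TextRank concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how bigrams and trigrams enter the graph, so this sketch assumes they are simply added as nodes alongside unigrams and linked by co-occurrence within a small window. The function name `ngram_textrank`, the stopword list, and all parameter defaults are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "is", "are", "for", "with"}


def ngram_textrank(text, max_n=3, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank unigrams, bigrams and trigrams with a PageRank-style iteration."""
    tokens = re.findall(r"[a-z]+", text.lower())

    # Candidate n-grams starting at each token position; any n-gram that
    # contains a stopword is skipped (assumed filtering rule).
    grams_at = []
    for i in range(len(tokens)):
        grams = []
        for n in range(1, max_n + 1):
            gram = tokens[i:i + n]
            if len(gram) == n and not any(t in STOPWORDS for t in gram):
                grams.append(" ".join(gram))
        grams_at.append(grams)

    # Undirected co-occurrence graph: two candidates are linked when they
    # start within `window` token positions of each other.
    neighbors = defaultdict(set)
    for i, grams in enumerate(grams_at):
        for j in range(i + 1, min(i + window + 1, len(grams_at))):
            for a in grams:
                for b in grams_at[j]:
                    if a != b:
                        neighbors[a].add(b)
                        neighbors[b].add(a)

    # Unweighted TextRank update, run for a fixed number of iterations.
    scores = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[node])
            for node in neighbors
        }

    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling `ngram_textrank(document_text, top_k=10)` returns up to ten candidate phrases ordered by score. Raising `top_k` returns more candidates, which is the lever the abstract refers to when it mentions increasing the number of results to raise recall.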
Anthology ID:
2020.icon-termtraction.3
Volume:
Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task
Month:
December
Year:
2020
Address:
Patna, India
Editors:
Dipti Misra Sharma, Asif Ekbal, Karunesh Arora, Sudip Kumar Naskar, Dipankar Ganguly, Sobha L, Radhika Mamidi, Sunita Arora, Pruthwik Mishra, Vandan Mujadia
Venue:
ICON
Publisher:
NLP Association of India (NLPAI)
Pages:
9–12
URL:
https://aclanthology.org/2020.icon-termtraction.3
Cite (ACL):
Saransh Rajput, Akshat Gahoi, Manvith Reddy, and Dipti Mishra Sharma. 2020. N-Grams TextRank A Novel Domain Keyword Extraction Technique. In Proceedings of the 17th International Conference on Natural Language Processing (ICON): TermTraction 2020 Shared Task, pages 9–12, Patna, India. NLP Association of India (NLPAI).
Cite (Informal):
N-Grams TextRank A Novel Domain Keyword Extraction Technique (Rajput et al., ICON 2020)
PDF:
https://aclanthology.org/2020.icon-termtraction.3.pdf