Aman Kumar

2022

FabKG: A Knowledge graph of Manufacturing Science domain utilizing structured and unconventional unstructured knowledge source
Aman Kumar | Akshay Bharadwaj | Binil Starly | Collin Lynch
Proceedings of the Workshop on Structured and Unstructured Knowledge Integration (SUKI)

As the demands for large-scale information processing have grown, knowledge graph-based approaches have gained prominence for representing general and domain knowledge. The development of such general representations is essential, particularly in domains such as manufacturing, which intelligent processes and adaptive education can enhance. Despite the continuous accumulation of text in these domains, the lack of structured data has created barriers to information extraction and knowledge transfer. In this paper, we report on work towards developing robust knowledge graphs based upon entity and relation data for both commercial and educational uses. To create FabKG (a manufacturing knowledge graph), we have utilized textbook index words, research paper keywords, and FabNER (manufacturing NER) to extract a sub knowledge base contained within Wikidata. Moreover, we propose a novel crowdsourcing method for KG creation by leveraging student notes, which contain invaluable information but are not captured as meaningful information apart from their use in personal preparation for learning and written exams. We have created a knowledge graph containing 65000+ triples using all data sources. We have also shown the use case of domain-specific question answering and expression/formula-based question answering for educational purposes.

Natural Language Generation (NLG) for non-English languages is hampered by the scarcity of datasets in these languages. We present the IndicNLG Benchmark, a collection of datasets for benchmarking NLG for 11 Indic languages. We focus on five diverse tasks, namely, biography generation using Wikipedia infoboxes, news headline generation, sentence summarization, paraphrase generation, and question generation. We describe the created datasets and use them to benchmark the performance of several monolingual and multilingual baselines that leverage pre-trained sequence-to-sequence models. Our results exhibit the strong performance of multilingual language-specific pre-trained models, and the utility of models trained on our dataset for other related NLG tasks. Our dataset creation methods can be easily applied to modest-resource languages as they involve simple steps such as scraping news articles and Wikipedia infoboxes, light cleaning, and pivoting through machine translation data. To the best of our knowledge, the IndicNLG Benchmark is the first NLG benchmark for Indic languages and the most diverse multilingual NLG dataset, with approximately 8M examples across 5 tasks and 11 languages. The datasets and models will be publicly available.

2016

Experiments in Candidate Phrase Selection for Financial Named Entity Extraction - A Demo
Aman Kumar | Hassan Alam | Tina Werner | Manan Vyas
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

In this study we develop a system that tags and extracts financial concepts called financial named entities (FNE) along with corresponding numeric values – monetary and temporal. We employ machine learning and natural language processing methods to identify financial concepts and dates, and link them to numerical entities.

2012

Revisiting Arabic Semantic Role Labeling using SVM Kernel Methods
Laurel Hart | Hassan Alam | Aman Kumar
Proceedings of COLING 2012: Demonstration Papers

2008

2002