Fahad Khan

Also published as: Anas Fahad Khan


2023

Some Considerations in the Construction of a Historical Language WordNet
Fahad Khan | John P. McCrae | Francisco Javier Minaya Gómez | Rafael Cruz González | Javier E. Díaz-Vera
Proceedings of the 12th Global Wordnet Conference

This article describes the manual construction of a part of the Old English WordNet (OldEWN) covering the semantic field of emotion terms. This manually constructed part of the wordnet will eventually be integrated with the automatically generated, manually checked part covering the rest of the Old English lexicon (currently under construction). We present the workflow for defining these emotion synsets on the basis of a dataset produced by a specialist in this area. We also look at the enrichment of the original Global WordNet Association Lexical Markup Framework (GWA LMF) schema to include the extra information that this part of the OldEWN requires. In the final part of the article, we discuss how the wordnet style of lexicon organisation can be used to share and disseminate research findings and datasets in lexical semantics.

Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM
Sahal Mullappilly | Abdelrahman Shaker | Omkar Thawakar | Hisham Cholakkal | Rao Anwer | Salman Khan | Fahad Khan
Findings of the Association for Computational Linguistics: EMNLP 2023

Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policy makers about the wide-ranging impact of climate change is an essential step towards a sustainable future. Recently, Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. While these models are closed-source, alternative open-source LLMs such as Stanford Alpaca and Vicuna have recently shown promising results. However, these open-source models are not specifically tailored to climate-related, domain-specific information, and they also struggle to generate meaningful responses in other languages such as Arabic. To this end, we propose a lightweight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated conversational-style Arabic instruction-tuning dataset with over 500k instructions about climate change and sustainability. Further, our model utilizes a vector-embedding-based retrieval mechanism during inference. We validate our proposed model through quantitative and qualitative evaluations on climate-related queries. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation. Furthermore, our human expert evaluation reveals an 81.6% preference for our model’s responses over multiple popular open-source models. Our open-source demos, models and curated instruction sets are available at https://github.com/mbzuai-oryx/ClimateGPT

Graph Databases for Diachronic Language Data Modelling
Barbara McGillivray | Pierluigi Cassotti | Davide Di Pierro | Paola Marongiu | Anas Fahad Khan | Stefano Ferilli | Pierpaolo Basile
Proceedings of the 4th Conference on Language, Data and Knowledge

ISO LMF 24613-6: A Revised Syntax Semantics Module for the Lexical Markup Framework
Francesca Frontini | Laurent Romary | Anas Fahad Khan
Proceedings of the 4th Conference on Language, Data and Knowledge

Towards a Conversational Web? A Benchmark for Analysing Semantic Change with Conversational Knowledge Bots and Linked Open Data
Florentina Armaselu | Elena-Simona Apostol | Christian Chiarcos | Anas Fahad Khan | Chaya Liebeskind | Barbara McGillivray | Ciprian-Octavian Truica | Andrius Utka | Giedrė Valūnaitė-Oleškevičienė
Proceedings of the 4th Conference on Language, Data and Knowledge

Workflow Reversal and Data Wrangling in Multilingual Diachronic Analysis and Linguistic Linked Open Data Modelling
Florentina Armaselu | Barbara McGillivray | Chaya Liebeskind | Giedrė Valūnaitė Oleškevičienė | Andrius Utka | Daniela Gifu | Anas Fahad Khan | Elena-Simona Apostol | Ciprian-Octavian Truica
Proceedings of the 4th Conference on Language, Data and Knowledge

A Linked Data Approach for linking and aligning Sign Language and Spoken Language Data
Thierry Declerck | Sam Bigeard | Fahad Khan | Irene Murtagh | Sussi Olsen | Mike Rosner | Ineke Schuurman | Andon Tchechmedjiev | Andy Way
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages

We present work dealing with a Linked Open Data (LOD)-compliant representation of Sign Language (SL) data, with the goal of supporting the cross-lingual alignment of SL data and their linking to Spoken Language (SpL) data. The proposed representation is based on the activities of groups of researchers in the field of SL who have investigated the use of Open Multilingual Wordnet (OMW) datasets for (manually) cross-linking SL data or for linking SL and SpL data. Another group of researchers is proposing an XML encoding of articulatory elements of SLs and (manually) linking those elements to an SpL lexical resource. We propose an RDF-based representation of these various data. This unified formal representation offers a semantic repository of information on SL and SpL data that could be accessed to support the creation of datasets for training or evaluating NLP applications dealing with SLs, for example Machine Translation (MT) between SLs and between SLs and SpLs.

2022

Modelling Collocations in OntoLex-FrAC
Christian Chiarcos | Katerina Gkirtzou | Maxim Ionov | Besim Kabashi | Fahad Khan | Ciprian-Octavian Truică
Proceedings of Globalex Workshop on Linked Lexicography within the 13th Language Resources and Evaluation Conference

Following presentations of frequency and attestations, and of embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, the workshop audience and the scientific community in preparation for the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and to corpus-based collocation scores available from the web. Finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL and how to export it to a tabular format, so that it can be easily processed in downstream applications.

From Inscriptions to Lexica and Back: A Platform for Editing and Linking the Languages of Ancient Italy
Valeria Quochi | Andrea Bellandi | Fahad Khan | Michele Mallia | Francesca Murano | Silvia Piccini | Luca Rigobianco | Alessandro Tommasi | Cesare Zavattari
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

Available language technology is hardly applicable to scarcely attested ancient languages; yet their digital semantic representation, though challenging, is an asset for sharing and preserving existing cultural knowledge. In the context of a project on the languages and cultures of ancient Italy, we took up this challenge. The paper describes the development of a user-friendly web platform, EpiLexO, for the creation and editing of an integrated system of language resources for ancient fragmentary languages centered on the lexicon, in compliance with current digital humanities and Linked Open Data principles. EpiLexO allows lexica to be edited with all relevant cross-references: links to their testimonies, as well as to bibliographic information, other (external) resources and common vocabularies. The focus of the current implementation is on the languages of ancient Italy, in particular Oscan, Faliscan, Celtic and Venetic; however, the technological solutions are designed to be general enough to be potentially applicable to different scenarios.

Towards the Construction of a WordNet for Old English
Fahad Khan | Francisco J. Minaya Gómez | Rafael Cruz González | Harry Diakoff | Javier E. Diaz Vera | John P. McCrae | Ciara O’Loughlin | William Michael Short | Sander Stolk
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper we discuss our preliminary work towards the construction of a WordNet for Old English, taking our inspiration from similar WN construction projects for ancient languages such as Ancient Greek, Latin and Sanskrit. The Old English WordNet (OldEWN) will build upon this innovative work in a number of ways, which we articulate in the article, most importantly by treating figurative meaning as a ‘first-class citizen’ in the structuring of the semantic system. From a more practical perspective, we describe our plan to use a pre-existing lexicographic resource and the Naisc system to automatically compile a provisional version of the WordNet, which will then be checked and enriched by Old English experts.

Proceedings of the Workshop on Terminology in the 21st century: many faces, many places
Rute Costa | Sara Carvalho | Ana Ostroški Anić | Anas Fahad Khan
Proceedings of the Workshop on Terminology in the 21st century: many faces, many places

A Survey of Guidelines and Best Practices for the Generation, Interlinking, Publication, and Validation of Linguistic Linked Data
Fahad Khan | Christian Chiarcos | Thierry Declerck | Maria Pia Di Buono | Milan Dojchinovski | Jorge Gracia | Giedre Valunaite Oleskeviciene | Daniela Gifu
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This article discusses a survey, carried out within the NexusLinguarum COST Action, which aimed to give an overview of existing guidelines (GLs) and best practices (BPs) in linguistic linked data (LLD). In particular, it focused on four core tasks in the production and publication of linked data: generation, interlinking, publication, and validation. We discuss the importance of GLs and BPs for LLD before describing the survey and its results in full. Finally, we offer a number of directions for future work to address the findings of the survey.

Computational Morphology with OntoLex-Morph
Christian Chiarcos | Katerina Gkirtzou | Fahad Khan | Penny Labropoulou | Marco Passarotti | Matteo Pellegrini
Proceedings of the 8th Workshop on Linked Data in Linguistics within the 13th Language Resources and Evaluation Conference

This paper describes the current status of the emerging OntoLex module for linguistic morphology. It serves as an update to the previous version of the vocabulary (Klimek et al. 2019). Whereas this earlier model focused exclusively on descriptive morphology and on applications in lexicography, we now present a novel part of the vocabulary and its application in language technology, namely the rule-based generation of lexicons, introducing a dynamic component into OntoLex.

2020

Modelling Frequency and Attestations for OntoLex-Lemon
Christian Chiarcos | Maxim Ionov | Jesse de Does | Katrien Depuydt | Anas Fahad Khan | Sander Stolk | Thierry Declerck | John Philip McCrae
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

The OntoLex vocabulary enjoys increasing popularity as a means of publishing lexical resources with RDF and as Linked Data. The recent publication of a new OntoLex module for lexicography, lexicog, reflects its increasing importance for digital lexicography. However, not all aspects of digital lexicography have been covered to the same extent. In particular, supplementary information drawn from corpora, such as frequency information, links to attestations, and collocation data, was considered to be beyond the scope of lexicog. Therefore, the OntoLex community has put forward a proposal for a novel module for frequency, attestation and corpus information (FrAC) that not only covers the requirements of digital lexicography but also accommodates essential data structures for lexical information in natural language processing. This paper introduces the current state of the OntoLex-FrAC vocabulary and describes its structure, selected use cases, elementary concepts and fundamental definitions, with a focus on frequency and attestations.

Representing Temporal Information in Lexical Linked Data Resources
Fahad Khan
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

The increasing recognition of the utility of Linked Data as a means of publishing lexical resources has helped to underline the need for RDF-based data models with the flexibility and expressivity to represent the most salient kinds of information contained in such resources as structured data, notably including information relating to time and the temporal dimension. In this article we describe a perdurantist approach to modelling diachronic lexical information which builds upon work we have previously presented and which is based on the ontolex-lemon vocabulary. We present two extended examples, one taken from the Oxford English Dictionary and the other from a work on etymology, to show how our approach can handle different kinds of temporal information often found in lexical resources.

Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case
Fahad Khan | Laurent Romary | Ana Salgado | Jack Bowers | Mohamed Khemakhem | Toma Tasovac
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this article we introduce two of the new parts of the multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model. We demonstrate the use of both parts by describing the LMF encoding of a small number of examples taken from a sample conversion of the reference Portuguese dictionary Grande Dicionário Houaiss da Língua Portuguesa, part of a broader experiment comprising the analysis of different, heterogeneously encoded Portuguese lexical resources. We present the examples in the Unified Modelling Language (UML) and, in a couple of cases, also in TEI.

A Multilingual Evaluation Dataset for Monolingual Word Sense Alignment
Sina Ahmadi | John Philip McCrae | Sanni Nimb | Fahad Khan | Monica Monachini | Bolette Pedersen | Thierry Declerck | Tanja Wissik | Andrea Bellandi | Irene Pisani | Thomas Troelsgård | Sussi Olsen | Simon Krek | Veronika Lipp | Tamás Váradi | László Simon | András Gyorffy | Carole Tiberius | Tanneke Schoonheim | Yifat Ben Moshe | Maya Rudich | Raya Abu Ahmad | Dorielle Lonke | Kira Kovalenko | Margit Langemets | Jelena Kallas | Oksana Dereza | Theodorus Fransen | David Cillessen | David Lindemann | Mikel Alonso | Ana Salgado | José Luis Sancho | Rafael-J. Ureña-Ruiz | Jordi Porta Zamorano | Kiril Simov | Petya Osenova | Zara Kancheva | Ivaylo Radev | Ranka Stanković | Andrej Perdih | Dejan Gabrovsek
Proceedings of the Twelfth Language Resources and Evaluation Conference

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by fostering new solutions, particularly those that notoriously require large amounts of data, such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

2018

One Language to rule them all: modelling Morphological Patterns in a Large Scale Italian Lexicon with SWRL
Fahad Khan | Andrea Bellandi | Francesca Frontini | Monica Monachini
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

Situating Word Senses in their Historical Context with Linked Data
Fahad Khan | Jack Bowers | Francesca Frontini
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers

Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)
Francesca Frontini | Larisa Grčić Simeunović | Špela Vintar | Anas Fahad Khan | Artemis Parvisi
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

Designing an Ontology for the Study of Ritual in Ancient Greek Tragedy
Gloria Mugelli | Andrea Bellandi | Federico Boschetti | Anas Fahad Khan
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

2016

Al Qamus al Muhit, a Medieval Arabic Lexicon in LMF
Ouafae Nahli | Francesca Frontini | Monica Monachini | Fahad Khan | Arsalan Zarghili | Mustapha Khalfi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the conversion of al-qāmūs al-muḥīṭ, a Medieval Arabic lexicon, into LMF, a standard digital lexicographic format. The lexicon is first described, and then all the steps required for the conversion are illustrated. The work will produce a useful lexicographic resource for Arabic NLP, but it is also interesting in its own right for studying the implications of adapting the LMF model to the Arabic language. Some reflections are offered on the status of roots with respect to previously suggested representations. In particular, in our opinion, roots should not be treated as lexical entries but modeled as lexical metadata for classifying and identifying lexical entries. In this manner, each root connects all entries that are derived from it.

LREC as a Graph: People and Resources in a Network
Riccardo Del Gratta | Francesca Frontini | Monica Monachini | Gabriella Pardelli | Irene Russo | Roberto Bartolini | Fahad Khan | Claudia Soria | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.

Tools and Instruments for Building and Querying Diachronic Computational Lexica
Fahad Khan | Andrea Bellandi | Monica Monachini
Proceedings of the Workshop on Language Technology Resources and Tools for Digital Humanities (LT4DH)

This article describes work on enabling the addition of temporal information to senses of words in linguistic linked open data lexica based on the lemonDia model. Our contribution in this article is twofold. On the one hand, we demonstrate how lemonDia enables the querying of diachronic lexical datasets using OWL-oriented Semantic Web based technologies. On the other hand, we present a preliminary version of an interactive interface intended to help users in creating lexical datasets that model meaning change over time.

2015

Using Ontologies to Model Polysemy in Lexical Resources
Fahad Khan | Francesca Frontini
Proceedings of the 1st Workshop on Language and Ontologies

2014

The IMAGACT Visual Ontology. An Extendable Multilingual Infrastructure for the representation of lexical encoding of Action
Massimo Moneglia | Susan Brown | Francesca Frontini | Gloria Gagliardi | Fahad Khan | Monica Monachini | Alessandro Panunzi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Action verbs have many meanings, covering actions of different ontological types. Moreover, each language categorizes action in its own way. One verb can refer to many different actions, and one action can be identified by more than one verb. The range of variation within and across languages is largely unknown, causing trouble for natural language processing tasks. IMAGACT is a corpus-based ontology of action concepts, derived from English and Italian spontaneous speech corpora, which makes use of the universal language of images to identify the different action types extended by verbs referring to action in English, Italian, Chinese and Spanish. This paper presents the infrastructure and the various kinds of linguistic information the user can derive from it. IMAGACT makes explicit the variation of meaning of action verbs within one language and allows comparisons of verb variation within and across languages. Because the action concepts are represented with videos, extension into new languages beyond those presently implemented in IMAGACT is done using competence-based judgments by mother-tongue informants, without intense lexicographic work involving underdetermined semantic description.

2013

Generative Lexicon Theory and Linguistic Linked Open Data
Fahad Khan | Francesca Frontini | Riccardo Del Gratta | Monica Monachini | Valeria Quochi
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

Disambiguation of Basic Action Types through Nouns’ Telic Qualia
Irene Russo | Francesca Frontini | Irene De Felice | Fahad Khan | Monica Monachini
Proceedings of the 6th International Conference on Generative Approaches to the Lexicon (GL2013)

2012

Verb interpretation for basic action types: annotation, ontology induction and creation of prototypical scenes
Francesca Frontini | Irene De Felice | Fahad Khan | Irene Russo | Monica Monachini | Gloria Gagliardi | Alessandro Panunzi
Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon
