Giorgio Maria Di Nunzio

Also published as: Giorgio Di Nunzio


2023

The Importance of Being Interoperable: Theoretical and Practical Implications in Converting TBX to OntoLex-Lemon
Andrea Bellandi | Giorgio Maria Di Nunzio | Silvia Piccini | Federica Vezzani
Proceedings of the 4th Conference on Language, Data and Knowledge

Findings of the WMT 2023 Biomedical Translation Shared Task: Evaluation of ChatGPT 3.5 as a Comparison System
Mariana Neves | Antonio Jimeno Yepes | Aurélie Névéol | Rachel Bawden | Giorgio Maria Di Nunzio | Roland Roller | Philippe Thomas | Federica Vezzani | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Cristian Grozea
Proceedings of the Eighth Conference on Machine Translation

We present an overview of the Biomedical Translation Task that was part of the Eighth Conference on Machine Translation (WMT23). The aim of the task was the automatic translation of biomedical abstracts from the PubMed database. It included twelve language directions, namely, French, Spanish, Portuguese, Italian, German, and Russian, from and into English. We received submissions from 18 systems, covering all the test sets that we released. Our comparison system was based on ChatGPT 3.5 and performed very well in comparison to many of the submissions.

2022

Findings of the WMT 2022 Biomedical Translation Shared Task: Monolingual Clinical Case Reports
Mariana Neves | Antonio Jimeno Yepes | Amy Siu | Roland Roller | Philippe Thomas | Maika Vicente Navarro | Lana Yeganova | Dina Wiemann | Giorgio Maria Di Nunzio | Federica Vezzani | Christel Gerardin | Rachel Bawden | Darryl Johan Estrada | Salvador Lima-lopez | Eulalia Farre-maduel | Martin Krallinger | Cristian Grozea | Aurelie Neveol
Proceedings of the Seventh Conference on Machine Translation (WMT)

In the seventh edition of the WMT Biomedical Task, we addressed a total of seven language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, and English/Italian. This year’s test sets covered three biomedical text genres. In addition to the scientific abstracts and terminology items used in previous editions, we released test sets of clinical cases. The evaluation of clinical case translations was given special attention by involving clinicians in the preparation of reference translations and in the manual evaluation. For the main MEDLINE test sets, we received a total of 609 submissions from 37 teams. For the ClinSpEn sub-task, five teams participated.

Knowledge Representation and Language Simplification of Human Rights
Sara Silecchia | Federica Vezzani | Giorgio Maria Di Nunzio
Proceedings of the Workshop on Terminology in the 21st century: many faces, many places

In this paper, we describe a recent interdisciplinary project that aims to analyse both the conceptual and linguistic dimensions of humanitarian rights terminology. This analysis will result in a new knowledge-based multilingual terminological resource, designed to meet the FAIR principles for Open Science, which will serve in the future as a prototype for the development of new software for the simplified rewriting of international legal texts relating to human rights. Given the early stage of the project, we focus on its rationale, the planned workflow, and the theoretical approach that will be adopted to achieve the main goal of this ambitious research project.

2021

Findings of the WMT 2021 Biomedical Translation Shared Task: Summaries of Animal Experiments as New Test Set
Lana Yeganova | Dina Wiemann | Mariana Neves | Federica Vezzani | Amy Siu | Inigo Jauregi Unanue | Maite Oronoz | Nancy Mah | Aurélie Névéol | David Martinez | Rachel Bawden | Giorgio Maria Di Nunzio | Roland Roller | Philippe Thomas | Cristian Grozea | Olatz Perez-de-Viñaspre | Maika Vicente Navarro | Antonio Jimeno Yepes
Proceedings of the Sixth Conference on Machine Translation

In the sixth edition of the WMT Biomedical Task, we addressed a total of eight language pairs, namely English/German, English/French, English/Spanish, English/Portuguese, English/Chinese, English/Russian, English/Italian, and English/Basque. Further, our test data comprised three types of textual test sets. New this year, we released a test set of summaries of animal experiments, in addition to the test sets of scientific abstracts and terminologies. We received a total of 107 submissions from 15 teams from 6 countries.

2020

On the Formal Standardization of Terminology Resources: The Case Study of TriMED
Federica Vezzani | Giorgio Maria Di Nunzio
Proceedings of the Twelfth Language Resources and Evaluation Conference

The process of standardization plays an important role in the management of terminological resources. In this context, we present the work of re-modeling an existing multilingual terminological database for the medical domain, named TriMED. This resource was conceived to tackle some problems related to the complexity of medical terminology and to respond to different users’ needs. We provide a methodology that should be followed to make a termbase compliant with the three most recent ISO/TC 37 standards. In particular, we focus on the definition of i) the structural meta-model of the resource, ii) the data categories provided, and iii) the TBX format for its implementation. In addition to the formal standardization of the resource, we describe the realization of a new data category repository for the management of the TriMED terminological data and a Web application that can be used to access the multilingual terminological records.

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages
Rachel Bawden | Giorgio Maria Di Nunzio | Cristian Grozea | Inigo Jauregi Unanue | Antonio Jimeno Yepes | Nancy Mah | David Martinez | Aurélie Névéol | Mariana Neves | Maite Oronoz | Olatz Perez-de-Viñaspre | Massimo Piccardi | Roland Roller | Amy Siu | Philippe Thomas | Federica Vezzani | Maika Vicente Navarro | Dina Wiemann | Lana Yeganova
Proceedings of the Fifth Conference on Machine Translation

Machine translation of scientific abstracts and terminologies has the potential to support health professionals and biomedical researchers in some of their activities. In the fifth edition of the WMT Biomedical Task, we addressed a total of eight language pairs. Five language pairs were addressed in past editions of the shared task, namely, English/German, English/French, English/Spanish, English/Portuguese, and English/Chinese. Three additional language pairs were introduced this year: English/Russian, English/Italian, and English/Basque. The task addressed the evaluation of both scientific abstracts (all language pairs) and terminologies (English/Basque only). We received submissions from a total of 20 teams. For recurring language pairs, we observed an improvement in the translations in terms of automatic scores and qualitative evaluations, compared to previous years.

2018

TriMED: A Multilingual Terminological Database
Federica Vezzani | Giorgio Maria Di Nunzio | Geneviève Henrot
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Designing A Long Lasting Linguistic Project: The Case Study of ASIt
Maristella Agosti | Emanuele Di Buccio | Giorgio Maria Di Nunzio | Cecilia Poletto | Esther Rinke
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we discuss the requirements that a long-lasting linguistic database should meet in order to serve the needs of linguists while ensuring the durability and sharing of data. In particular, we discuss the generalizability of the Syntactic Atlas of Italy, a linguistic project that builds on a long-standing tradition of collecting and analyzing linguistic corpora, to a more recent project that focuses on the synchronic and diachronic analysis of the syntax of Italian and Portuguese relative clauses. The results presented are in line with the FLaReNet Strategic Agenda, which highlighted the most pressing needs for research areas such as Natural Language Processing and presented a set of recommendations for the development and progress of language resources in Europe.

2014

A Vector Space Model for Syntactic Distances Between Dialects
Emanuele Di Buccio | Giorgio Maria Di Nunzio | Gianmaria Silvello
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Syntactic comparison across languages is essential in the research field of linguistics, e.g. when investigating the relationship among closely related languages. In IR and NLP, syntactic information is used to understand the meaning of word occurrences according to the context in which they appear. In this paper, we discuss a mathematical framework to compute the distance between languages based on the data available in current state-of-the-art linguistic databases. This framework is inspired by approaches presented in IR and NLP.

2012

A Curated Database for Linguistic Research: The Test Case of Cimbrian Varieties
Maristella Agosti | Birgit Alber | Giorgio Maria Di Nunzio | Marco Dussin | Stefan Rabanus | Alessandra Tomaselli
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper we present the definition of a conceptual approach for the information space entailed by a multidisciplinary and collaborative project, "Cimbrian as a test case for synchronic and diachronic language variation", which provides linguists with a test bed for formal hypotheses concerning human language. The aims of the project are to collect, digitize and tag linguistic data from the German variety of Cimbrian - spoken in three areas of northern Italy: Giazza (VR), Luserna (TN), and Roana (VI) - and to make available on-line a valuable and innovative linguistic resource for the in-depth study of Cimbrian. The task is addressed by a multidisciplinary team of linguists and computer scientists who, combining their competence, aim to make available new tools for linguistic analysis.

2008

From Research to Application in Multilingual Information Access: the Contribution of Evaluation
Carol Peters | Martin Braschler | Giorgio Di Nunzio | Nicola Ferro | Julio Gonzalo | Mark Sanderson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The importance of evaluation in promoting research and development in the information retrieval and natural language processing domains has long been recognised, but is this sufficient? In many areas there is still a considerable gap between the results achieved by the research community and their implementation in commercial applications. This is particularly true for the cross-language or multilingual retrieval areas. Despite the strong demand for and interest in multilingual IR functionality, there are still very few operational systems on offer. The Cross Language Evaluation Forum (CLEF) is now taking steps aimed at changing this situation. The paper provides a critical assessment of the main results achieved by CLEF so far and discusses plans now underway to extend its activities in order to have a more direct impact on the application sector.

An Evaluation Resource for Geographic Information Retrieval
Thomas Mandl | Fredric Gey | Giorgio Di Nunzio | Nicola Ferro | Mark Sanderson | Diana Santos | Christa Womser-Hacker
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic information retrieval requires an evaluation resource which represents realistic information needs and which is geographically challenging. Some experimental results and analysis are reported.