Francesco Ronzano

2018

This paper describes the results of the first Shared Task on Multilingual Emoji Prediction, organized as part of SemEval 2018. Given the text of a tweet, the task consists of predicting the emoji most likely to be used with that tweet. Two subtasks were proposed, one for English and one for Spanish, and participants were allowed to submit a system run to one or both subtasks. In total, 49 teams participated in the English subtask and 22 teams submitted a system run to the Spanish subtask. Evaluation was carried out emoji-wise, and the final ranking was based on macro F-Score. Data and further information about this task can be found at https://competitions.codalab.org/competitions/17344.
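To make the ranking metric concrete, the following is a minimal sketch of a macro F-score: F1 is computed separately for each emoji label and the per-label scores are averaged without weighting, so rare emojis count as much as frequent ones. This is an illustration only, not the task's official scorer, which may differ in details such as which labels are included; the function name `macro_f1` and the label strings are assumptions made for the example.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over every label seen in gold or predicted.

    Illustrative sketch only; not the official SemEval scorer.
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for label g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the right answer but was missed
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); equivalent to the harmonic mean of P and R
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels standing in for emoji classes.
gold = ["heart", "heart", "joy", "fire"]
predicted = ["heart", "joy", "joy", "heart"]
print(round(macro_f1(gold, predicted), 4))
```

Here the per-label F1 values are 0.5 for `heart`, 2/3 for `joy`, and 0 for `fire`, so a label that is never predicted correctly pulls the average down regardless of how rare it is.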

Multimodal Emoji Prediction
Francesco Barbieri | Miguel Ballesteros | Francesco Ronzano | Horacio Saggion
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Emojis are small images that are commonly included in social media text messages. The combination of visual and textual content in the same message creates a modern form of communication that automatic systems are not yet equipped to handle. In this paper we extend recent advances in emoji prediction by putting forward a multimodal approach that is able to predict emojis in Instagram posts. Instagram posts are composed of pictures together with texts, which sometimes include emojis. We show that these emojis can be predicted by using the text, but also by using the picture. Our main finding is that incorporating the two synergistic modalities in a combined model improves accuracy in an emoji prediction task. This result demonstrates that these two modalities (text and images) encode different information on the use of emojis and therefore can complement each other.

PDFdigest: an Adaptable Layout-Aware PDF-to-XML Textual Content Extractor for Scientific Articles
Daniel Ferrés | Horacio Saggion | Francesco Ronzano | Àlex Bravo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

A Multi-Layered Annotated Corpus of Scientific Papers
Beatriz Fisas | Francesco Ronzano | Horacio Saggion
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Scientific literature records the research process with a standardized structure and provides clues for tracking progress in a scientific field. Understanding its internal structure and content is of paramount importance for natural language processing (NLP) technologies. To meet this requirement, we have developed a multi-layered annotated corpus of scientific papers in the domain of Computer Graphics. Sentences are annotated with respect to their role in the argumentative structure of the discourse. The purpose of each citation is specified. Special features of the scientific discourse such as advantages and disadvantages are identified. In addition, a grade is allocated to each sentence according to its relevance for inclusion in a summary. To the best of our knowledge, this complex, multi-layered collection of annotations and metadata characterizing a set of research papers has never before been grouped together in one corpus and therefore constitutes a newer, richer resource than those currently available in the field.

What does this Emoji Mean? A Vector Space Skip-Gram Model for Twitter Emojis
Francesco Barbieri | Francesco Ronzano | Horacio Saggion
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Emojis allow us to describe objects, situations and even feelings with small images, providing a visual and quick way to communicate. In this paper, we analyse emojis used on Twitter with distributional semantic models. We retrieve 10 million tweets posted by USA users, and we build several skip-gram word embedding models by mapping both words and emojis into the same vector space. We test our models with semantic similarity experiments, comparing the output of our models with human assessment. We also carry out an exhaustive qualitative evaluation, showing interesting results.

Making Sense of Massive Amounts of Scientific Publications: the Scientific Knowledge Miner Project
Francesco Ronzano | Ana Freire | Diego Saez-Trumper | Horacio Saggion
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

Trainable Citation-enhanced Summarization of Scientific Articles
Horacio Saggion | Ahmed AbuRa’ed | Francesco Ronzano
Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL)

Natural Language Processing for Intelligent Access to Scientific Information
Horacio Saggion | Francesco Ronzano
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Tutorial Abstracts

During the last decade the amount of scientific information available on-line increased at an unprecedented rate. As a consequence, researchers are now overwhelmed by an enormous and continuously growing number of articles to consider when they perform research activities such as exploring advances in specific topics, peer reviewing, and writing and evaluating proposals. Natural Language Processing Technology represents a key enabling factor in providing scientists with intelligent patterns of access to scientific information. Extracting information from scientific papers, for example, can contribute to the development of rich scientific knowledge bases which can be leveraged to support intelligent knowledge access and question answering. Summarization techniques can reduce the size of long papers to their essential content or automatically generate state-of-the-art reviews. Paraphrase or textual entailment techniques can contribute to the identification of relations across different scientific textual sources. This tutorial provides an overview of the most relevant tasks related to the processing of scientific documents, including but not limited to the in-depth analysis of the structure of scientific articles, their semantic interpretation, content extraction and summarization.

TALN at SemEval-2016 Task 11: Modelling Complex Words by Contextual, Lexical and Semantic Features
Francesco Ronzano | Ahmed Abura’ed | Luis Espinosa-Anke | Horacio Saggion
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

TALN at SemEval-2016 Task 14: Semantic Taxonomy Enrichment Via Sense-Based Embeddings
Luis Espinosa-Anke | Francesco Ronzano | Horacio Saggion
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)