Nataša Milić-Frayling

Also published as: Natasa Milic-Frayling


2024

LAraBench: Benchmarking Arabic AI with Large Language Models
Ahmed Abdelali | Hamdy Mubarak | Shammur Chowdhury | Maram Hasanain | Basel Mousi | Sabri Boughorbel | Samir Abdaljalil | Yassine El Kheir | Daniel Izham | Fahim Dalvi | Majd Hawasly | Nizi Nazar | Youssef Elshahawy | Ahmed Ali | Nadir Durrani | Natasa Milic-Frayling | Firoj Alam
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advancements in Large Language Models (LLMs) have significantly influenced the landscape of language and speech research. Despite this progress, these models lack specific benchmarking against state-of-the-art (SOTA) models tailored to particular languages and tasks. LAraBench addresses this gap for Arabic Natural Language Processing (NLP) and Speech Processing tasks, including sequence tagging and content classification across different domains. We utilized models such as GPT-3.5-turbo, GPT-4, BLOOMZ, Jais-13b-chat, Whisper, and USM, employing zero- and few-shot learning techniques to tackle 33 distinct tasks across 61 publicly available datasets. This involved 98 experimental setups, encompassing ~296K data points, ~46 hours of speech, and 30 sentences for Text-to-Speech (TTS). This effort resulted in 330+ sets of experiments. Our analysis focused on measuring the performance gap between SOTA models and LLMs. The overarching trend observed was that SOTA models generally outperformed LLMs in zero-shot learning, with a few exceptions. Notably, larger computational models with few-shot learning techniques managed to reduce these performance gaps. Our findings provide valuable insights into the applicability of LLMs for Arabic NLP and speech processing tasks.

2023

Contextual Knowledge Learning for Dialogue Generation
Wen Zheng | Natasa Milic-Frayling | Ke Zhou
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Incorporating conversational context and knowledge into dialogue generation models has been essential for improving the quality of the generated responses. The context, comprising utterances from previous dialogue exchanges, is used as a source of content for response generation and as a means of selecting external knowledge. However, to avoid introducing irrelevant content, it is key to enable fine-grained scoring of context and knowledge. In this paper, we present a novel approach to context and knowledge weighting as an integral part of model training. We guide the model training through a Contextual Knowledge Learning (CKL) process which involves Latent Vectors for context and knowledge, respectively. CKL Latent Vectors capture the relationship between context, knowledge, and responses through weak supervision and enable differential weighting of context utterances and knowledge sentences during the training process. Experiments with two standard datasets and human evaluation demonstrate that CKL leads to a significant improvement compared with the performance of six strong baseline models and shows robustness with regard to reduced sizes of training sets.

2021

Knowledge-Grounded Dialogue Generation with Term-level De-noising
Wen Zheng | Natasa Milic-Frayling | Ke Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

Approximation of Response Knowledge Retrieval in Knowledge-grounded Dialogue Generation
Wen Zheng | Natasa Milic-Frayling | Ke Zhou
Findings of the Association for Computational Linguistics: EMNLP 2020

This paper is concerned with improving dialogue generation models through injection of knowledge, e.g., content relevant to the post that can increase the quality of responses. Past research extends the training of the generative models by incorporating statistical properties of posts, responses and related knowledge, without explicitly assessing the knowledge quality. In our work, we demonstrate the importance of knowledge relevance and adopt a two-phase approach. We first apply a novel method, Transformer & Post based Posterior Approximation (TPPA), to select knowledge, and then use the Transformer with Expanded Decoder (TED) model to generate responses from both the post and the knowledge. The TPPA method processes posts, post-related knowledge, and response-related knowledge at both the word and sentence levels. Our experiments with the TED generative model demonstrate the effectiveness of TPPA, as it outperforms a set of strong baseline models. Our TPPA method is extendable and supports further optimization of knowledge retrieval and injection.

2017

Two Layers of Annotation for Representing Event Mentions in News Stories
Maria Pia di Buono | Martin Tutek | Jan Šnajder | Goran Glavaš | Bojana Dalbelo Bašić | Nataša Milić-Frayling
Proceedings of the 11th Linguistic Annotation Workshop

In this paper, we describe our preliminary study on annotating event mentions as part of our research on high-precision news event extraction models. To this end, we propose a two-layer annotation scheme, designed to separately capture the functional and conceptual aspects of event mentions. We hypothesize that the precision of models can be improved by separately modeling and extracting the different aspects of news events, and then combining the extracted information by leveraging the complementarities of the models. In addition, we carry out a preliminary annotation using the proposed scheme and analyze the annotation quality in terms of inter-annotator agreement.

Predicting News Values from Headline Text and Emotions
Maria Pia di Buono | Jan Šnajder | Bojana Dalbelo Bašić | Goran Glavaš | Martin Tutek | Natasa Milic-Frayling
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

We present a preliminary study on predicting news values from headline text and emotions. We perform a multivariate analysis on a dataset manually annotated with news values and emotions, discovering interesting correlations among them. We then train two competitive machine learning models, an SVM and a CNN, to predict news values using headline text and emotions as features. We find that both models yield satisfactory performance, but some news values are more difficult to detect than others, and some benefit more from including emotion information.

2006

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations
Alex Rudnicky | John Dowding | Natasa Milic-Frayling
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Demonstrations