Iza Škrjanec


2023

Tackling Hallucinations in Neural Chart Summarization
Saad Obaid ul Islam | Iza Škrjanec | Ondrej Dusek | Vera Demberg
Proceedings of the 16th International Natural Language Generation Conference

Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to preprocess the training data and show through human evaluation that our method significantly reduces hallucinations. We also found that shortening long-distance dependencies in the input sequence and adding chart-related information such as titles and legends improve the overall performance.

2022

Barch: an English Dataset of Bar Chart Summaries
Iza Škrjanec | Muhammad Salman Edhi | Vera Demberg
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present Barch, a new English dataset of human-written summaries describing bar charts. This dataset contains 47 charts based on a selection of 18 topics. Each chart is associated with one of four intended messages expressed in the chart title. Using crowdsourcing, we collected around 20 summaries per chart, or one thousand in total. The text of the summaries is aligned with the chart data as well as with analytical inferences about the data drawn by humans. Our dataset is one of the first to explore the effect of intended messages on the data descriptions in chart summaries. Additionally, it lends itself well to the task of training data-driven systems for chart-to-text generation. We provide results on the performance of state-of-the-art neural generation models trained on this dataset and discuss the strengths and shortcomings of different models.

2021

Script Parsing with Hierarchical Sequence Modelling
Fangzhou Zhai | Iza Škrjanec | Alexander Koller
Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics

Scripts capture commonsense knowledge about everyday activities and their participants. Script knowledge has proved useful in a number of NLP tasks, such as referent prediction, discourse classification, and story generation. A crucial step for the exploitation of script knowledge is script parsing, the task of tagging a text with the events and participants from a certain activity. This task is challenging: it requires information both about the ways events and participants are usually uttered in surface language and about the order in which they occur in the world. We show how to do accurate script parsing with a hierarchical sequence model and transfer learning. Our model improves the state of the art of event parsing by over 16 points F-score and, for the first time, accurately tags script participants.

2017

Gender Profiling for Slovene Twitter communication: the Influence of Gender Marking, Content and Style
Ben Verhoeven | Iza Škrjanec | Senja Pollak
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing

We present results of what are, to our knowledge, the first gender classification experiments on Slovene text. Inspired by the TwiSty corpus and experiments (Verhoeven et al., 2016), we employed the Janes corpus (Erjavec et al., 2016) and its gender annotations to perform gender classification experiments on Twitter text, comparing a token-based and a lemma-based approach. We find that the token-based approach (92.6% accuracy), containing gender markings related to the author, outperforms the lemma-based approach by about 5%. Especially in the lemmatized version, we also observe stylistic and content-based differences in writing between men (e.g. more profane language, numerals and beer mentions) and women (e.g. more pronouns, emoticons and character flooding). Many of our findings corroborate previous research on other languages.

2015

Predicting the Level of Text Standardness in User-generated Content
Nikola Ljubešić | Darja Fišer | Tomaž Erjavec | Jaka Čibej | Dafne Marko | Senja Pollak | Iza Škrjanec
Proceedings of the International Conference Recent Advances in Natural Language Processing