John Pavlopoulos


2024

pdf bib
Polarized Opinion Detection Improves the Detection of Toxic Language
John Pavlopoulos | Aristidis Likas
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

Distance from unimodality (DFU) has been found to correlate well with human judgment for the assessment of polarized opinions. However, its un-normalized nature makes it less intuitive and somewhat difficult to exploit in machine learning (e.g., as a supervised signal). In this work a normalized version of this measure, called nDFU, is proposed that leads to better assessment of the degree of polarization. Then, we propose a methodology for K-class text classification, based on nDFU, that exploits polarized texts in the dataset. Such polarized instances are assigned to a separate K+1 class, so that a K+1-class classifier is trained. An empirical analysis on three datasets for abusive language detection, shows that nDFU can be used to model polarized annotations and prevent them from harming the classification performance. Finally, we further exploit nDFU to specify conditions that could explain polarization given a dimension and present text examples that polarized the annotators when the dimension was gender and race. Our code is available at https://github.com/ipavlopoulos/ndfu.

2023

pdf bib
Detecting Erroneously Recognized Handwritten Byzantine Text
John Pavlopoulos | Vasiliki Kougia | Paraskevi Platanou | Holger Essler
Findings of the Association for Computational Linguistics: EMNLP 2023

Handwritten text recognition (HTR) yields textual output that comprises errors, which are considerably more compared to that of recognised printed (OCRed) text. Post-correcting methods can eliminate such errors but may also introduce errors. In this study, we investigate the issues arising from this reality in Byzantine Greek. We investigate the properties of the texts that lead post-correction systems to this adversarial behaviour and we experiment with text classification systems that learn to detect incorrect recognition output. A large masked language model, pre-trained in modern and fine-tuned in Byzantine Greek, achieves an Average Precision score of 95%. The score improves to 97% when using a model that is pre-trained in modern and then in ancient Greek, the two language forms Byzantine Greek combines elements from. A century-based analysis shows that the advantage of the classifier that is further-pre-trained in ancient Greek concerns texts of older centuries. The application of this classifier before a neural post-corrector on HTRed text reduced significantly the post-correction mistakes.

pdf bib
Machine Learning for Ancient Languages: A Survey
Thea Sommerschield | Yannis Assael | John Pavlopoulos | Vanessa Stefanak | Andrew Senior | Chris Dyer | John Bodel | Jonathan Prag | Ion Androutsopoulos | Nando de Freitas
Computational Linguistics, Volume 49, Issue 3 - September 2023

Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning.

pdf bib
JUAGE at SemEval-2023 Task 10: Parameter Efficient Classification
Jeffrey Sorensen | Katerina Korre | John Pavlopoulos | Katrin Tomanek | Nithum Thain | Lucas Dixon | Léo Laugier
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Using pre-trained language models to implement classifiers from small to modest amounts of training data is an area of active research. The ability of large language models to generalize from few-shot examples and to produce strong classifiers is extended using the engineering approach of parameter-efficient tuning. Using the Explainable Detection of Online Sexism (EDOS) training data and a small number of trainable weights to create a tuned prompt vector, a competitive model for this task was built, which was top-ranked in Subtask B.

pdf bib
Harmful Language Datasets: An Assessment of Robustness
Katerina Korre | John Pavlopoulos | Jeffrey Sorensen | Léo Laugier | Ion Androutsopoulos | Lucas Dixon | Alberto Barrón-cedeño
The 7th Workshop on Online Abuse and Harms (WOAH)

The automated detection of harmful language has been of great importance for the online world, especially with the growing importance of social media and, consequently, polarisation. There are many open challenges to high quality detection of harmful text, from dataset creation to generalisable application, thus calling for more systematic studies. In this paper, we explore re-annotation as a means of examining the robustness of already existing labelled datasets, showing that, despite using alternative definitions, the inter-annotator agreement remains very inconsistent, highlighting the intrinsically subjective and variable nature of the task. In addition, we build automatic toxicity detectors using the existing datasets, with their original labels, and we evaluate them on our multi-definition and multi-source datasets. Surprisingly, while other studies show that hate speech detection models perform better on data that are derived from the same distribution as the training set, our analysis demonstrates this is not necessarily true.

pdf bib
Dating Greek Papyri with Text Regression
John Pavlopoulos | Maria Konstantinidou | Isabelle Marthot-Santaniello | Holger Essler | Asimina Paparigopoulou
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dating Greek papyri accurately is crucial not only to edit their texts but also to understand numerous other aspects of ancient writing, document and book production and circulation, as well as various other aspects of administration, everyday life and intellectual history of antiquity. Although a substantial number of Greek papyri documents bear a date or other conclusive data as to their chronological placement, an even larger number can only be dated tentatively or in approximation, due to the lack of decisive evidence. By creating a dataset of 389 transcriptions of documentary Greek papyri, we train 389 regression models and we predict a date for the papyri with an average MAE of 54 years and an MSE of 1.17, outperforming image classifiers and other baselines. Last, we release date estimations for 159 manuscripts, for which only the upper limit is known.

2022

pdf bib
Enriching Grammatical Error Correction Resources for Modern Greek
Katerina Korre | John Pavlopoulos
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Grammatical Error Correction (GEC), a task of Natural Language Processing (NLP), is challenging for underepresented languages. This issue is most prominent in languages other than English. This paper addresses the issue of data and system sparsity for GEC purposes in the modern Greek Language. Following the most popular current approaches in GEC, we develop and test an MT5 multilingual text-to-text transformer for Greek. To our knowledge this the first attempt to create a fully-fledged GEC model for Greek. Our evaluation shows that our system reaches up to 52.63% F0.5 score on part of the Greek Native Corpus (GNC), which is 16% below the winning system of the BEA-19 shared task on English GEC. In addition, we provide an extended version of the Greek Learner Corpus (GLC), on which our model reaches up to 22.76% F0.5. Previous versions did not include corrections with the annotations which hindered the potential development of efficient GEC systems. For that reason we provide a new set of corrections. This new dataset facilitates an exploration of the generalisation abilities and robustness of our system, given that the assessment is conducted on learner data while the training on native data.

pdf bib
A Study of Distant Viewing of ukiyo-e prints
Konstantina Liagkou | John Pavlopoulos | Ewa Machotka
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper contributes to studying relationships between Japanese topography and places featured in early modern landscape prints, so-called ukiyo-e or ‘pictures of the floating world’. The printed inscriptions on these images feature diverse place-names, both man-made and natural formations. However, due to the corpus’s richness and diversity, the precise nature of artistic mediation of the depicted places remains little understood. In this paper, we explored a new analytical approach based on the macroanalysis of images facilitated by Natural Language Processing technologies. This paper presents a small dataset with inscriptions on prints that have been annotated by an art historian for included place-name entities. Our dataset is released for public use. By fine-tuning and applying a Japanese BERT-based Name Entity Recogniser, we provide a use-case of a macroanalysis of a visual dataset that is hosted by the digital database of the Art Research Center at the Ritsumeikan University, Kyoto. Our work studies the relationship between topography and its visual renderings in early modern Japanese ukiyo-e landscape prints, demonstrating how an art historian’s work can be improved with Natural Language Processing toward distant viewing of visual datasets. We release our dataset and code for public use: https://github.com/connalia/ukiyo-e_meisho_nlp

pdf bib
Handwritten Paleographic Greek Text Recognition: A Century-Based Approach
Paraskevi Platanou | John Pavlopoulos | Georgios Papaioannou
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Today classicists are provided with a great number of digital tools which, in turn, offer possibilities for further study and new research goals. In this paper we explore the idea that old Greek handwriting can be machine-readable and consequently, researchers can study the target material fast and efficiently. Previous studies have shown that Handwritten Text Recognition (HTR) models are capable of attaining high accuracy rates. However, achieving high accuracy HTR results for Greek manuscripts is still considered to be a major challenge. The overall aim of this paper is to assess HTR for old Greek manuscripts. To address this statement, we study and use digitized images of the Oxford University Bodleian Library Greek manuscripts. By manually transcribing 77 images, we created and present here a new dataset for Handwritten Paleographic Greek Text Recognition. The dataset instances were organized by establishing as a leading factor the century to which the manuscript and hence the image belongs. Experimenting then with an HTR model we show that the error rate depends on the century of the image.

pdf bib
Sentiment Analysis of Homeric Text: The 1st Book of Iliad
John Pavlopoulos | Alexandros Xenos | Davide Picca
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Sentiment analysis studies are focused more on online customer reviews or social media, and less on literary studies. The problem is greater for ancient languages, where the linguistic expression of sentiments may diverge from modern linguistic forms. This work presents the outcome of a sentiment annotation task of the first Book of Iliad, an ancient Greek poem. The annotators were provided with verses translated into modern Greek and they annotated the perceived emotions and sentiments verse by verse. By estimating the fraction of annotators that found a verse as belonging to a specific sentiment class, we model the poem’s perceived sentiment as a multi-variate time series. By experimenting with a state of the art deep learning masked language model, pre-trained on modern Greek and fine-tuned to estimate the sentiment of our data, we registered a mean squared error of 0.063. This low error indicates that sentiment estimators built on our dataset can potentially be used as mechanical annotators, hence facilitating the distant reading of Homeric text. Our dataset is released for public use.

pdf bib
From the Detection of Toxic Spans in Online Discussions to the Analysis of Toxic-to-Civil Transfer
John Pavlopoulos | Leo Laugier | Alexandros Xenos | Jeffrey Sorensen | Ion Androutsopoulos
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We study the task of toxic spans detection, which concerns the detection of the spans that make a text toxic, when detecting such spans is possible. We introduce a dataset for this task, ToxicSpans, which we release publicly. By experimenting with several methods, we show that sequence labeling models perform best, but methods that add generic rationale extraction mechanisms on top of classifiers trained to predict if a post is toxic or not are also surprisingly promising. Finally, we use ToxicSpans and systems trained on it, to provide further analysis of state-of-the-art toxic to non-toxic transfer systems, as well as of human performance on that latter task. Our work highlights challenges in finer toxicity detection and mitigation.

2021

pdf bib
Civil Rephrases Of Toxic Texts With Self-Supervised Transformers
Léo Laugier | John Pavlopoulos | Jeffrey Sorensen | Lucas Dixon
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Platforms that support online commentary, from social networks to news sites, are increasingly leveraging machine learning to assist their moderation efforts. But this process does not typically provide feedback to the author that would help them contribute according to the community guidelines. This is prohibitively time-consuming for human moderators to do, and computational approaches are still nascent. This work focuses on models that can help suggest rephrasings of toxic comments in a more civil manner. Inspired by recent progress in unpaired sequence-to-sequence tasks, a self-supervised learning model is introduced, called CAE-T5. CAE-T5 employs a pre-trained text-to-text transformer, which is fine tuned with a denoising and cyclic auto-encoder loss. Experimenting with the largest toxicity detection dataset to date (Civil Comments) our model generates sentences that are more fluent and better at preserving the initial content compared to earlier text style transfer systems which we compare with using several scoring systems and human evaluation.

pdf bib
Context Sensitivity Estimation in Toxicity Detection
Alexandros Xenos | John Pavlopoulos | Ion Androutsopoulos
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

User posts whose perceived toxicity depends on the conversational context are rare in current toxicity detection datasets. Hence, toxicity detectors trained on current datasets will also disregard context, making the detection of context-sensitive toxicity a lot harder when it occurs. We constructed and publicly release a dataset of 10k posts with two kinds of toxicity labels per post, obtained from annotators who considered (i) both the current post and the previous one as context, or (ii) only the current post. We introduce a new task, context-sensitivity estimation, which aims to identify posts whose perceived toxicity changes if the context (previous post) is also considered. Using the new dataset, we show that systems can be developed for this task. Such systems could be used to enhance toxicity detection datasets with more context-dependent posts or to suggest when moderators should consider the parent posts, which may not always be necessary and may introduce additional costs.

pdf bib
Multimodal or Text? Retrieval or BERT? Benchmarking Classifiers for the Shared Task on Hateful Memes
Vasiliki Kougia | John Pavlopoulos
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021)

The Shared Task on Hateful Memes is a challenge that aims at the detection of hateful content in memes by inviting the implementation of systems that understand memes, potentially by combining image and textual information. The challenge consists of three detection tasks: hate, protected category and attack type. The first is a binary classification task, while the other two are multi-label classification tasks. Our participation included a text-based BERT baseline (TxtBERT), the same but adding information from the image (ImgBERT), and neural retrieval approaches. We also experimented with retrieval augmented classification models. We found that an ensemble of TxtBERT and ImgBERT achieves the best performance in terms of ROC AUC score in two out of the three tasks on our development set.

pdf bib
SemEval-2021 Task 5: Toxic Spans Detection
John Pavlopoulos | Jeffrey Sorensen | Léo Laugier | Ion Androutsopoulos
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

The Toxic Spans Detection task of SemEval-2021 required participants to predict the spans of toxic posts that were responsible for the toxic label of the posts. The task could be addressed as supervised sequence labeling, using training data with gold toxic spans provided by the organisers. It could also be treated as rationale extraction, using classifiers trained on potentially larger external datasets of posts manually annotated as toxic or not, without toxic span annotations. For the supervised sequence labeling approach and evaluation purposes, posts previously labeled as toxic were crowd-annotated for toxic spans. Participants submitted their predicted spans for a held-out test set and were scored using character-based F1. This overview summarises the work of the 36 teams that provided system descriptions.

pdf bib
ELERRANT: Automatic Grammatical Error Type Classification for Greek
Katerina Korre | Marita Chatzipanagiotou | John Pavlopoulos
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

In this paper, we introduce the Greek version of the automatic annotation tool ERRANT (Bryant et al., 2017), which we named ELERRANT. ERRANT functions as a rule-based error type classifier and was used as the main evaluation tool of the systems participating in the BEA-2019 (Bryant et al., 2019) shared task. Here, we discuss grammatical and morphological differences between English and Greek and how these differences affected the development of ELERRANT. We also introduce the first Greek Native Corpus (GNC) and the Greek WikiEdits Corpus (GWE), two new evaluation datasets with errors from native Greek learners and Wikipedia Talk Pages edits respectively. These two datasets are used for the evaluation of ELERRANT. This paper is a sole fragment of a bigger picture which illustrates the attempt to solve the problem of low-resource languages in NLP, in our case Greek.

2020

pdf bib
ERRANT: Assessing and Improving Grammatical Error Type Classification
Katerina Korre | John Pavlopoulos
Proceedings of the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

Grammatical Error Correction (GEC) is the task of correcting different types of errors in written texts. To manage this task, large amounts of annotated data that contain erroneous sentences are required. This data, however, is usually annotated according to each annotator’s standards, making it difficult to manage multiple sets of data at the same time. The recently introduced Error Annotation Toolkit (ERRANT) tackled this problem by presenting a way to automatically annotate data that contain grammatical errors, while also providing a standardisation for annotation. ERRANT extracts the errors and classifies them into error types, in the form of an edit that can be used in the creation of GEC systems, as well as for grammatical error analysis. However, we observe that certain errors are falsely or ambiguously classified. This could obstruct any qualitative or quantitative grammatical error type analysis, as the results would be inaccurate. In this work, we use a sample of the FCE coprus (Yannakoudakis et al., 2011) for secondary error type annotation and we show that up to 39% of the annotations of the most frequent type should be re-classified. Our corrections will be publicly released, so that they can serve as the starting point of a broader, collaborative, ongoing correction process.

pdf bib
Toxicity Detection: Does Context Really Matter?
John Pavlopoulos | Jeffrey Sorensen | Lucas Dixon | Nithum Thain | Ion Androutsopoulos
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Moderation is crucial to promoting healthy online discussions. Although several ‘toxicity’ detection datasets and models have been published, most of them ignore the context of the posts, implicitly assuming that comments may be judged independently. We investigate this assumption by focusing on two questions: (a) does context affect the human judgement, and (b) does conditioning on context improve performance of toxicity detection systems? We experiment with Wikipedia conversations, limiting the notion of context to the previous post in the thread and the discussion title. We find that context can both amplify or mitigate the perceived toxicity of posts. Moreover, a small but significant subset of manually labeled posts (5% in one of our experiments) end up having the opposite toxicity labels if the annotators are not provided with context. Surprisingly, we also find no evidence that context actually improves the performance of toxicity classifiers, having tried a range of classifiers and mechanisms to make them context aware. This points to the need for larger datasets of comments annotated in context. We make our code and data publicly available.

2019

pdf bib
ConvAI at SemEval-2019 Task 6: Offensive Language Identification and Categorization with Perspective and BERT
John Pavlopoulos | Nithum Thain | Lucas Dixon | Ion Androutsopoulos
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper presents the application of two strong baseline systems for toxicity detection and evaluates their performance in identifying and categorizing offensive language in social media. PERSPECTIVE is an API, that serves multiple machine learning models for the improvement of conversations online, as well as a toxicity detection system, trained on a wide variety of comments from platforms across the Internet. BERT is a recently popular language representation model, fine tuned per task and achieving state of the art performance in multiple NLP tasks. PERSPECTIVE performed better than BERT in detecting toxicity, but BERT was much better in categorizing the offensive type. Both baselines were ranked surprisingly high in the SEMEVAL-2019 OFFENSEVAL competition, PERSPECTIVE in detecting an offensive post (12th) and BERT in categorizing it (11th). The main contribution of this paper is the assessment of two strong baselines for the identification (PERSPECTIVE) and the categorization (BERT) of offensive language with little or no additional training data.

pdf bib
A Survey on Biomedical Image Captioning
John Pavlopoulos | Vasiliki Kougia | Ion Androutsopoulos
Proceedings of the Second Workshop on Shortcomings in Vision and Language

Image captioning applied to biomedical images can assist and accelerate the diagnosis process followed by clinicians. This article is the first survey of biomedical image captioning, discussing datasets, evaluation measures, and state of the art methods. Additionally, we suggest two baselines, a weak and a stronger one; the latter outperforms all current state of the art systems on one of the datasets.

2017

pdf bib
Deep Learning for User Comment Moderation
John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the First Workshop on Abusive Language Online

Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of EnglishWikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism improves further the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.

pdf bib
Improved Abusive Comment Moderation with User Embeddings
John Pavlopoulos | Prodromos Malakasiotis | Juli Bakagianni | Ion Androutsopoulos
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Experimenting with a dataset of approximately 1.6M user comments from a Greek news sports portal, we explore how a state of the art RNN-based moderation method can be improved by adding user embeddings, user type embeddings, user biases, or user type biases. We observe improvements in all cases, with user embeddings leading to the biggest performance gains.

pdf bib
Deeper Attention to Abusive User Content Moderation
John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Experimenting with a new dataset of 1.6M user comments from a news portal and an existing dataset of 115K Wikipedia talk page comments, we show that an RNN operating on word embeddings outpeforms the previous state of the art in moderation, which used logistic regression or an MLP classifier with character or word n-grams. We also compare against a CNN operating on word embeddings, and a word-list baseline. A novel, deep, classificationspecific attention mechanism improves the performance of the RNN further, and can also highlight suspicious words for free, without including highlighted words in the training data. We consider both fully automatic and semi-automatic moderation.

2016

pdf bib
aueb.twitter.sentiment at SemEval-2016 Task 4: A Weighted Ensemble of SVMs for Twitter Sentiment Analysis
Stavros Giorgis | Apostolos Rousas | John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

pdf bib
AUEB-ABSA at SemEval-2016 Task 5: Ensembles of Classifiers and Embeddings for Aspect Based Sentiment Analysis
Dionysios Xenos | Panagiotis Theodorakakos | John Pavlopoulos | Prodromos Malakasiotis | Ion Androutsopoulos
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

2014

pdf bib
Multi-Granular Aspect Aggregation in Aspect-Based Sentiment Analysis
John Pavlopoulos | Ion Androutsopoulos
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

pdf bib
A Vague Sense Classifier for Detecting Vague Definitions in Ontologies
Panos Alexopoulos | John Pavlopoulos
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

pdf bib
SemEval-2014 Task 4: Aspect Based Sentiment Analysis
Maria Pontiki | Dimitris Galanis | John Pavlopoulos | Harris Papageorgiou | Ion Androutsopoulos | Suresh Manandhar
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
AUEB: Two Stage Sentiment Analysis of Social Network Messages
Rafael Michael Karampatsis | John Pavlopoulos | Prodromos Malakasiotis
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

pdf bib
Aspect Term Extraction for Sentiment Analysis: New Datasets, New Evaluation Measures and an Improved Unsupervised Method
John Pavlopoulos | Ion Androutsopoulos
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

2013

pdf bib
nlp.cs.aueb.gr: Two Stage Sentiment Analysis
Prodromos Malakasiotis | Rafael Michael Karampatsis | Konstantina Makrynioti | John Pavlopoulos
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)