Georgios Petasis

2023

Andronicus of Rhodes at SemEval-2023 Task 4: Transformer-Based Human Value Detection Using Four Different Neural Network Architectures
Georgios Papadopoulos | Marko Kokol | Maria Dagioglou | Georgios Petasis
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents our participation in the “Human Value Detection” shared task (Kiesel et al., 2023), as “Andronicus of Rhodes”. We describe the approaches behind each entry in the official evaluation, along with the motivation behind each approach. Our best-performing approach is based on BERT large, with 4 classification heads, implementing two different classification approaches (with different activation and loss functions), and two different partitionings of the training data, to handle class imbalance. Classification is performed through majority voting. The proposed approach outperforms the BERT baseline, ranking in the upper half of the competition.
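
The voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each classification head emits one 0/1 decision per value label; the function name `majority_vote`, the tie-breaking rule, and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import List


def majority_vote(head_predictions: List[List[int]], tie_value: int = 0) -> List[int]:
    """Combine per-label 0/1 predictions from several heads by majority vote."""
    n_heads = len(head_predictions)
    n_labels = len(head_predictions[0])
    combined = []
    for label in range(n_labels):
        # Count how many heads voted "present" for this label.
        votes = sum(head[label] for head in head_predictions)
        if 2 * votes > n_heads:
            combined.append(1)
        elif 2 * votes < n_heads:
            combined.append(0)
        else:
            # Even split: fall back to an assumed tie-breaking value.
            combined.append(tie_value)
    return combined


# Toy example: four heads, three value labels (hypothetical data).
heads = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 0],
]
result = majority_vote(heads)
```

With an even number of heads a tie is possible, so the sketch makes the tie-breaking choice explicit; how the submitted system resolved ties is not stated in the abstract.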

2022

The Ellogon Web Annotation Tool: Annotating Moral Values and Arguments
Alexandros Fotios Ntogramatzis | Anna Gradou | Georgios Petasis | Marko Kokol
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In this paper, we present the Ellogon Web Annotation Tool. It is a collaborative, web-based annotation tool built upon the Ellogon infrastructure, offering an improved user experience and adaptability to various annotation scenarios by making good use of the latest design practices and web development frameworks. The tool has been in development for many years; this paper describes its current architecture, along with the recent modifications that extend the existing functionalities and the new features that were added. The new version of the tool offers document analytics, annotation inspection and comparison features, a modern UI, and formatted text import (e.g. TEI XML documents, rendered with simple markup). We present two use cases that serve as examples of different annotation scenarios to demonstrate the new functionalities. An appropriate (user-supplied, XML-based) annotation schema is used for each scenario. The first schema contains the relevant components for representing concepts, moral values, and ideas. The second includes all the necessary elements for annotating argumentative units in a document and their binary relations.

2020

Social Web Observatory: A Platform and Method for Gathering Knowledge on Entities from Different Textual Sources
Leonidas Tsekouras | Georgios Petasis | George Giannakopoulos | Aris Kosmopoulos
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this work we describe a framework for the collection and summarization of information from the Web in an entity-driven manner. The framework consists of a set of appropriate workflows and the Social Web Observatory platform, which implements those workflows, supporting them through a language analysis pipeline. The pipeline includes text collection/crawling, identification of different entities, clustering of texts into events related to entities, and entity-centric sentiment analysis, as well as text analytics and visualization functionalities. The latter allow the user to take advantage of the gathered information as actionable knowledge: to understand the dynamics of public opinion for a given entity over time and across real-world events. We describe the platform and the analysis functionality and evaluate the performance of the system by allowing human users to score how the system fares in its intended purpose of summarizing entity-centered information from different sources on the Web.

Ellogon Casual Annotation Infrastructure
Georgios Petasis | Leonidas Tsekouras
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper presents a new annotation paradigm, casual annotation, along with a proposed architecture and a reference implementation, the Ellogon Casual Annotation Tool, which implements this paradigm and architecture. The novel aspects of the proposed paradigm originate from the vision to tightly integrate annotation with the casual, everyday activities of users. By annotating in a less “controlled” environment, and removing the bottleneck of selecting content and importing it to annotation infrastructures, casual annotation provides the ability to vastly increase the content that can be annotated and ease the annotation process through automatic pre-training. The proposed paradigm, architecture and reference implementation have been evaluated for more than two years on an annotation task related to sentiment analysis. Evaluation results suggest that, at least for this annotation task, there is a huge improvement in productivity after casual annotation adoption, in comparison to the more traditional annotation paradigms followed in the early stages of the annotation task.

2019

Segmentation of Argumentative Texts with Contextualised Word Representations
Georgios Petasis
Proceedings of the 6th Workshop on Argument Mining

The segmentation of argumentative units is an important subtask of argument mining, which is frequently addressed at a coarse granularity, usually assuming argumentative units to be no smaller than sentences. Approaches focusing on clause-level granularity typically address the task as sequence labeling at the token level, aiming to classify whether a token begins, is inside, or is outside of an argumentative unit. Most approaches exploit highly engineered, manually constructed features, and algorithms typically used in sequential tagging, such as Conditional Random Fields, while more recent approaches try to exploit manually constructed features in the context of deep neural networks. In this context, we examined to what extent recent advances in sequential labelling make it possible to reduce the need for highly sophisticated, manually constructed features, and whether limiting features to embeddings pre-trained on large corpora is a promising approach. Evaluation results suggest the examined models and approaches can exhibit comparable performance, minimising the need for feature engineering.
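
To make the token-level formulation in this abstract concrete, the sketch below decodes a begin/inside/outside tag sequence into argumentative unit spans. The function name `bio_to_spans`, the single-letter tag set, the tolerance for an `I` tag without a preceding `B`, and the example sentence are assumptions for illustration; the paper's models and label inventory are not reproduced here.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token index pairs, end exclusive, one per unit."""
    spans = []
    start = None  # index where the currently open unit began, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new unit begins; close any unit that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Assumption: an "I" without a preceding "B" opens a unit.
            if start is None:
                start = i
        else:  # "O": outside any unit
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical example sentence with two argumentative units.
tokens = ["We", "should", "ban", "cars", "because", "they", "pollute", "."]
tags = ["B", "I", "I", "I", "O", "B", "I", "O"]
spans = bio_to_spans(tags)
units = [" ".join(tokens[s:e]) for s, e in spans]
```

Here `spans` holds token offsets, so the same decoder works regardless of which tagger (feature-based or embedding-based) produced the tag sequence.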

Social Web Observatory: An entity-driven, holistic information summarization platform across sources
Leonidas Tsekouras | Georgios Petasis | Aris Kosmopoulos
Proceedings of the Workshop MultiLing 2019: Summarization Across Languages, Genres and Sources

The Social Web Observatory (SWO) is an entity-driven, sentiment-aware, event summarization web platform, combining various methods and tools to overview trends across social media and news sources in Greek. SWO crawls, clusters and summarizes information following an entity-centric view of text streams, allowing users to monitor the public sentiment towards a specific person, organization or other entity. In this paper, we overview the platform, outline the analysis pipeline and describe a user study aimed at quantifying the usefulness of the system and especially the meaningfulness and coherence of discovered events.

2017

Unsupervised Detection of Argumentative Units though Topic Modeling Techniques
Alfio Ferrara | Stefano Montanelli | Georgios Petasis
Proceedings of the 4th Workshop on Argument Mining

In this paper we present a new unsupervised approach, “Attraction to Topics” (A2T), for the detection of argumentative units, a sub-task of argument mining. Motivated by the importance of topic identification in manual annotation, we examine whether topic modeling can be used for performing unsupervised detection of argumentative sentences, and to what extent topic modeling can be used to classify sentences as claims and premises. Preliminary evaluation results suggest that topic information can be successfully used for the detection of argumentative sentences, at least for the corpora used for evaluation. Our approach has been evaluated on two English corpora, the first of which contains 90 persuasive essays, while the second is a collection of 340 documents from user-generated content.

2016

Identifying Argument Components through TextRank
Georgios Petasis | Vangelis Karkaletsis
Proceedings of the Third Workshop on Argument Mining (ArgMining2016)

CLARIN-EL Web-based Annotation Tool
Ioannis Manousos Katakis | Georgios Petasis | Vangelis Karkaletsis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents a new Web-based annotation tool, the “CLARIN-EL Web-based Annotation Tool”. Based on an existing annotation infrastructure offered by the “Ellogon” language engineering platform, this new tool transfers a large part of Ellogon’s features and functionalities to a Web environment, by exploiting the capabilities of cloud computing. This new annotation tool is able to support a wide range of annotation tasks, through user-provided annotation schemas in XML. The new annotation tool has already been employed in several annotation tasks, including the annotation of arguments, which is presented as a use case. The CLARIN-EL annotation tool is compared to existing solutions along several dimensions and features. Finally, future work includes the improvement of integration with the CLARIN-EL infrastructure, and the inclusion of features not currently supported, such as the annotation of aligned documents.

2015

Argument Extraction from News
Christos Sardianos | Ioannis Manousos Katakis | Georgios Petasis | Vangelis Karkaletsis
Proceedings of the 2nd Workshop on Argumentation Mining

2014

The Ellogon Pattern Engine: Context-free Grammars over Annotations
Georgios Petasis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents the pattern engine that is offered by the Ellogon language engineering platform. This pattern engine allows the application of context-free grammars over annotations, which are metadata generated during the processing of documents by natural language tools. In addition, grammar development is aided by a graphical grammar editor, giving grammar authors the capability to test and debug grammars.

Annotating Arguments: The NOMAD Collaborative Annotation Tool
Georgios Petasis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The huge amount of available information on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users’ information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the NOMAD collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the NOMAD annotation tool is distributed with an open source license, as part of the Ellogon platform.

NOMAD: Linguistic Resources and Tools Aimed at Policy Formulation and Validation
George Kiomourtzis | George Giannakopoulos | Georgios Petasis | Pythagoras Karampiperis | Vangelis Karkaletsis
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The NOMAD project (Policy Formulation and Validation through non Moderated Crowd-sourcing) supports policy making by providing rich, actionable information related to how citizens perceive different policies. NOMAD automatically analyzes citizen contributions to the informal web (e.g. forums, social networks, blogs, newsgroups and wikis) using a variety of tools. These tools comprise text retrieval, topic classification, argument detection and sentiment analysis, as well as argument summarization. NOMAD provides decision-makers with a full arsenal of solutions, starting from describing a domain and a policy to applying content search and acquisition, categorization and visualization. These solutions work in a collaborative manner in the policy-making arena. NOMAD, thus, embeds editing, analysis and visualization technologies into a concrete framework, applicable in a variety of policy-making and decision support settings. In this paper we provide an overview of the linguistic tools and resources of NOMAD.

2012

The SYNC3 Collaborative Annotation Tool
Georgios Petasis
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The huge amount of available information on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users' information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn or evaluate extraction models. The production of such corpora can be significantly facilitated by annotation tools, which provide user-friendly facilities and enable annotators to annotate documents according to a predefined annotation schema. However, the construction of annotation tools that operate in a distributed environment is a challenging task: the majority of these tools are implemented as Web applications, having to cope with the capabilities offered by browsers. This paper describes the SYNC3 collaborative annotation tool, which implements an alternative architecture: it remains a desktop application, fully exploiting the advantages of desktop applications, but provides collaborative annotation through the use of a centralised server for storing both the documents and their metadata, and instant messaging protocols for communicating events among all annotators. The annotation tool is implemented as a component of the Ellogon language engineering platform, exploiting its extensive annotation engine, its cross-platform abilities and its linguistic processing components, if such a need arises. Finally, the SYNC3 annotation tool is distributed with an open source license, as part of the Ellogon platform.

2011

Coreference Annotator - A new annotation tool for aligned bilingual corpora
Mara Tsoumari | Georgios Petasis
Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora

Unsupervised Domain Adaptation based on Text Relatedness
Georgios Petasis
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

BlogBuster: A Tool for Extracting Corpora from the Blogosphere
Georgios Petasis | Dimitrios Petasis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, suitable for linguistic and language technology research and development, has attracted significant research interest recently. Several general purpose approaches for removing boilerplate have been presented in the literature; however, the blogosphere poses additional requirements, such as a finer control over the extracted textual segments in order to accurately identify important elements, i.e. individual blog posts, titles, posting dates or comments. BlogBuster tries to provide such additional details along with boilerplate removal, following a rule-based approach. A small set of rules was manually constructed by observing a limited set of blogs from the Blogger and Wordpress hosting platforms. These rules operate on the DOM tree of an HTML page, as constructed by a popular browser, Mozilla Firefox. Evaluation results suggest that BlogBuster is very accurate when extracting corpora from blogs hosted on Blogger and Wordpress, while exhibiting a reasonable precision when applied to blogs not hosted on these two popular blogging platforms.

2008

BOEMIE Ontology-Based Text Annotation Tool
Pavlina Fragkou | Georgios Petasis | Aris Theodorakos | Vangelis Karkaletsis | Constantine Spyropoulos
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The huge amount of available information on the Web creates the need for effective information extraction systems that are able to produce metadata that satisfy users' information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be significantly facilitated by annotation tools that are able to annotate, according to a defined ontology, not only named entities but, most importantly, relations between them. This paper describes the BOEMIE ontology-based annotation tool, which is able to locate blocks of text that correspond to specific types of named entities, fill tables corresponding to ontology concepts with those named entities and link the filled tables based on relations defined in the domain ontology. Additionally, it can perform annotation of blocks of text that refer to the same topic. The tool has a user-friendly interface and supports automatic pre-annotation and annotation comparison, as well as customization to other annotation schemata. The annotation tool has been used in a large-scale annotation task involving 3,000 web pages regarding athletics. It has also been used in another annotation task involving 503 web pages with medical information, in different languages.