Ján Staš

Also published as: Jan Stas, Jan Staš


2023

Fine-Tuning and Evaluation of Question Generation for Slovak Language
Ondrej Megela | Daniel Hladek | Matus Pleva | Ján Staš | Ming-Hsiang Su | Yuan-Fu Liao
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)

2019

Sequence to Sequence Convolutional Neural Network for Automatic Spelling Correction
Daniel Hládek | Matúš Pleva | Ján Staš | Yuan-Fu Liao
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

Building of children speech corpus for improving automatic subtitling services
Matus Pleva | Stanislav Ondas | Daniel Hládek | Jozef Juhar | Ján Staš | Yuan-Fu Liao
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)

2016

Evaluation Set for Slovak News Information Retrieval
Daniel Hládek | Jan Staš | Jozef Juhár
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This work proposes an information retrieval evaluation set for the Slovak language. A set of 80 queries written in natural language is provided together with the set of relevant documents. The document set contains 3980 newspaper articles sorted into 6 categories. Each document in the result set is manually annotated for relevance to its corresponding query. The evaluation set is largely compatible with the Cranfield test collection, using the same methodology for queries and relevance annotation. In addition, it provides annotations for document title, author, publication date, and category, which can be used to evaluate automatic document clustering and categorization.
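
As an illustration of how a Cranfield-style collection of queries and relevance judgments is typically used, the following minimal sketch computes precision and recall at a cutoff and mean average precision. The document and query identifiers, the `qrels` and `runs` dictionaries, and the function names are hypothetical; they do not reflect the file format or tooling of the evaluation set itself.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision and recall over the top-k retrieved document ids."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    precision = hits / k if k > 0 else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical relevance judgments: query id -> set of relevant document ids.
qrels: Dict[str, Set[str]] = {"q1": {"d1", "d3"}, "q2": {"d2"}}
# Hypothetical system output: query id -> ranked list of document ids.
runs: Dict[str, List[str]] = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}

# Mean average precision over all judged queries.
mean_ap = sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)
```

Averaging per-query scores, rather than pooling all documents, keeps queries with many relevant documents from dominating the result.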

An Extension of the Slovak Broadcast News Corpus based on Semi-Automatic Annotation
Peter Viszlay | Ján Staš | Tomáš Koctúr | Martin Lojka | Jozef Juhár
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we introduce an extension of our previously released TUKE-BNews-SK corpus based on a semi-automatic annotation scheme. The scheme first relies on automatic transcription of the broadcast news (BN) data by our Slovak large vocabulary continuous speech recognition system. The generated hypotheses are then manually corrected and completed by trained human annotators. The corpus is composed of 25 hours of fully annotated spontaneous and prepared speech. In addition, we have acquired a further 900 hours of BN data, part of which we plan to annotate semi-automatically. We present a preliminary corpus evaluation that gives very promising results.

2014

The Slovak Categorized News Corpus
Daniel Hladek | Jan Stas | Jozef Juhar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The presented corpus aims to be the first attempt to create a representative sample of the contemporary Slovak language from various domains that supports easy searching and automated processing. This first version of the corpus contains words, automatic morphological and named entity annotations, and transcriptions of abbreviations and numerals. An integral part of the paper is a word boundary and sentence boundary detection algorithm that utilizes characteristic features of the language.