János Zsibrita


2014

Szeged Corpus 2.5: Morphological Modifications in a Manually POS-tagged Hungarian Corpus
Veronika Vincze | Viktor Varga | Katalin Ilona Simkó | János Zsibrita | Ágoston Nagy | Richárd Farkas | János Csirik
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The Szeged Corpus is the largest manually annotated database containing the possible morphological analyses and lemmas for each word form. In this work, we present its latest version, Szeged Corpus 2.5, in which the new harmonized morphological coding system of Hungarian has been employed and the majority of misspelled words have been corrected and tagged with the proper morphological code. New morphological codes are introduced for participles, causative / modal / frequentative verbs, adverbial pronouns and punctuation marks; moreover, the distinction between common and proper nouns is eliminated. We also report some statistical data on the frequency of the new morphological codes. The new version of the corpus made it possible to train magyarlanc, a data-driven POS-tagger of Hungarian, on a dataset with the new harmonized codes. According to the results, magyarlanc is able to achieve a state-of-the-art accuracy score on the 2.5 version as well.

Automatic Error Detection concerning the Definite and Indefinite Conjugation in the HunLearner Corpus
Veronika Vincze | János Zsibrita | Péter Durst | Martina Katalin Szabó
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we present the results of automatic error detection concerning the definite and indefinite conjugation in the extended version of the HunLearner corpus, the learners’ corpus of the Hungarian language. We present the most typical structures that trigger definite or indefinite conjugation in Hungarian and discuss the most frequent types of errors made by language learners in the corpus texts, illustrating the error types with sentences taken from the corpus. Our results highlight grammatical structures that might pose problems for learners of Hungarian, which can be fruitfully applied in the teaching and practicing of such constructions from the language teacher’s or learners’ point of view. These results may also be exploited in extending the functionalities of a grammar checker concerning the definiteness of the verb. Our automatic system was able to achieve perfect recall, i.e. it could find all the mismatches between the type of the object and the conjugation of the verb, which is promising for future studies in this area.

2013

magyarlanc: A Tool for Morphological and Dependency Parsing of Hungarian
János Zsibrita | Veronika Vincze | Richárd Farkas
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Dependency Parsing for Identifying Hungarian Light Verb Constructions
Veronika Vincze | János Zsibrita | István Nagy T.
Proceedings of the Sixth International Joint Conference on Natural Language Processing