Michael Hanl


2016

KorAP Architecture ― Diving in the Deep Sea of Corpus Data
Nils Diewald | Michael Hanl | Eliza Margaretha | Joachim Bingel | Marc Kupietz | Piotr Bański | Andreas Witt
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

KorAP is a corpus search and analysis platform, developed at the Institute for the German Language (IDS). It supports very large corpora with multiple annotation layers, multiple query languages, and complex licensing scenarios. KorAP’s design aims to be scalable, flexible, and sustainable to serve the German Reference Corpus DeReKo for at least the next decade. To meet these requirements, we have adopted a highly modular microservice-based architecture. This paper outlines our approach: an architecture consisting of small components that are easy to extend, replace, and maintain. The components include a search backend, a user and corpus license management system, and a web-based user frontend. We also describe a general corpus query protocol used by all microservices for internal communication. KorAP is open source, licensed under BSD-2, and available on GitHub.

2014

Access control by query rewriting: the case of KorAP
Piotr Bański | Nils Diewald | Michael Hanl | Marc Kupietz | Andreas Witt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present an approach to one aspect of managing complex access scenarios for large and heterogeneous corpora: handling user queries that, intentionally or due to the complexity of the queried resource, target texts or annotations outside the given user’s permissions. We first outline the overall architecture of the corpus analysis platform KorAP, devoting some attention to the way it handles multiple query languages by implementing ISO CQLF (Corpus Query Lingua Franca), which in turn is a component crucial for the functionality discussed here. Next, we look at query rewriting as it is used by KorAP and zoom in on one kind of this procedure, namely the rewriting of queries that is forced by data access restrictions.
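
The general idea of rewriting a query to respect access restrictions can be illustrated with a minimal sketch. It is not KorAP's implementation, and the abstract does not specify the actual mechanism or data format: the `Query` class, its fields, and the `rewrite_for_access` function are hypothetical names introduced here for illustration only. The sketch assumes a query names the corpora it targets, and the rewrite narrows that set to what the user may access while recording what was changed.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Query:
    """Hypothetical query representation (not KorAP's actual format)."""
    pattern: str                      # the search expression, left untouched
    corpora: Set[str]                 # corpora the query targets
    rewrites: List[str] = field(default_factory=list)  # notes on applied rewrites


def rewrite_for_access(query: Query, permitted: Set[str]) -> Query:
    """Return a copy of the query restricted to corpora the user may access.

    The original query is not modified. Any corpus that had to be dropped
    is reported in the `rewrites` notes of the returned query.
    """
    allowed = query.corpora & permitted
    removed = query.corpora - permitted
    notes = list(query.rewrites)
    if removed:
        notes.append(
            "removed corpora without permission: " + ", ".join(sorted(removed))
        )
    return Query(query.pattern, allowed, notes)
```

In this sketch, a query targeting corpora `A`, `B`, and `C` issued by a user permitted to read only `A`, `C`, and `D` would be rewritten to target `A` and `C`, with a note that `B` was removed. Recording the rewrite rather than silently dropping the corpus is a design choice of the sketch, so that the user can be told why results differ from what was requested.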