Jyrki Niemi


2014

HFST-SweNER — A New NER Resource for Swedish
Dimitrios Kokkinakis | Jyrki Niemi | Sam Hardwick | Krister Lindén | Lars Borin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Named entity recognition (NER) is a knowledge-intensive information extraction task that recognizes textual mentions of entities belonging to a predefined set of categories, such as locations, organizations and time expressions. NER is a challenging yet essential preprocessing technology for many natural language processing applications, and is particularly crucial for language understanding. NER has been actively explored in academia and industry, especially in recent years with the advent of social media data. This paper describes the conversion, modeling and adaptation of a Swedish NER system from a hybrid environment, with integrated functionality from various processing components, to the Helsinki Finite-State Transducer Technology (HFST) platform. This new HFST-based NER (HFST-SweNER) is a full-fledged open source implementation that supports a variety of generic named entity types and consists of multiple, reusable resource layers, e.g., various n-gram-based named entity lists (gazetteers).

2013

Nordic and Baltic Wordnets Aligned and Compared through “WordTies”
Bolette Sandford Pedersen | Lars Borin | Markus Forsberg | Neeme Kahusk | Krister Lindén | Jyrki Niemi | Niklas Nisbeth | Lars Nygaard | Heili Orav | Eirikur Rögnvaldsson | Mitchell Seaton | Kadri Vider | Kaarlo Voionmaa
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013)

2012

Representing the Translation Relation in a Bilingual Wordnet
Jyrki Niemi | Krister Lindén
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes representing translations in the Finnish wordnet, FinnWordNet (FiWN), and constructing the FiWN database. FiWN was created by translating all the word senses of the Princeton WordNet (PWN) into Finnish and by joining the translations with the semantic and lexical relations of PWN extracted into a relational (database) format. The approach naturally resulted in a translation relation between PWN and FiWN. Unlike many other multilingual wordnets, the translation relation in FiWN is not primarily on the synset level, but on the level of an individual word sense, which allows more precise translation correspondences. This can easily be projected into a synset-level translation relation, used for linking with other wordnets, for example, via Core WordNet. Synset-level translations are also used as a default in the absence of word-sense translations. The FiWN data in the relational database can be converted to other formats. In the PWN database format, translations are attached to source-language words, allowing the implementation of a Web search interface also working as a bilingual dictionary. Another representation encodes the translation relation as a finite-state transducer.

2008

Quantification and Implication in Semantic Calendar Expressions Represented with Finite-State Transducers
Jyrki Niemi | Kimmo Koskenniemi
Coling 2008: Companion volume: Posters

2007

Representing Calendar Expressions with Finite-State Transducers that Bracket Periods of Time on a Hierarchical Timeline
Jyrki Niemi | Kimmo Koskenniemi
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

Towards modeling the semantics of calendar expressions as extended regular expressions
Jyrki Niemi | Lauri Carlson
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)