Jonas Sjöbergh


2008

A Multi-Lingual Dictionary of Dirty Words
Jonas Sjöbergh | Kenji Araki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present a multi-lingual dictionary of dirty words. We have collected about 3,200 dirty words in several languages and built a database of them. English has the most words in the database, though there are also several hundred dirty words in, for instance, Japanese. Words are classified by their general meaning, such as which part of the human anatomy they refer to. A word can also be assigned a nuance label to indicate whether it is a cute word used when speaking to children, a very rude word, a clinical word, etc. The database is available online and will hopefully be enlarged over time. It has already been used in research on, for instance, automatic joke generation and emotion detection.

What is Poorly Said is a Little Funny
Jonas Sjöbergh | Kenji Araki
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We implement several different methods for generating jokes in English. The common theme is to intentionally produce poor utterances by breaking Grice’s maxims of conversation. The generated jokes are evaluated and compared to human-made jokes. In general they are quite weak, though there are a few high-scoring jokes and many that score higher than the most boring human joke.

A Complete and Modestly Funny System for Generating and Performing Japanese Stand-Up Comedy
Jonas Sjöbergh | Kenji Araki
Coling 2008: Companion volume: Posters

2007

Widening the HolSum Search Scope
Martin Duneld | Jonas Sjöbergh
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Developing and Evaluating a Searchable Swedish-Thai Lexicon
Wanwisa Khanaraksombat | Jonas Sjöbergh
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

Recreating Humorous Split Compound Errors in Swedish by Using Grammaticality
Jonas Sjöbergh | Kenji Araki
Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007)

2006

Towards Holistic Summarization – Selecting Summaries, Not Sentences
Martin Hassel | Jonas Sjöbergh
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present a novel method for automatic text summarization through text extraction, using computational semantics. The new idea is to view all the extracted text as a whole and compute a score for the total impact of the summary, instead of ranking, for instance, individual sentences. A greedy search strategy is used to search through the space of possible summaries and select the highest-scoring summary among those found. The aim has been to construct a summarizer that can be quickly assembled using only a few basic language tools, for languages that lack large amounts of structured or annotated data or advanced tools for linguistic processing. The proposed method is largely language independent, though in this paper we evaluate it only on English, using ROUGE scores on texts from, among others, DUC 2004 task 2. On this task our method performs better than several of the systems evaluated there, but worse than the best systems.
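
The idea of scoring a candidate summary as a whole and searching greedily can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity below is a stand-in assumption for the paper's computational-semantics representation, and the names `bag_of_words`, `cosine`, `score_summary`, and `greedy_summary` are hypothetical. The search here starts from an empty summary and adds one sentence at a time, which is only one possible greedy strategy.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts (assumption: a stand-in for a semantic representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_summary(chosen, sentences, doc_vector):
    """Score the whole candidate summary at once, not its individual sentences."""
    summary_text = " ".join(sentences[i] for i in sorted(chosen))
    return cosine(bag_of_words(summary_text), doc_vector)


def greedy_summary(sentences, max_sentences):
    """Greedily add the sentence that most improves the whole-summary score."""
    doc_vector = bag_of_words(" ".join(sentences))
    chosen = set()
    best_score = 0.0
    while len(chosen) < max_sentences:
        best_candidate = None
        for i in range(len(sentences)):
            if i in chosen:
                continue
            candidate_score = score_summary(chosen | {i}, sentences, doc_vector)
            if candidate_score > best_score:
                best_score = candidate_score
                best_candidate = i
        if best_candidate is None:
            break  # no remaining sentence improves the summary as a whole
        chosen.add(best_candidate)
    # Return the selected sentences in document order.
    return [sentences[i] for i in sorted(chosen)]
```

Because the score is computed on the combined text, a sentence that merely repeats content already in the summary adds little, which per-sentence ranking would not capture. The search may stop before reaching `max_sentences` if no addition improves the score.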

Chunking: an unsupervised method to find errors in text
Jonas Sjöbergh
Proceedings of the 15th Nordic Conference of Computational Linguistics (NODALIDA 2005)

2004

Finding the Correct Interpretation of Swedish Compounds, a Statistical Approach
Jonas Sjöbergh | Viggo Kann
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)