Yo Sato


2020

Dialect Clustering with Character-Based Metrics: in Search of the Boundary of Language and Dialect
Yo Sato | Kevin Heffernan
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present a universal, character-based method for representing sentences, so that the distance between any two sentences can be calculated. With a small alphabet, the characters can function as a proxy for phonemes. As one of the method's main uses, we carry out dialect clustering: we cluster a corpus of mixed dialects or sub-languages into sub-groups and see whether they coincide with the conventional boundaries of dialects and sub-languages. Using data from multiple Japanese dialects and multiple Slavic languages, we report how well each group clusters, as a partial response to the question of what separates languages from dialects.

Homonym normalisation by word sense clustering: a case in Japanese
Yo Sato | Kevin Heffernan
Proceedings of the 28th International Conference on Computational Linguistics

This work presents a method of word sense clustering that differentiates homonyms and merges homophones, taking Japanese as an example, where orthographical variation causes problems for language processing. It uses contextualised embeddings (BERT) to cluster tokens into distinct sense groups, and we use these groups to normalise synonymous instances to a single representative form. We see the benefit of this normalisation in language modelling, as well as in transliteration.

2018

Creating dialect sub-corpora by clustering: a case in Japanese for an adaptive method
Yo Sato | Kevin Heffernan
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2009

Incrementality, Speaker-Hearer Switching and the Disambiguation Challenge
Ruth Kempson | Eleni Gregoromichelaki | Yo Sato
Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language

Dialogue Modelling and the Remit of Core Grammar
Eleni Gregoromichelaki | Yo Sato | Ruth Kempson | Andrew Gargett | Christine Howes
Proceedings of the Eighth International Conference on Computational Semantics

2008

Lexicalised Parsing of German V2
Yo Sato
Proceedings of the Workshop on Parsing German

Parser Evaluation Across Frameworks without Format Conversion
Wai Lok Tam | Yo Sato | Yusuke Miyao | Junichi Tsujii
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

2006

Lexicalising Word Order Constraints for Implemented Linearisation Grammar
Yo Sato
Student Research Workshop