Gakuto Kurata


2023

Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries
Ashish Mittal | Sunita Sarawagi | Preethi Jyothi | George Saon | Gakuto Kurata
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Despite the impressive performance of ASR models on mainstream benchmarks, their performance on rare words is unsatisfactory. In enterprise settings, a focused list of entities (such as locations, names, etc.) is often available, which can be used to adapt the model to the terminology of specific domains. In this paper, we present a novel inference algorithm that improves the prediction of state-of-the-art ASR models using nearest-neighbor-based matching on an inference-time word list. We consider both the Transducer architecture, which is useful in the streaming setting, and state-of-the-art encoder-decoder models such as Whisper. In our approach, a list of rare entities is indexed in a memory by synthesizing speech for each entry, and then storing the internal acoustic and language model states obtained from the best possible alignment on the ASR model. The memory is organized as a trie, which we harness to perform a stateful lookup during inference. A key property of our extension is that we prevent spurious matches by restricting the lookup to word-level matches only. In our experiments on publicly available datasets and private benchmarks, we show that our method is effective in significantly improving rare word recognition.

2016

Improved Neural Network-based Multi-label Classification with Better Initialization Leveraging Label Co-occurrence
Gakuto Kurata | Bing Xiang | Bowen Zhou
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Leveraging Sentence-level Information with Encoder LSTM for Semantic Slot Filling
Gakuto Kurata | Bing Xiang | Bowen Zhou | Mo Yu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2006

Phoneme-to-Text Transcription System with an Infinite Vocabulary
Shinsuke Mori | Daisuke Takuma | Gakuto Kurata
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics