Jun Harashima


2020

Cookpad Parsed Corpus: Linguistic Annotations of Japanese Recipes
Jun Harashima | Makoto Hiramatsu
Proceedings of the 14th Linguistic Annotation Workshop

It has become increasingly common for people to share cooking recipes on the Internet. Along with the increase in the number of shared recipes, there have been corresponding increases in recipe-related studies and datasets. However, there are still few datasets that provide linguistic annotations for the recipe-related studies even though such annotations should form the basis of the studies. This paper introduces a novel recipe-related dataset, named Cookpad Parsed Corpus, which contains linguistic annotations for Japanese recipes. We randomly extracted 500 recipes from the largest recipe-related dataset, the Cookpad Recipe Dataset, and annotated 4,738 sentences in the recipes with morphemes, named entities, and dependency relations. This paper also reports benchmark results on our corpus for Japanese morphological analysis, named entity recognition, and dependency parsing. We show that there is still room for improvement in the analyses of recipes.

Visual Grounding Annotation of Recipe Flow Graph
Taichi Nishimura | Suzushi Tomori | Hayato Hashimoto | Atsushi Hashimoto | Yoko Yamakata | Jun Harashima | Yoshitaka Ushiku | Shinsuke Mori
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we provide a dataset that gives visual grounding annotations to recipe flow graphs. A recipe flow graph is a representation of the cooking workflow, which is designed with the aim of understanding the workflow from natural language processing. Such a workflow will increase its value when grounded to real-world activities, and visual grounding is a way to do so. Visual grounding is provided as bounding boxes to image sequences of recipes, and each bounding box is linked to an element of the workflow. Because the workflows are also linked to the text, this annotation gives visual grounding with workflow’s contextual information between procedural text and visual observation in an indirect manner. We subsidiarily annotated two types of event attributes with each bounding box: “doing-the-action,” or “done-the-action”. As a result of the annotation, we got 2,300 bounding boxes in 272 flow graph recipes. Various experiments showed that the proposed dataset enables us to estimate contextual information described in recipe flow graphs from an image sequence.

Non-ingredient Detection in User-generated Recipes using the Sequence Tagging Approach
Yasuhiro Yamaguchi | Shintaro Inuzuka | Makoto Hiramatsu | Jun Harashima
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

Recently, the number of user-generated recipes on the Internet has increased. In such recipes, users are generally supposed to write a title, an ingredient list, and steps to create a dish. However, some items in an ingredient list in a user-generated recipe are not actually edible ingredients. For example, headings, comments, and kitchenware sometimes appear in an ingredient list because users can freely write the list in their recipes. Such noise makes it difficult for computers to use recipes for a variety of tasks, such as calorie estimation. To address this issue, we propose a non-ingredient detection method inspired by a neural sequence tagging model. In our experiment, we annotated 6,675 ingredients in 600 user-generated recipes and showed that our proposed method achieved a 93.3 F1 score.

2019

Real World Voice Assistant System for Cooking
Takahiko Ito | Shintaro Inuzuka | Yoshiaki Yamada | Jun Harashima
Proceedings of the 12th International Conference on Natural Language Generation

This study presents a voice assistant system to support cooking by utilizing smart speakers in Japan. This system not only speaks the procedures written in recipes point by point but also answers the common questions from users for the specified recipes. The system applies machine comprehension techniques to millions of recipes for answering the common questions in cooking such as “人参はどうしたらよいですか (How should I cook carrots?)”. Furthermore, numerous machine-learning techniques are applied to generate better responses to users.

2018

Step or Not: Discriminator for The Real Instructions in User-generated Recipes
Shintaro Inuzuka | Takahiko Ito | Jun Harashima
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

In a recipe sharing service, users publish recipe instructions in the form of a series of steps. However, some of the “steps” are not actually part of the cooking process. Specifically, advertisements of the recipes themselves (e.g., “introduced on TV”) and comments (e.g., “Thanks for many messages”) are often included in the step section, which recipe authors use as a communication tool. Such fake steps can cause problems when recipes are indexed for search or read aloud by devices such as smart speakers. As presented in this talk, we have constructed a discriminator that distinguishes between such fake steps and the steps actually used for cooking. This project includes, but is not limited to, the creation of annotation data by classifying and analyzing recipe steps and the construction of identification models. Our models use only text information to identify the step. In our test, machine learning models achieved higher accuracy than rule-based methods that use manually chosen clue words.

2016

Japanese Word―Color Associations with and without Contexts
Jun Harashima
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Although some words carry strong associations with specific colors (e.g., the word danger is associated with the color red), few studies have investigated these relationships. This may be due to the relative rarity of databases that contain large quantities of such information. Additionally, these resources are often limited to particular languages, such as English. Moreover, the existing resources often do not consider the possible contexts of words in assessing the associations between a word and a color. As a result, the influence of context on word―color associations is not fully understood. In this study, we constructed a novel language resource for word―color associations. The resource has two characteristics: First, our resource is the first to include Japanese word―color associations, which were collected via crowdsourcing. Second, the word―color associations in the resource are linked to contexts. We show that word―color associations depend on language and that associations with certain colors are affected by context information.

A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research
Jun Harashima | Michiaki Ariga | Kenta Murata | Masayuki Ioki
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Everyday meals are an important part of our daily lives and, currently, there are many Internet sites that help us plan these meals. Allied to the growth in the amount of food data such as recipes available on the Internet is an increase in the number of studies on these data, such as recipe analysis and recipe search. However, there are few publicly available resources for food research; those that do exist do not include a wide range of food data or any meal data (that is, likely combinations of recipes). In this study, we construct a large-scale recipe and meal data collection as the underlying infrastructure to promote food research. Our corpus consists of approximately 1.7 million recipes and 36,000 meals in Cookpad, one of the largest recipe sites in the world. We made the corpus available to researchers in February 2015 and, as of February 2016, 82 research groups at 56 universities have made use of it to enhance their studies.

Japanese-English Machine Translation of Recipe Texts
Takayuki Sato | Jun Harashima | Mamoru Komachi
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

Concomitant with the globalization of food culture, demand for the recipes of specialty dishes has been increasing. The recent growth in recipe sharing websites and food blogs has resulted in numerous recipe texts being available for diverse foods in various languages. However, little work has been done on machine translation of recipe texts. In this paper, we address the task of translating recipes and investigate the advantages and disadvantages of traditional phrase-based statistical machine translation and more recent neural machine translation. Specifically, we translate Japanese recipes into English, analyze errors in the translated recipes, and discuss available room for improvements.

2012

Flexible Japanese Sentence Compression by Relaxing Unit Constraints
Jun Harashima | Sadao Kurohashi
Proceedings of COLING 2012

2011

Relevance Feedback using Latent Information
Jun Harashima | Sadao Kurohashi
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Summarizing Search Results using PLSI
Jun Harashima | Sadao Kurohashi
Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010)

2008

SYNGRAPH: A Flexible Matching Method based on Synonymous Expression Extraction from an Ordinary Dictionary and a Web Corpus
Tomohide Shibata | Michitaka Odani | Jun Harashima | Takashi Oonishi | Sadao Kurohashi
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II