Jungyun Seo

Also published as: Jung Yun Seo


2021

Fine-grained Post-training for Improving Retrieval-based Dialogue Systems
Janghoon Han | Taesuk Hong | Byoungjae Kim | Youngjoong Ko | Jungyun Seo
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Retrieval-based dialogue systems display outstanding performance when pre-trained language models, including bidirectional encoder representations from transformers (BERT), are used. In multi-turn response selection, BERT focuses on learning the relationship between the context, which consists of multiple utterances, and the response. However, this training method is insufficient for capturing the relations among the individual utterances in the context, so the model fails to fully understand the context flow required to select a response. To address this issue, we propose a new fine-grained post-training method that reflects the characteristics of multi-turn dialogue. Specifically, the model learns utterance-level interactions by training on every short context-response pair in a dialogue session. Furthermore, through a new training objective, utterance relevance classification, the model learns the semantic relevance and coherence between dialogue utterances. Experimental results show that our model achieves a new state of the art by significant margins on three benchmark datasets, suggesting that fine-grained post-training is highly effective for the response selection task.
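
As an illustration of the pair-construction idea, the sketch below builds short context-response pairs with coherent/incoherent labels from dialogue sessions. The window size, negative-sampling scheme, and label encoding are illustrative assumptions, not the paper's exact recipe.

```python
# A minimal sketch (pure Python) of building short context-response pairs
# for fine-grained post-training. Window size and negative sampling are
# illustrative assumptions.
import random

def build_pairs(sessions, window=3, seed=0):
    """Yield (context, response, label) triples.

    label 1: the response truly follows the short context (coherent).
    label 0: the response is randomly drawn from another session.
    """
    rng = random.Random(seed)
    for i, session in enumerate(sessions):
        for t in range(1, len(session)):
            context = session[max(0, t - window):t]  # short context window
            # Positive pair: the actual next utterance.
            yield context, session[t], 1
            # Negative pair: a random utterance from a different session
            # (occasionally skipped when the same session is drawn).
            j = rng.randrange(len(sessions))
            if j != i and sessions[j]:
                yield context, rng.choice(sessions[j]), 0

sessions = [
    ["hi", "hello, how can I help?", "my order is late", "let me check"],
    ["what's the weather?", "sunny today", "great, thanks"],
]
for ctx, resp, label in build_pairs(sessions):
    print(label, ctx, "->", resp)
```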

2020

NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer
Hwijeen Ahn | Jimin Sun | Chan Young Park | Jungyun Seo
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes our approach to the task of identifying offensive language in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds, and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset yielded performance improvements over the baseline trained solely on the manually annotated dataset. We propose a new metric, Translation Embedding Distance, to measure the transferability of instances for cross-lingual data selection. We also introduce various preprocessing steps tailored to social media text, along with methods to fine-tune the pre-trained multilingual BERT (mBERT) for offensive language identification. Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
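
The abstract does not define Translation Embedding Distance precisely; a plausible reading is the distance between multilingual embeddings of an instance and its translation, as in this hedged sketch. The embed() stub and the cosine-distance choice are assumptions, not the paper's definition.

```python
# A hedged sketch of a Translation Embedding Distance score, assumed here
# to be the cosine distance between multilingual embeddings of an instance
# and its translation. The embed() stub stands in for a real multilingual
# encoder (e.g. pooled mBERT states).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: hash the text into a fixed random vector.
    In practice this would be a multilingual sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(8)

def translation_embedding_distance(src: str, translated: str) -> float:
    u, v = embed(src), embed(translated)
    cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return 1.0 - cos  # smaller distance -> assumed more transferable

# Rank candidate source-language instances by the assumed metric and keep
# the closest ones for cross-lingual transfer.
candidates = [("bu çok kötü", "this is very bad"), ("harika!", "awesome!")]
ranked = sorted(candidates, key=lambda p: translation_embedding_distance(*p))
print(ranked)
```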

Multi-Task Learning for Knowledge Graph Completion with Pre-trained Language Models
Bosung Kim | Taesuk Hong | Youngjoong Ko | Jungyun Seo
Proceedings of the 28th International Conference on Computational Linguistics

As research on utilizing human knowledge in natural language processing has attracted considerable attention in recent years, knowledge graph (KG) completion has come into the spotlight. Recently, knowledge graph completion methods using pre-trained language models, such as KG-BERT, have been presented and have shown high performance. However, their scores on ranking metrics such as Hits@k still lag behind state-of-the-art models. We claim that there are two main reasons: 1) a failure to sufficiently learn the relational information in knowledge graphs, and 2) difficulty in picking out the correct answer from lexically similar candidates. In this paper, we propose an effective multi-task learning method to overcome these limitations. By combining relation prediction and relevance ranking tasks with the target link prediction task, the proposed model learns more relational properties of KGs and performs properly even when candidates are lexically similar. Experimental results show that we not only substantially improve ranking performance compared to KG-BERT but also achieve state-of-the-art performance in Mean Rank and Hits@10 on the WN18RR dataset.
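
The multi-task setup can be pictured as one shared encoder with three heads whose losses are summed. The PyTorch sketch below shows that structure; the dimensions, head designs, and equal loss weights are illustrative assumptions rather than the paper's configuration.

```python
# A minimal PyTorch sketch of the multi-task idea: a shared encoder with
# heads for link prediction, relation prediction, and relevance ranking,
# trained on a sum of the three losses. Sizes and weights are assumptions.
import torch
import torch.nn as nn

class MultiTaskKGC(nn.Module):
    def __init__(self, dim=32, n_relations=11):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)           # stand-in for BERT over a triple
        self.link_head = nn.Linear(dim, 1)           # is (h, r, t) plausible?
        self.rel_head = nn.Linear(dim, n_relations)  # which relation links h and t?
        self.rank_head = nn.Linear(dim, 1)           # relevance score for ranking

    def forward(self, x):
        z = torch.relu(self.encoder(x))
        return self.link_head(z), self.rel_head(z), self.rank_head(z)

model = MultiTaskKGC()
x_pos, x_neg = torch.randn(4, 32), torch.randn(4, 32)
link_y = torch.ones(4, 1)            # positives for link prediction
rel_y = torch.randint(0, 11, (4,))   # gold relation labels

link_logit, rel_logit, rank_pos = model(x_pos)
_, _, rank_neg = model(x_neg)

loss = (
    nn.functional.binary_cross_entropy_with_logits(link_logit, link_y)
    + nn.functional.cross_entropy(rel_logit, rel_y)
    + nn.functional.margin_ranking_loss(  # positives should outrank negatives
        rank_pos, rank_neg, torch.ones(4, 1), margin=1.0)
)
loss.backward()
print(float(loss))
```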

2019

ThisIsCompetition at SemEval-2019 Task 9: BERT is unstable for out-of-domain samples
Cheoneum Park | Juae Kim | Hyeon-gu Lee | Reinald Kim Amplayo | Harksoo Kim | Jungyun Seo | Changki Lee
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes our system, Joint Encoders for Stable Suggestion Inference (JESSI), for SemEval 2019 Task 9: Suggestion Mining from Online Reviews and Forums. JESSI combines two sentence encoders: (a) one using multiple pre-trained word embeddings learned from log-bilinear regression (GloVe) and translation (CoVe) models, and (b) one on top of word encodings from a pre-trained deep bidirectional transformer (BERT). We include a domain-adversarial training module when training for out-of-domain samples. Our experiments show that while BERT performs exceptionally well on in-domain samples, several runs of the model show that it is unstable on out-of-domain samples. The problem is mitigated considerably by (1) combining BERT with a non-BERT encoder, and (2) using an RNN-based classifier on top of BERT. Our final models obtained second place with a 77.78% F-score on Subtask A (in-domain) and achieved a 79.59% F-score on Subtask B (out-of-domain), even without using any additional external data.
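
A rough sketch of the two mitigations, under assumed shapes: an RNN classifier run over BERT token states rather than a single pooled vector, concatenated with a second, non-BERT encoder. Random tensors stand in for the actual BERT and GloVe/CoVe outputs.

```python
# A hedged sketch of JESSI's stabilization ideas: (1) an RNN classifier on
# top of BERT token encodings, and (2) joining it with a non-BERT encoder.
# Dimensions, pooling, and the stand-in feature tensors are assumptions.
import torch
import torch.nn as nn

class JointEncoderClassifier(nn.Module):
    def __init__(self, bert_dim=64, other_dim=50, hidden=32, n_classes=2):
        super().__init__()
        self.rnn = nn.GRU(bert_dim, hidden, batch_first=True,
                          bidirectional=True)     # RNN over BERT token states
        self.other = nn.Linear(other_dim, hidden) # stand-in non-BERT encoder
        self.out = nn.Linear(2 * hidden + hidden, n_classes)

    def forward(self, bert_tokens, other_feats):
        _, h = self.rnn(bert_tokens)               # h: (2, batch, hidden)
        rnn_vec = torch.cat([h[0], h[1]], dim=-1)  # concat both directions
        joint = torch.cat([rnn_vec, torch.relu(self.other(other_feats))], -1)
        return self.out(joint)

model = JointEncoderClassifier()
bert_tokens = torch.randn(3, 20, 64)  # pretend BERT output: (batch, seq, dim)
other_feats = torch.randn(3, 50)      # pretend pooled GloVe/CoVe features
print(model(bert_tokens, other_feats).shape)  # torch.Size([3, 2])
```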

2017

A Method to Generate a Machine-Labeled Data for Biomedical Named Entity Recognition with Various Sub-Domains
Juae Kim | Sunjae Kwon | Youngjoong Ko | Jungyun Seo
Proceedings of the International Workshop on Digital Disease Detection using Social Media 2017 (DDDSM-2017)

Biomedical named entity (NE) recognition is a core technique for various tasks in the biomedical domain. Previous studies have shown that machine-learning approaches outperform dictionary-based and rule-based approaches, because biomedical NEs have many terminological variations and new biomedical NEs are constantly being coined. Achieving high performance with a machine-learning algorithm requires good-quality corpora, but such corpora are difficult to obtain because annotating a biomedical corpus for machine learning is extremely time-consuming and costly. In addition, most previous corpora are insufficient for high-level tasks because they do not cover various domains. We therefore propose a method for generating a large amount of machine-labeled data that covers various domains. We first generate initial machine-labeled data using a chunker and MetaMap: the chunker, developed with manually annotated data, extracts only biomedical NEs, and MetaMap annotates the category of each biomedical NE. We then apply a self-training approach to bootstrap from the initial machine-labeled data. In our experiments, a biomedical NE recognition system trained on the proposed machine-labeled data achieves much higher performance: it outperforms a system that uses MetaMap alone by 26.03 percentage points in F1-score.
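
The self-training loop can be sketched independently of the underlying tagger. Below, a toy lexicon model stands in for the chunker/MetaMap pipeline; only the bootstrap structure (train, tag unlabeled text, keep confident predictions, retrain) mirrors the method, and all names and thresholds are hypothetical.

```python
# A minimal, runnable sketch of a self-training bootstrap. The toy lexicon
# "model" is a hypothetical stand-in for a real biomedical NE tagger.
from collections import Counter

def train(labeled):
    """Toy model: remember the most frequent tag seen for each token."""
    votes = {}
    for tokens, tags in labeled:
        for tok, tag in zip(tokens, tags):
            votes.setdefault(tok, Counter())[tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in votes.items()}

def predict(model, tokens):
    """Return tags plus the fraction of tokens the model has seen before."""
    tags = [model.get(tok, "O") for tok in tokens]
    conf = sum(tok in model for tok in tokens) / len(tokens)
    return tags, conf

def self_train(labeled, unlabeled, rounds=3, threshold=0.5):
    model = train(labeled)
    for _ in range(rounds):
        confident = [(s, predict(model, s)[0]) for s in unlabeled
                     if predict(model, s)[1] >= threshold]
        if not confident:
            break                      # nothing new to learn from
        unlabeled = [s for s in unlabeled
                     if predict(model, s)[1] < threshold]
        labeled += confident
        model = train(labeled)         # retrain on the enlarged labeled set
    return model

seed = [(["BRCA1", "mutation"], ["GENE", "O"])]   # initial machine labels
pool = [["BRCA1", "variant"], ["aspirin", "dose"]]
print(self_train(seed, pool))
```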

2016

KSAnswer: Question-answering System of Kangwon National University and Sogang University in the 2016 BioASQ Challenge
Hyeon-gu Lee | Minkyoung Kim | Harksoo Kim | Juae Kim | Sunjae Kwon | Jungyun Seo | Yi-reun Kim | Jung-Kyu Choi
Proceedings of the Fourth BioASQ workshop

2015

Improved Entity Linking with User History and News Articles
Soyun Jeong | Youngmin Park | Sangwoo Kang | Jungyun Seo
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

A Simultaneous Recognition Framework for the Spoken Language Understanding Module of Intelligent Personal Assistant Software on Smart Phones
Changsu Lee | Youngjoong Ko | Jungyun Seo
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2008

Speakers’ Intention Prediction Using Statistics of Multi-level Features in a Schedule Management Domain
Donghyun Kim | Hyunjung Lee | Choong-Nyoung Seon | Harksoo Kim | Jungyun Seo
Proceedings of ACL-08: HLT, Short Papers

Information extraction using finite state automata and syllable n-grams in a mobile environment
Choong-Nyoung Seon | Harksoo Kim | Jungyun Seo
Proceedings of the ACL-08: HLT Workshop on Mobile Language Processing

2005

Improving Korean Speech Acts Analysis by Using Shrinkage and Discourse Stack
Kyungsun Kim | Youngjoong Ko | Jungyun Seo
Second International Joint Conference on Natural Language Processing: Full Papers

2004

Learning with Unlabeled Data for Text Categorization Using a Bootstrapping and a Feature Projection Technique
Youngjoong Ko | Jungyun Seo
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2002

The Grammatical Function Analysis between Korean Adnoun Clause and Noun Phrase by Using Support Vector Machines
Songwook Lee | Tae-Yeoub Jang | Jungyun Seo
COLING 2002: The 19th International Conference on Computational Linguistics

Text Categorization using Feature Projections
Youngjoong Ko | Jungyun Seo
COLING 2002: The 19th International Conference on Computational Linguistics

Automatic Text Categorization using the Importance of Sentences
Youngjoong Ko | Jinwoo Park | Jungyun Seo
COLING 2002: The 19th International Conference on Computational Linguistics

A Reliable Indexing Method for a Practical QA System
Harksoo Kim | Jungyun Seo
COLING-02: Multilingual Summarization and Question Answering

2001

MAYA: A Fast Question-answering System Based on a Predictive Answer Indexer
Harksoo Kim | Kyungsun Kim | Gary Geunbae Lee | Jungyun Seo
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

2000

Corpus-Based Learning of Compound Noun Indexing
Byung-Kwan Kwak | Jee-Hyub Kim | Geunbae Lee | Jung Yun Seo
ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval

Automatic Text Categorization by Unsupervised Learning
Youngjoong Ko | Jungyun Seo
COLING 2000 Volume 1: The 18th International Conference on Computational Linguistics

1999

Anaphora Resolution using Extended Centering Algorithm in a Multi-modal Dialogue System
Harksoo Kim | Jeong-Mi Cho | Jungyun Seo
The Relation of Discourse/Dialogue Structure and Reference

Dual Distributional Verb Sense Disambiguation with Small Corpora and Machine Readable Dictionaries
Jeong-Mi Cho | Jungyun Seo | Gil Chang Kim
Unsupervised Learning in Natural Language Processing

Analysis System of Speech Acts and Discourse Structures Using Maximum Entropy Model
Won Seug Choi | Jeong-Mi Cho | Jungyun Seo
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1995

A Robust Parser Based on Syntactic Information
Kong Joo Lee | Cheol Jung Kweon | Jungyun Seo | Gil Chang Kim
Seventh Conference of the European Chapter of the Association for Computational Linguistics

1990

Transforming Syntactic Graphs Into Semantic Graphs
Hae-Chang Rim | Robert F. Simmons | Jungyun Seo
28th Annual Meeting of the Association for Computational Linguistics

1989

Syntactic Graphs: A Representation for the Union of All Ambiguous Parse Trees
Jungyun Seo | Robert F. Simmons
Computational Linguistics, Volume 15, Number 1, March 1989