Meng Sun


2020

Semi-supervised Category-specific Review Tagging on Indonesian E-Commerce Product Reviews
Meng Sun | Marie Stephen Leo | Eram Munawwar | Paul C. Condylis | Sheng-yi Kong | Seong Per Lee | Albert Hidayat | Muhamad Danang Kerianto
Proceedings of the 3rd Workshop on e-Commerce and NLP

Product reviews are a huge source of natural language data in e-commerce applications. Millions of customers write reviews on a variety of topics. We divide these topics into two groups: “category-specific” topics and “generic” topics that span multiple product categories. While a supervised learning approach can tag review text for generic topics, it is impossible to use supervised approaches to tag category-specific topics due to the sheer number of possible topics for each category. In this paper, we present a semi-supervised approach that tags each Indonesian-language product review with several product category-specific tags. We show that our proposed method can work at scale on real product reviews at Tokopedia, a major e-commerce platform in Indonesia. Manual evaluation shows that the proposed method can efficiently generate category-specific product tags.

2019

Baidu Neural Machine Translation Systems for WMT19
Meng Sun | Bojian Jiang | Hao Xiong | Zhongjun He | Hua Wu | Haifeng Wang
Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)

In this paper, we introduce the systems Baidu submitted for the WMT19 shared task on Chinese<->English news translation. Our systems are based on the Transformer architecture with some effective improvements. Data selection, back translation, data augmentation, knowledge distillation, domain adaptation, model ensembling, and re-ranking are employed and proven effective in our experiments. Our Chinese->English system achieved the highest case-sensitive BLEU score among all constrained submissions, and our English->Chinese system ranked second among all submissions.

2018

Multi-Perspective Context Aggregation for Semi-supervised Cloze-style Reading Comprehension
Liang Wang | Sujian Li | Wei Zhao | Kewei Shen | Meng Sun | Ruoyu Jia | Jingming Liu
Proceedings of the 27th International Conference on Computational Linguistics

Cloze-style reading comprehension has been a popular task for measuring the progress of natural language understanding in recent years. In this paper, we design a novel multi-perspective framework, which can be seen as the joint training of heterogeneous experts and which aggregates context information from different perspectives. Each perspective is modeled by a simple aggregation module. The outputs of multiple aggregation modules are fed into a one-timestep pointer network to get the final answer. At the same time, to tackle the problem of insufficient labeled data, we propose an efficient sampling mechanism to automatically generate more training examples by matching the distribution of candidates between labeled and unlabeled data. We conduct our experiments on the recently released cloze-test dataset CLOTH (Xie et al., 2017), which consists of nearly 100k questions designed by professional teachers. Results show that our method achieves new state-of-the-art performance over previous strong baselines.

Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension
Liang Wang | Meng Sun | Wei Zhao | Kewei Shen | Jingming Liu
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes our system for SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge. We use Three-way Attentive Networks (TriAN) to model interactions between the passage, question and answers. To incorporate commonsense knowledge, we augment the input with relation embeddings from ConceptNet, a graph of general knowledge. As a result, our system achieves state-of-the-art performance with 83.95% accuracy on the official test data. Code is publicly available at https://github.com/intfloat/commonsense-rc.

2013

Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
Wenbin Jiang | Meng Sun | Yajuan Lü | Yating Yang | Qun Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stem Translation with Affix-Based Rule Selection for Agglutinative Languages
Zhiyang Wang | Yajuan Lü | Meng Sun | Qun Liu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)