Wei Jiang


2023

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Asaad Alghamdi | Xinyu Duan | Wei Jiang | Zhenhai Wang | Yimeng Wu | Qingrong Xia | Zhefeng Wang | Yi Zheng | Mehdi Rezagholizadeh | Baoxing Huai | Peilun Cheng | Abbas Ghaddar
Findings of the Association for Computational Linguistics: ACL 2023

Developing large monolingual Pre-trained Language Models (PLMs) has proven very successful for handling a wide range of Natural Language Processing (NLP) tasks. In this work, we present AraMUS, the largest Arabic PLM, with 11B parameters trained on 529GB of high-quality Arabic textual data. AraMUS achieves state-of-the-art performance on a diverse set of Arabic classification and generative tasks. Moreover, AraMUS shows impressive few-shot learning abilities compared with the best existing Arabic PLMs.

2019

HLT@SUDA at SemEval-2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing
Wei Jiang | Zhenghua Li | Yu Zhang | Min Zhang
Proceedings of the 13th International Workshop on Semantic Evaluation

This paper describes a simple UCCA semantic graph parsing approach. The key idea is to convert a UCCA semantic graph into a constituent tree, in which extra labels are deliberately designed to mark remote edges and discontinuous nodes for future recovery. In this way, we can make use of existing syntactic parsing techniques. Based on the data statistics, we recover discontinuous nodes directly from the output labels of the constituent parser, and use a biaffine classification model to recover the more complex remote edges. The classification model and the constituent parser are trained simultaneously under a multi-task learning framework. We use multilingual BERT as extra features in the open tracks. Our system ranks first among the seven participating systems in the six English/German closed/open tracks. For the seventh, cross-lingual track, where there is little training data for French, we propose a language embedding approach that utilizes the English and German training data, and our result ranks second.
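A minimal, hypothetical sketch of the graph-to-tree conversion idea in Python: primary edges form the tree skeleton, and each remote edge is encoded as a specially labelled leaf so that a separate classifier can recover it later. The data structures, the "-remote" label convention, and the toy graph are illustrative assumptions, not the paper's actual scheme.

```python
# Hypothetical sketch: encode a UCCA-like graph as a constituent tree,
# marking remote edges with extra labels for later recovery.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def graph_to_tree(primary: List[Tuple[str, str, str]],
                  remote: List[Tuple[str, str, str]],
                  root: str) -> Node:
    """Build a tree from primary (parent, label, child) edges; each remote
    edge becomes a leaf whose label marks it for later recovery."""
    kids: dict = {}
    for parent, label, child in primary:
        kids.setdefault(parent, []).append((label, child))

    def build(name: str) -> Node:
        node = Node(name)
        for label, child in kids.get(name, []):
            sub = build(child)
            sub.label = f"{label}:{sub.label}"  # fold edge label into node label
            node.children.append(sub)
        for parent, label, child in remote:     # encode outgoing remote edges
            if parent == name:
                node.children.append(Node(f"{label}-remote:{child}"))
        return node

    return build(root)

def show(n: Node) -> str:
    if not n.children:
        return n.label
    return "(" + n.label + " " + " ".join(show(c) for c in n.children) + ")"

tree = graph_to_tree(
    primary=[("ROOT", "H", "scene"), ("scene", "A", "John"), ("scene", "P", "left")],
    remote=[("scene", "A", "Mary")],  # "Mary" also participates, via a remote edge
    root="ROOT",
)
print(show(tree))  # (ROOT (H:scene A:John P:left A-remote:Mary))
```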

SUDA-Alibaba at MRP 2019: Graph-Based Models with BERT
Yue Zhang | Wei Jiang | Qingrong Xia | Junjie Cao | Rui Wang | Zhenghua Li | Min Zhang
Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning

In this paper, we describe our participating systems in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL). The task includes five frameworks for graph-based meaning representation: DM, PSD, EDS, UCCA, and AMR. One common characteristic of our systems is that we employ graph-based rather than transition-based methods when predicting edges between nodes. For SDP, we jointly perform edge prediction, frame tagging, and POS tagging via multi-task learning (MTL). For UCCA, we jointly model constituent tree parsing and a remote edge recovery task. For both EDS and AMR, we produce nodes first and edges second in a pipeline fashion. External resources like BERT are found helpful for all frameworks except AMR. Our final submission ranks third on the overall MRP evaluation metric, first on EDS, and second on UCCA.
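The edge prediction shared across these graph-based systems is commonly implemented with biaffine scoring, the same scorer family the UCCA system above uses for remote edges. Below is a rough NumPy sketch of the standard biaffine form s(i, j) = h_i^T U d_j + w^T [h_i; d_j] + b; all shapes, names, and the random toy inputs are assumptions for illustration, not the submitted system.

```python
# Illustrative biaffine edge scorer (not the authors' code): scores every
# (head i, dependent j) pair from head/dependent token representations.
import numpy as np

def biaffine_scores(H: np.ndarray, D: np.ndarray,
                    U: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """s[i, j] = h_i^T U d_j + w^T [h_i; d_j] + b."""
    n, h = H.shape
    bilinear = H @ U @ D.T                        # (n, n) bilinear term
    linear = (H @ w[:h])[:, None] + (D @ w[h:])[None, :]
    return bilinear + linear + b

rng = np.random.default_rng(0)
n, h = 5, 8                                       # 5 tokens, hidden size 8
H = rng.normal(size=(n, h))                       # head representations
D = rng.normal(size=(n, h))                       # dependent representations
scores = biaffine_scores(H, D, rng.normal(size=(h, h)),
                         rng.normal(size=2 * h), 0.0)
print(scores.argmax(axis=0))                      # predicted head per token
```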

2006

A Pragmatic Chinese Word Segmentation Approach Based on Mixing Models
Wei Jiang | Yi Guan | Xiao-Long Wang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 4, December 2006

A Pragmatic Chinese Word Segmentation System
Wei Jiang | Yi Guan | Xiao-Long Wang
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Chinese Word Segmentation based on Mixing Model
Wei Jiang | Jian Zhao | Yi Guan | Zhiming Xu
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing