Shiwen Yu


2010

Automatic Acquisition of Chinese Novel Noun Compounds
Meng Wang | Chu-Ren Huang | Shiwen Yu | Weiwei Sun
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Automatic acquisition of novel compounds is notoriously difficult because most novel compounds have relatively low frequency in a corpus. The current study proposes a new method to address this challenge. We model the task as a two-class classification problem in which each candidate is classified as either a compound or a non-compound. A machine learning method based on SVM, incorporating two types of linguistically motivated features (semantic features and character features), is applied to identify rare but valid noun compounds. We explore two kinds of training data: virtual training data, obtained from the frequent compounds using three statistical scores (co-occurrence frequency, mutual information and dependent ratio), and real training data, randomly selected from the infrequent compounds. Comparative experiments show that even with limited direct evidence in the corpus for the novel compounds, typical frequent compounds can be used to help discover them.

Construction of Chinese Idiom Knowledge-base and Its Applications
Lei Wang | Shiwen Yu
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

Semantic Computing and Language Knowledge Bases
Lei Wang | Shiwen Yu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Studies on Automatic Recognition of Common Chinese Adverb’s usages Based on Statistics Methods
Hongying Zan | Junhui Zhang | Xuefeng Zhu | Shiwen Yu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Chinese Word Sense Induction with Basic Clustering Algorithms
Yuxiang Jia | Shiwen Yu | Zhengyan Chen
CIPS-SIGHAN Joint Conference on Chinese Language Processing

Semi-Supervised WSD in Selectional Preferences with Semantic Redundancy
Xuri Tang | Xiaohe Chen | Weiguang Qu | Shiwen Yu
Coling 2010: Posters

2009

A Noisy Channel Model for Grapheme-based Machine Transliteration
Yuxiang Jia | Danqing Zhu | Shiwen Yu
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

2008

Unsupervised Chinese Verb Metaphor Recognition Based on Selectional Preferences
Yuxiang Jia | Shiwen Yu
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

Quality Assurance of Automatic Annotation of Very Large Corpora: a Study based on heterogeneous Tagging System
Chu-Ren Huang | Lung-Hao Lee | Wei-guang Qu | Jia-Fei Hong | Shiwen Yu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We propose a set of heuristics for efficiently improving the annotation quality of very large corpora. The Xinhua News portion of the Chinese Gigaword Corpus was tagged independently with both the Peking University ICL tagset and the Academia Sinica CKIP tagset. The resulting corpus-based mapping between POS tags can serve as a basis for contrasting the grammatical systems used in the PRC and Taiwan, and as a basic model for mapping between the CKIP and ICL tagging systems for any data.

2007

Building Chinese Sense Annotated Corpus with the Help of Software Tools
Yunfang Wu | Peng Jin | Tao Guo | Shiwen Yu
Proceedings of the Linguistic Annotation Workshop

SemEval-2007 Task 05: Multilingual Chinese-English Lexical Sample
Peng Jin | Yunfang Wu | Shiwen Yu
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2005

現代漢語中的形式動詞 (Dummy Verbs in Contemporary Chinese) [In Chinese]
Shiwen Yu | Xuefeng Zhu | Huiming Duan
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

双向考察和驗證:并列成分中心語的語義關係和CCD的名詞語義分類体系 (Bidirectional Investigation: The Semantic Relations between the Conjuncts and the Noun Taxonomy in CCD) [In Chinese]
Yunfang Wu | Sujian Li | Yun Li | Shiwen Yu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

基于現代漢語語法信息詞典的詞語情感評價研究 (Research on Lexical Emotional Evaluation Based on the Grammatical Knowledge-Base of Contemporary Chinese) [In Chinese]
Zhimin Wang | Xuefeng Zhu | Shiwen Yu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 4, December 2005: Special Issue on Selected Papers from CLSW-5

2004

Distributional Consistency: As a General Method for Defining a Core Lexicon
Huarui Zhang | Churen Huang | Shiwen Yu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

A Large-scale Lexical Semantic Knowledge-base of Chinese
Hui Wang | Shiwen Yu
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

News-Oriented Keyword Indexing with Maximum Entropy Principle
Sujian Li | Houfeng Wang | Shiwen Yu | Chengsheng Xin
Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation

News-Oriented Automatic Chinese Keyword Indexing
Sujian Li | Houfeng Wang | Shiwen Yu | Chengsheng Xin
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

The semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD
Hui Wang | Shiwen Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Chinese Word Segmentation at Peking University
Huiming Duan | Xiaojing Bai | Baobao Chang | Shiwen Yu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

2002

Building a Bilingual WordNet-Like Lexicon: The New Approach and Algorithms
Yang Liu | Shiwen Yu | Jiangsheng Yu
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

2000

The Multi-layer Language Knowledge Base of Chinese NLP
Junfeng Hu | Shiwen Yu
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

TransEasy: A Chinese-English machine translation system based on hybrid approach
Qun Liu | Shiwen Yu
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: System Descriptions

This paper describes the progress of a Chinese-to-English machine translation system. The system is built on a reusable platform of MT software components. It is rule-based, with some statistical algorithms used as heuristic functions in parsing. The system contains about 50,000 Chinese words and 400 global parsing rules. It achieved good results in a public evaluation of MT systems in China in March 1998. So far it has served as a research vehicle.

1994

Blending Segmentation With Tagging in Chinese Language Corpus Processing
Qiang Zhou | Shiwen Yu
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics