Hai Leong Chieu


2023

Multi-label and Multi-target Sampling of Machine Annotation for Computational Stance Detection
Zhengyuan Liu | Hai Leong Chieu | Nancy Chen
Findings of the Association for Computational Linguistics: EMNLP 2023

Data collection from manual labeling provides domain-specific and task-aligned supervision for data-driven approaches, and a critical mass of well-annotated resources is required to achieve reasonable performance in natural language processing tasks. However, manual annotations are often challenging to scale up in terms of time and budget, especially when domain knowledge, capturing subtle semantic features, and reasoning steps are needed. In this paper, we investigate the efficacy of leveraging large language models on automated labeling for computational stance detection. We empirically observe that while large language models show strong potential as an alternative to human annotators, their sensitivity to task-specific instructions and their intrinsic biases pose intriguing yet unique challenges in machine annotation. We introduce a multi-label and multi-target sampling strategy to optimize the annotation quality. Experimental results on the benchmark stance detection corpora show that our method can significantly improve performance and learning efficacy.

Guiding Computational Stance Detection with Expanded Stance Triangle Framework
Zhengyuan Liu | Yong Keong Yap | Hai Leong Chieu | Nancy Chen
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Stance detection determines whether the author of a piece of text is in favor of, against, or neutral towards a specified target, and can be used to gain valuable insights into social media. The ubiquitous indirect reference to targets makes this task challenging, as it requires computational solutions to model semantic features and infer the corresponding implications from a literal statement. Moreover, the limited amount of available training data leads to subpar performance in out-of-domain and cross-target scenarios, as data-driven approaches are prone to rely on superficial and domain-specific features. In this work, we decompose the stance detection task from a linguistic perspective, and investigate key components and inference paths in this task. The stance triangle is a generic linguistic framework previously proposed to describe the fundamental ways people express their stance. We further expand it by characterizing the relationship between explicit and implicit objects. We then use the framework to extend one single training corpus with additional annotation. Experimental results show that strategically enriched data can significantly improve the performance on out-of-domain and cross-target evaluation.

2021

Cross-Topic Rumor Detection using Topic-Mixtures
Xiaoying Ren | Jing Jiang | Ling Min Serena Khoo | Hai Leong Chieu
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

There has been much interest in rumor detection using deep learning models in recent years. A well-known limitation of deep learning models is that they tend to learn superficial patterns, which restricts their generalization ability. We find that this is also true for cross-topic rumor detection. In this paper, we propose a method inspired by the “mixture of experts” paradigm. We assume that the prediction of the rumor class label given an instance is dependent on the topic distribution of the instance. After deriving a vector representation for each topic, given an instance, we derive a “topic mixture” vector for the instance based on its topic distribution. This topic mixture is combined with the vector representation of the instance itself to make rumor predictions. Our experiments show that our proposed method can outperform two baseline debiasing methods in a cross-topic setting. In a synthetic setting where we remove topic-specific words, our method also works better than the baselines, showing that our method does not rely on superficial features.
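
As a rough illustration of the "topic mixture" idea in this abstract, the sketch below forms a mixture vector as the probability-weighted sum of per-topic vectors and then joins it with the instance vector. The function names, the plain-list vectors, and the choice of concatenation as the combination step are all assumptions made for illustration; the abstract does not say how the two vectors are combined, and this is not the authors' implementation.

```python
def topic_mixture(topic_dist, topic_vectors):
    """Weighted sum of topic vectors, weighted by the instance's topic distribution."""
    dim = len(topic_vectors[0])
    return [
        sum(p * vec[i] for p, vec in zip(topic_dist, topic_vectors))
        for i in range(dim)
    ]


def combine(instance_vec, mixture_vec):
    """Assumed combination step: simple concatenation of the two vectors."""
    return list(instance_vec) + list(mixture_vec)


# Hypothetical example: two topics with 2-dimensional topic vectors.
topic_vectors = [[1.0, 0.0], [0.0, 1.0]]
topic_dist = [0.5, 0.5]          # topic distribution of one instance
instance_vec = [1.0]             # stand-in for an encoder output

mixture = topic_mixture(topic_dist, topic_vectors)
features = combine(instance_vec, mixture)
```

The combined `features` vector would then be fed to whatever classifier makes the rumor prediction.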

Intrinsic evaluation of language models for code-switching
Sik Feng Cheong | Hai Leong Chieu | Jing Lim
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Language models used in speech recognition are often either evaluated intrinsically using perplexity on test data, or extrinsically with an automatic speech recognition (ASR) system. The former evaluation does not always correlate well with ASR performance, while the latter could be specific to particular ASR systems. Recent work proposed to evaluate language models by using them to classify ground truth sentences among alternative phonetically similar sentences generated by a finite state transducer. Underlying such an evaluation is the assumption that the generated sentences are linguistically incorrect. In this paper, we first question this assumption, and observe that alternatively generated sentences could often be linguistically correct when they differ from the ground truth by only one edit. Secondly, we show that by using multi-lingual BERT, we can achieve better performance than previous work on two code-switching data sets. Our implementation is publicly available on GitHub at https://github.com/sikfeng/language-modelling-for-code-switching.

Co-training for Commit Classification
Jian Yi David Lee | Hai Leong Chieu
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)

Commits in version control systems (e.g. Git) track changes in a software project. Commits comprise noisy user-generated natural language and code patches. Automatic commit classification (CC) has been used to determine the type of code maintenance activities performed, as well as to detect bug fixes in code repositories. Much prior work assumes a fully supervised setting, which can be unrealistic in resource-scarce situations where labeling commits is difficult. In this paper, we apply co-training, a semi-supervised learning method, to take advantage of the two views available – the commit message (natural language) and the code changes (programming language) – to improve commit classification.
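
The two-view setup described in this abstract can be sketched as a generic co-training loop: one classifier per view, each retrained every round, with confidently labeled commits moved from the unlabeled pool into the shared labeled set. Everything below is a toy assumption for illustration (the `KeywordClassifier`, the `co_train` function, the confidence threshold, and the sample commits); the abstract does not describe the classifiers or selection rule used in the paper.

```python
from collections import Counter


class KeywordClassifier:
    """Toy classifier (hypothetical): tokens vote for labels seen with them in training."""

    def fit(self, texts, labels):
        self.counts = {}
        for text, label in zip(texts, labels):
            for tok in text.split():
                self.counts.setdefault(tok, Counter())[label] += 1
        return self

    def predict(self, text):
        votes = Counter()
        for tok in text.split():
            votes.update(self.counts.get(tok, {}))
        if not votes:
            return None, 0.0
        label, n = votes.most_common(1)[0]
        return label, n / sum(votes.values())  # label and its vote share


def co_train(labeled, unlabeled, rounds=3, threshold=0.9):
    """labeled: [((message, diff), label)]; unlabeled: [(message, diff)]."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        ys = [y for _, y in labeled]
        msg_clf = KeywordClassifier().fit([x[0] for x, _ in labeled], ys)
        diff_clf = KeywordClassifier().fit([x[1] for x, _ in labeled], ys)
        remaining = []
        for msg, diff in pool:
            m_label, m_conf = msg_clf.predict(msg)
            d_label, d_conf = diff_clf.predict(diff)
            # The more confident view labels the commit for both views.
            if m_conf >= threshold and m_conf >= d_conf:
                labeled.append(((msg, diff), m_label))
            elif d_conf >= threshold:
                labeled.append(((msg, diff), d_label))
            else:
                remaining.append((msg, diff))
        if len(remaining) == len(pool):  # nothing new was labeled; stop early
            break
        pool = remaining
    return msg_clf, diff_clf, labeled


# Made-up commits: (message view, code-change view).
seed = [
    (("fix crash", "- null + check"), "bugfix"),
    (("add feature", "+ new function"), "feature"),
]
msg_clf, diff_clf, grown = co_train(seed, [("fix typo", "+ rename")])
```

In this toy run the message view recognizes "fix" with full confidence while the code view is undecided, so the unlabeled commit is added to the labeled set with the message view's label.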

2020

Coupled Hierarchical Transformer for Stance-Aware Rumor Verification in Social Media Conversations
Jianfei Yu | Jing Jiang | Ling Min Serena Khoo | Hai Leong Chieu | Rui Xia
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

The prevalent use of social media enables rapid spread of rumors on a massive scale, creating an emerging need for automatic rumor verification (RV). A number of previous studies focus on leveraging stance classification to enhance RV with multi-task learning (MTL) methods. However, most of these methods fail to employ pre-trained contextualized embeddings such as BERT, and do not exploit inter-task dependencies by using predicted stance labels to improve the RV task. Therefore, in this paper, to extend BERT to obtain thread representations, we first propose a Hierarchical Transformer, which divides each long thread into shorter subthreads, and employs BERT to separately represent each subthread, followed by a global Transformer layer to encode all the subthreads. We further propose a Coupled Transformer Module to capture the inter-task interactions and a Post-Level Attention layer to use the predicted stance labels for RV. Experiments on two benchmark datasets show the superiority of our Coupled Hierarchical Transformer model over existing MTL approaches.

2019

Twitter Homophily: Network Based Prediction of User’s Occupation
Jiaqi Pan | Rishabh Bhardwaj | Wei Lu | Hai Leong Chieu | Xinghao Pan | Ni Yi Puay
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

In this paper, we investigate the importance of social network information compared to content information in the prediction of a Twitter user’s occupational class. We show that the content information of a user’s tweets, the profile descriptions of a user’s follower/following community, and the user’s social network provide useful information for classifying a user’s occupational group. In our study, we extend an existing data set for this problem, and we achieve significantly better performance by using social network homophily that has not been fully exploited in previous work. In our analysis, we found that by using the graph convolutional network to exploit social homophily, we can achieve competitive performance on this data set with just a small fraction of the training data.

2017

Can Syntax Help? Improving an LSTM-based Sentence Compression Model for New Domains
Liangguo Wang | Jing Jiang | Hai Leong Chieu | Chen Hui Ong | Dandan Song | Lejian Liao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we study how to improve the domain adaptability of a deletion-based Long Short-Term Memory (LSTM) neural network model for sentence compression. We hypothesize that syntactic information helps in making such models more robust across domains. We propose two major changes to the model: using explicit syntactic features and introducing syntactic constraints through Integer Linear Programming (ILP). Our evaluation shows that the proposed model works better than the original model as well as a traditional non-neural-network-based model in a cross-domain setting.

Universal Dependencies Parsing for Colloquial Singaporean English
Hongmin Wang | Yue Zhang | GuangYong Leonard Chan | Jie Yang | Hai Leong Chieu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to 25% relative error reduction, resulting in a parser with 84.47% accuracy. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research.

2016

A General Regularization Framework for Domain Adaptation
Wei Lu | Hai Leong Chieu | Jonathan Löfgren
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

Learning to Capitalize with Character-Level Recurrent Neural Networks: An Empirical Study
Raymond Hendy Susanto | Hai Leong Chieu | Wei Lu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

2014

Robust Domain Adaptation for Relation Extraction via Clustering Consistency
Minh Luan Nguyen | Ivor W. Tsang | Kian Ming A. Chai | Hai Leong Chieu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
Jian Bo Yang | Qi Mao | Qiao Liang Xiang | Ivor Wai-Hung Tsang | Kian Ming Adam Chai | Hai Leong Chieu
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

2011

Extracting Relation Descriptors with Conditional Random Fields
Yaliang Li | Jing Jiang | Hai Leong Chieu | Kian Ming A. Chai
Proceedings of 5th International Joint Conference on Natural Language Processing

Unsupervised Information Extraction with Distributional Prior Knowledge
Cane Wing-ki Leung | Jing Jiang | Kian Ming A. Chai | Hai Leong Chieu | Loo-Nin Teow
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2009

Domain adaptive bootstrapping for named entity recognition
Dan Wu | Wee Sun Lee | Nan Ye | Hai Leong Chieu
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2003

Closing the Gap: Learning-Based Information Extraction Rivaling Knowledge-Engineering Methods
Hai Leong Chieu | Hwee Tou Ng | Yoong Keok Lee
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Named Entity Recognition with a Maximum Entropy Approach
Hai Leong Chieu | Hwee Tou Ng
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

Teaching a Weaker Classifier: Named Entity Recognition on Upper Case Text
Hai Leong Chieu | Hwee Tou Ng
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Named Entity Recognition: A Maximum Entropy Approach Using Global Information
Hai Leong Chieu | Hwee Tou Ng
COLING 2002: The 19th International Conference on Computational Linguistics