Xin Wu


2023

Segment-Level and Category-Oriented Network for Knowledge-Based Referring Expression Comprehension
Yuqi Bu | Xin Wu | Liuwu Li | Yi Cai | Qiong Liu | Qingbao Huang
Findings of the Association for Computational Linguistics: ACL 2023

Knowledge-based referring expression comprehension (KB-REC) aims to identify visual objects referred to by expressions that incorporate knowledge. Existing methods rely on sentence-level retrieval and fusion, which may lead to similarity bias and interference from irrelevant information in unstructured knowledge sentences. To address these limitations, we propose a segment-level and category-oriented network (SLCO). Our approach includes a segment-level and prompt-based knowledge retrieval method to mitigate the similarity bias problem and a category-based grounding method to alleviate interference from irrelevant information in knowledge sentences. Experimental results show that our SLCO can eliminate interference and improve the overall performance of the KB-REC task.

CLEVR-Implicit: A Diagnostic Dataset for Implicit Reasoning in Referring Expression Comprehension
Jingwei Zhang | Xin Wu | Yi Cai
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Recently, pre-trained vision-language (VL) models have achieved remarkable success in various cross-modal tasks, including referring expression comprehension (REC). These models are pre-trained on large-scale image-text pairs to learn the alignment between words in textual descriptions and objects in the corresponding images, and are then fine-tuned on downstream tasks. However, the performance of VL models is hindered when dealing with implicit text, which describes objects through comparisons between two or more objects rather than explicitly mentioning them. This is because the models struggle to align the implicit text with the objects in the images. To address this challenge, we introduce CLEVR-Implicit, a dataset consisting of synthetic images and two corresponding types of implicit text for the REC task. Additionally, to enhance the performance of VL models on implicit text, we propose a method called Transforming Implicit text into Explicit text (TIE), which enables VL models to reason with the implicit text. TIE consists of two modules: (1) the prompt design module builds prompts for implicit text by adding masked tokens, and (2) the cloze procedure module fine-tunes the prompts by utilizing masked language modeling (MLM) to predict the explicit words with the implicit prompts. Experimental results on our dataset demonstrate a significant improvement of 37.94% in the performance of VL models on implicit text after employing our TIE method.

2020

Task-oriented Domain-specific Meta-Embedding for Text Classification
Xin Wu | Yi Cai | Yang Kai | Tao Wang | Qing Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Meta-embedding learning, which combines complementary information in different word embeddings, has shown superior performance across different Natural Language Processing tasks. However, domain-specific knowledge is still ignored by existing meta-embedding methods, which results in unstable performance across specific domains. Moreover, the importance of general and domain word embeddings depends on the downstream task, and how to regularize meta-embeddings to adapt to downstream tasks remains an unsolved problem. In this paper, we propose a method to incorporate both domain-specific and task-oriented information into meta-embeddings. We conducted extensive experiments on four text classification datasets, and the results show the effectiveness of our proposed method.

TSDG: Content-aware Neural Response Generation with Two-stage Decoding Process
Junsheng Kong | Zhicheng Zhong | Yi Cai | Xin Wu | Da Ren
Findings of the Association for Computational Linguistics: EMNLP 2020

Neural response generative models have achieved remarkable progress in recent years but tend to yield irrelevant and uninformative responses. One of the reasons is that encoder-decoder based models always use a single decoder to generate a complete response at a stroke. This tends to generate high-frequency function words with less semantic information rather than low-frequency content words with more semantic information. To address this issue, we propose a content-aware model with a two-stage decoding process named Two-stage Dialogue Generation (TSDG). We separate the decoding process of content words and function words so that content words can be generated independently without the interference of function words. Experimental results on two datasets indicate that our model significantly outperforms several competitive generative models in both automatic and human evaluation.

2010

HCAMiner: Mining Concept Associations for Knowledge Discovery through Concept Chain Queries
Wei Jin | Xin Wu
Coling 2010: Demonstrations