Ge Xu


2023

Domain Adaptation for Conversational Query Production with the RAG Model Feedback
Ante Wang | Linfeng Song | Ge Xu | Jinsong Su
Findings of the Association for Computational Linguistics: EMNLP 2023

Conversational query production is an emerging fundamental task for dialogue systems, in which search queries are generated to explore the vast and continually updated knowledge available through a search engine. To accelerate this line of research, previous studies have released several datasets with human-annotated search queries. However, the limited annotations still cannot cover conversations from diverse domains. To address this challenge, we propose a novel domain adaptation framework. It is inspired by a weakly supervised learning algorithm from previous work that guides a model with reinforcement learning, using BM25 scores as feedback. Though effective, that algorithm is fragile in the face of noisy webpage content from a commercial search engine and of variance across conversations, because it ignores the deep semantic information of dialogue contexts. We therefore improve the algorithm by taking advantage of retrieval-augmented generation (RAG) and by exploring several practical techniques, such as knowledge distillation, for stable training. We conduct experiments in multiple settings across different languages. Guided by RAG model feedback, our model is more robust and performs significantly better than strong baselines, especially in a more challenging setting.
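As a rough illustration of the two feedback signals this abstract contrasts, the sketch below shows (a) a reward built from BM25 retrieval scores, the weak supervision signal of the earlier work, and (b) a placeholder for RAG-model feedback. This is a minimal sketch under my own assumptions, not the authors' implementation; rag_model and its methods are hypothetical names.

```python
# Sketch only: BM25-score feedback vs. RAG-model feedback for a generated
# search query. Not the paper's code; the RAG hooks are hypothetical.
from rank_bm25 import BM25Okapi


def bm25_reward(generated_query: str, reference_passage: str, web_snippets: list) -> float:
    """Reward = BM25 relevance of the reference passage to the generated query,
    scored over the snippets returned by a (simulated) search engine."""
    corpus = [reference_passage] + list(web_snippets)
    bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
    scores = bm25.get_scores(generated_query.lower().split())
    return float(scores[0])  # score of the reference passage


def rag_reward(generated_query: str, gold_response: str, rag_model) -> float:
    """One plausible form of RAG-model feedback: how likely the RAG reader finds
    the gold response given passages retrieved with the generated query.
    rag_model.retrieve and rag_model.response_likelihood are hypothetical APIs."""
    passages = rag_model.retrieve(generated_query)
    return rag_model.response_likelihood(gold_response, passages)
```

Either reward could then drive a standard policy-gradient update of the query generator; the abstract's point is that the RAG-based signal is less sensitive to noisy webpages than raw BM25 scores.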

2016

Selective Annotation of Sentence Parts: Identification of Relevant Sub-sentential Units
Ge Xu | Xiaoyan Yang | Chu-Ren Huang
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Many NLP tasks involve sentence-level annotation, yet the relevant information is encoded not at the sentence level but in certain parts of the sentence. Such tasks include, but are not limited to, sentiment expression annotation, product feature annotation, and template annotation for Q&A systems. However, annotating the full corpus sentence by sentence is resource intensive. In this paper, we propose an approach that iteratively extracts frequent parts of sentences for annotation and compresses the set of sentences after each round of annotation. Our approach can also be used to prepare training sentences for binary classification (domain-related vs. noise, subjectivity vs. objectivity, etc.), assuming that sentence-type annotation can be predicted from the annotation of the most relevant sub-sentences. Two experiments are performed to test our proposal and are evaluated in terms of time saved and annotation agreement.
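The iterate-then-compress loop described here can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's algorithm: it uses frequent n-grams as a stand-in for "frequent parts of sentences" and a simple string removal as the compression step.

```python
# Minimal sketch (assumptions mine) of iterative sub-sentential annotation:
# surface the most frequent unit for a human to label, then compress the
# corpus by removing that unit before the next round.
from collections import Counter


def most_frequent_ngram(sentences, n=2):
    """Return the most frequent n-gram across the sentences, or None if empty."""
    counts = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(1)[0][0] if counts else None


def annotation_rounds(sentences, rounds=3, n=2):
    """Yield one candidate sub-sentential unit per round for manual annotation."""
    sentences = list(sentences)
    for _ in range(rounds):
        unit = most_frequent_ngram(sentences, n)
        if unit is None:
            break
        yield unit  # hand this unit to the annotator
        # compress: drop the annotated unit so later rounds focus on the rest
        sentences = [s.replace(unit, " ").strip() for s in sentences]
        sentences = [s for s in sentences if s]
```

Because each round removes the material already covered by an annotation, later rounds spend annotator time only on the remaining, less frequent parts, which is where the reported time savings come from.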

2014

An Analysis of Radicals-based Features in Subjectivity Classification on Simplified Chinese Sentences
Ge Xu | Churen Huang
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2012

Lost in Translations? Building Sentiment Lexicons using Context Based Machine Translation
Xinfan Meng | Furu Wei | Ge Xu | Longkai Zhang | Xiaohua Liu | Ming Zhou | Houfeng Wang
Proceedings of COLING 2012: Posters

Cross-Lingual Mixture Model for Sentiment Classification
Xinfan Meng | Furu Wei | Xiaohua Liu | Ming Zhou | Ge Xu | Houfeng Wang
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2010

Build Chinese Emotion Lexicons Using A Graph-based Algorithm and Multiple Resources
Ge Xu | Xinfan Meng | Houfeng Wang
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)