Bo An


2023

Tibetan Dependency Parsing with Graph Convolutional Neural Networks
Bo An
Proceedings of the Ancient Language Processing Workshop

Dependency parsing is a syntactic analysis task that identifies the dependency relationships between the words of a sentence. Words interconnected by dependency relations naturally form graph data. Traditional Tibetan dependency parsing methods typically model the task as transition-based parsing or sequence labeling, ignoring the graph structure between words. To address this issue, this paper proposes a graph neural network (GNN)-based Tibetan dependency parsing method that treats Tibetan words as nodes and the dependency relationships between them as edges, thereby constructing graph data for Tibetan sentences. Specifically, we use a BiLSTM to learn Tibetan word representations, a GNN to model the relationships between words, and an MLP to predict the type of relationship between each pair of words. Experiments on a Tibetan dependency dataset show that the proposed method achieves high-quality Tibetan dependency parsing results.
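
A minimal sketch of the pipeline the abstract describes, written in PyTorch: a BiLSTM encodes the words, a graph convolution layer propagates information over the word-to-word graph, and an MLP classifies the relation between each word pair. This is not the authors' code; the layer sizes, the single-layer GCN, and the pair-scoring scheme are illustrative assumptions.

```python
# Illustrative sketch only: dimensions and layer counts are assumptions.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One round of graph convolution over a (normalized) adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # adj: (batch, n, n) word-to-word graph; h: (batch, n, dim)
        return torch.relu(self.linear(adj @ h))

class TibetanDepParser(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, n_labels=40):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim // 2,
                              batch_first=True, bidirectional=True)
        self.gcn = GCNLayer(hid_dim)
        # MLP scores the relation type for each (head, dependent) pair
        self.mlp = nn.Sequential(nn.Linear(2 * hid_dim, hid_dim),
                                 nn.ReLU(),
                                 nn.Linear(hid_dim, n_labels))

    def forward(self, tokens, adj):
        h, _ = self.bilstm(self.embed(tokens))   # (batch, n, hid_dim)
        h = self.gcn(h, adj)                     # graph-aware word states
        n = h.size(1)
        # concatenate every (head, dependent) state pair
        pairs = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                           h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        return self.mlp(pairs)                   # (batch, n, n, n_labels)
```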

On the Development of Interlinearized Ancient Literature of Ethnic Minorities: A Case Study of the Interlinearization of Ancient Written Tibetan Literature
Congjun Long | Bo An
Proceedings of the Ancient Language Processing Workshop

Ancient ethnic documents are an essential part of China’s ancient literature and an indispensable achievement of Chinese civilization. However, few research teams study them, owing to language and script literacy barriers. To address this issue, this paper proposes an interlinearized annotation strategy for ancient ethnic literature. The strategy aims to ease text literacy difficulties, encourage interdisciplinary researchers to participate in the study of ancient ethnic literature, and improve the efficiency of developing such resources. Concretely, an interlinearized annotation consists of the original line together with word segmentation, Latin transliteration, annotation, and translation lines. In this paper, we take ancient written Tibetan literature as an example to explore the interlinearized annotation strategy. Because manually building a large-scale corpus is challenging, we propose a multi-task learning-based interlinearized annotation method that generates the annotation lines from the original line. Experimental results show that, after training on about 10,000 sentences (lines), our model achieves F1 scores of 70.9% and 63.2% on the word segmentation and annotation lines, respectively, and a BLEU score of 18.7 on the translation lines. The method markedly enhances annotation efficiency, effectively speeding up interlinearized annotation and reducing the manual workload.
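
A hedged sketch of the multi-task setup the abstract outlines: one shared encoder reads the original line, and one head per interlinear line produces the corresponding annotation. Everything below is an illustrative assumption rather than the authors' implementation: the GRU encoder, the equal task weights, and especially the token-aligned treatment of the translation line (which the paper evaluates with BLEU and therefore presumably generates with a decoder rather than tags).

```python
# Illustrative sketch only: architecture and loss weighting are assumptions.
import torch.nn as nn

TASKS = ["segmentation", "latin", "annotation", "translation"]

class InterlinearAnnotator(nn.Module):
    def __init__(self, vocab_size, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        # one output head per interlinear line
        self.heads = nn.ModuleDict(
            {t: nn.Linear(dim, vocab_size) for t in TASKS})

    def forward(self, original_line):
        h, _ = self.encoder(self.embed(original_line))
        # (batch, n, vocab) logits per task; a real system would need a
        # decoder for the translation line, which is not length-aligned
        return {t: self.heads[t](h) for t in TASKS}

def multitask_loss(logits, gold, pad_id=0):
    # sum of per-task cross-entropies; equal weights are an assumption
    ce = nn.CrossEntropyLoss(ignore_index=pad_id)
    return sum(ce(logits[t].transpose(1, 2), gold[t]) for t in TASKS)
```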

2020

Enhancing Neural Models with Vulnerability via Adversarial Attack
Rong Zhang | Qifei Zhou | Bo An | Weiping Li | Tong Mo | Bo Wu
Proceedings of the 28th International Conference on Computational Linguistics

Natural Language Sentence Matching (NLSM) lies at the core of many natural language processing tasks. Most previous work develops a single task-specific neural model for NLSM, and no previous work considers adversarial attack as a means of improving NLSM performance; adversarial attack is usually used only to generate adversarial samples that fool neural models. In this paper, we first observe that different categories of samples have different vulnerabilities, where vulnerability is the degree of difficulty of changing a sample’s label. Based on this observation, we propose a general two-stage training framework that enhances neural models with Vulnerability via Adversarial Attack (VAA). We design criteria to measure vulnerability, which is obtained through adversarial attack, and the VAA framework can be adapted to various neural models by incorporating it. In addition, we prove a theorem and four corollaries that explain the factors influencing the effectiveness of vulnerability. Experimental results show that VAA significantly improves the performance of neural models on NLSM datasets, and the results are consistent with the theorem and corollaries. The code is released at https://github.com/rzhangpku/VAA.
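
One way to read the two-stage framework, sketched below under stated assumptions: first probe each sample's vulnerability with a gradient-based attack (an FGSM-style iteration here, an illustrative choice), then reuse that score during training, e.g. as a per-sample loss weight. The paper's actual criteria and weighting scheme may differ; `model` is assumed to map embeddings directly to class logits for a single sample.

```python
# Illustrative sketch only: attack, step budget, and weighting are assumptions.
import torch
import torch.nn.functional as F

def vulnerability(model, emb, label, eps=0.01, max_steps=10):
    """Score one sample (batch of 1): the fewer attack steps needed to
    flip the prediction, the more vulnerable the sample."""
    emb = emb.clone().detach().requires_grad_(True)
    for step in range(1, max_steps + 1):
        loss = F.cross_entropy(model(emb), label)
        grad, = torch.autograd.grad(loss, emb)
        emb = (emb + eps * grad.sign()).detach().requires_grad_(True)
        if model(emb).argmax(dim=-1).item() != label.item():
            return 1.0 - step / max_steps      # early flip => high score
    return 0.0                                  # never flipped => robust

def weighted_loss(model, emb, label, vul, alpha=1.0):
    # stage two: up-weight vulnerable samples (weighting is an assumption)
    return (1.0 + alpha * vul) * F.cross_entropy(model(emb), label)
```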

2019

EUSP: An Easy-to-Use Semantic Parsing PlatForm
Bo An | Chen Bo | Xianpei Han | Le Sun
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations

Semantic parsing aims to map natural language utterances into structured meaning representations. We present EUSP (Easy-to-Use Semantic Parsing PlatForm), a modular platform that helps developers build semantic parsers from scratch. Instead of requiring a large amount of training data or complex grammar knowledge, the platform lets developers build grammar-based or neural semantic parsers through configuration files that specify the modules and components composing the parsing system. A high-quality grammar-based semantic parsing system requires only domain lexicons rather than costly training data. Furthermore, we provide a browser-based method for generating the semantic parsing system, minimizing the difficulty of development. Experimental results show that the neural semantic parser achieves competitive performance on the semantic parsing task, and the grammar-based semantic parsers significantly improve the performance of a business search engine.
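
To make the configuration-file idea concrete, here is a hypothetical sketch of config-driven assembly: a registry maps module names to classes, and a config selects the modules and their arguments. The module names, registry, and config keys are all invented for illustration; EUSP's actual file format and module inventory are not described in the abstract.

```python
# Hypothetical illustration: names and config schema are invented.
MODULE_REGISTRY = {}

def register(name):
    """Decorator that makes a component addressable from a config file."""
    def deco(cls):
        MODULE_REGISTRY[name] = cls
        return cls
    return deco

@register("lexicon_matcher")
class LexiconMatcher:
    def __init__(self, lexicon_path):
        self.lexicon_path = lexicon_path  # domain lexicon, no training data

config = {
    "parser_type": "grammar_based",   # or "neural"
    "pipeline": [
        {"module": "lexicon_matcher",
         "args": {"lexicon_path": "domain_lexicon.txt"}},
    ],
}

# assemble the parsing pipeline from the config alone
pipeline = [MODULE_REGISTRY[step["module"]](**step["args"])
            for step in config["pipeline"]]
```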

2018

Semi-Supervised Lexicon Learning for Wide-Coverage Semantic Parsing
Bo Chen | Bo An | Le Sun | Xianpei Han
Proceedings of the 27th International Conference on Computational Linguistics

Semantic parsers critically rely on accurate and high-coverage lexicons. However, traditional semantic parsers usually learn the lexicon from annotated logical forms, which often leads to a lexicon coverage problem. In this paper, we propose a graph-based semi-supervised learning framework that makes use of large text corpora and lexical resources. The framework first constructs a graph with a phrase similarity model learned from these corpora and resources; a graph propagation algorithm then infers the label distributions of unlabeled phrases from labeled ones. We evaluate our approach on two benchmarks, WebQuestions and Free917. On both datasets, our method achieves substantial improvements over the base system that does not use the learned lexicon, and obtains results competitive with state-of-the-art systems.
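
The propagation step admits a compact sketch. Below is a standard iterative label-propagation recipe over a phrase similarity graph, used purely as an illustration of the mechanism the abstract names; the paper's exact update rule, normalization, and hyperparameters are assumptions.

```python
# Illustrative sketch only: update rule and hyperparameters are assumptions.
import numpy as np

def propagate(W, Y0, labeled_mask, n_iters=50, alpha=0.9):
    """W: (n, n) symmetric phrase-similarity weights.
    Y0: (n, k) initial label distributions (zero rows for unlabeled phrases).
    labeled_mask: (n,) boolean array marking the seed (labeled) phrases."""
    d = W.sum(axis=1, keepdims=True).clip(min=1e-12)
    P = W / d                                    # row-normalized transitions
    Y = Y0.copy()
    for _ in range(n_iters):
        Y = alpha * (P @ Y) + (1 - alpha) * Y0   # diffuse labels over graph
        Y[labeled_mask] = Y0[labeled_mask]       # clamp the seed labels
    return Y / Y.sum(axis=1, keepdims=True).clip(min=1e-12)
```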

Model-Free Context-Aware Word Composition
Bo An | Xianpei Han | Le Sun
Proceedings of the 27th International Conference on Computational Linguistics

Word composition is a promising technique for learning representations of larger linguistic units (e.g., phrases, sentences, and documents). However, most current composition models take neither word ambiguity nor the context outside a linguistic unit into account when learning representations, and consequently produce inaccurate semantic representations. To address this issue, we propose a model-free context-aware word composition model that employs latent semantic information as global context for learning representations. The proposed model resolves word sense disambiguation and word composition in a unified framework. Extensive evaluation shows consistent improvements over various strong word representation and composition models at different granularities (word, phrase, and sentence), demonstrating the effectiveness of the proposed method.
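
A hedged sketch of the stated idea: weight each word's sense vectors by their affinity to a global context vector, then compose the disambiguated vectors. Using the sentence mean as global context, softmax affinities for soft sense selection, and averaging as the parameter-free composition are all illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: pooling and weighting choices are assumptions.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compose(phrase_senses, sentence_vecs):
    """phrase_senses: list of (n_senses_i, d) arrays, one per phrase word.
    sentence_vecs: (m, d) vectors for every word in the whole sentence."""
    context = sentence_vecs.mean(axis=0)          # global context vector
    disambiguated = []
    for senses in phrase_senses:
        weights = softmax(senses @ context)       # context-sense affinity
        disambiguated.append(weights @ senses)    # soft sense selection
    return np.mean(disambiguated, axis=0)         # parameter-free composition
```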

Accurate Text-Enhanced Knowledge Graph Representation Learning
Bo An | Bo Chen | Xianpei Han | Le Sun
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

Previous knowledge graph representation learning techniques usually give the same entity or relation an identical representation across different triples, ignoring the ambiguity of relations and entities. To properly handle the semantic variety of entities and relations across triples, we propose an accurate text-enhanced knowledge graph representation learning method that can represent a relation or entity differently in different triples by exploiting additional textual information. Specifically, our method enhances representations with entity descriptions and triple-specific relation mentions, and a mutual attention mechanism between the relation mention and the entity description learns more accurate textual representations that further improve the knowledge graph representation. Experimental results show that our method achieves state-of-the-art performance on both link prediction and triple classification, and significantly outperforms previous text-enhanced knowledge representation models.
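
A sketch of one plausible form of the mutual attention the abstract mentions: the entity-description tokens and the relation-mention tokens attend over each other, yielding triple-specific text vectors. Dot-product affinities and max-pooled attention are illustrative choices, not necessarily the paper's exact formulation.

```python
# Illustrative sketch only: the attention form is an assumption.
import torch
import torch.nn.functional as F

def mutual_attention(desc, mention):
    """desc: (n_d, d) entity-description token vectors.
    mention: (n_m, d) relation-mention token vectors."""
    scores = desc @ mention.T                         # (n_d, n_m) affinities
    # description vector, conditioned on the relation mention
    desc_vec = F.softmax(scores.max(dim=1).values, dim=0) @ desc
    # mention vector, conditioned on the entity description
    ment_vec = F.softmax(scores.max(dim=0).values, dim=0) @ mention
    return desc_vec, ment_vec
```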

2016

ISCAS_NLP at SemEval-2016 Task 1: Sentence Similarity Based on Support Vector Regression using Multiple Features
Cheng Fu | Bo An | Xianpei Han | Le Sun
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Sentence Rewriting for Semantic Parsing
Bo Chen | Le Sun | Xianpei Han | Bo An
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)