Isao Echizen


2023

pdf bib
VoteTRANS: Detecting Adversarial Text without Training by Voting on Hard Labels of Transformations
Hoang-Quoc Nguyen-Son | Seira Hidano | Kazuhide Fukushima | Shinsaku Kiyomoto | Isao Echizen
Findings of the Association for Computational Linguistics: ACL 2023

Adversarial attacks reveal serious flaws in deep learning models. More dangerously, these attacks preserve the original meaning and escape human recognition. Existing methods for detecting these attacks need to be trained using original/adversarial data. In this paper, we propose detection without training by voting on hard labels from predictions of transformations, namely, VoteTRANS. Specifically, VoteTRANS detects adversarial text by comparing the hard labels of input text and its transformation. The evaluation demonstrates that VoteTRANS effectively detects adversarial text across various state-of-the-art attacks, models, and datasets.

2022

pdf bib
A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification
Sosuke Nishikawa | Ikuya Yamada | Yoshimasa Tsuruoka | Isao Echizen
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity features in a resource-rich language can thus be directly applied to other languages. Our experimental results on cross-lingual topic classification (using the MLDoc and TED-CLDC datasets) and entity typing (using the SHINRA2020-ML dataset) show that the proposed model consistently outperforms state-of-the-art models.

pdf bib
EASE: Entity-Aware Contrastive Learning of Sentence Embedding
Sosuke Nishikawa | Ryokan Ri | Ikuya Yamada | Yoshimasa Tsuruoka | Isao Echizen
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We present EASE, a novel method for learning sentence embeddings via contrastive learning between sentences and their related entities. The advantage of using entity supervision is twofold: (1) entities have been shown to be a strong indicator of text semantics and thus should provide rich training signals for sentence embeddings; (2) entities are defined independently of languages and thus offer useful cross-lingual alignment supervision. We evaluate EASE against other unsupervised models both in monolingual and multilingual settings. We show that EASE exhibits competitive or better performance in English semantic textual similarity (STS) and short text clustering (STC) tasks and it significantly outperforms baseline methods in multilingual settings on a variety of tasks. Our source code, pre-trained models, and newly constructed multi-lingual STC dataset are available at https://github.com/studio-ousia/ease.

2020

pdf bib
Viable Threat on News Reading: Generating Biased News Using Natural Language Models
Saurabh Gupta | Hong Huy Nguyen | Junichi Yamagishi | Isao Echizen
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

Recent advancements in natural language generation has raised serious concerns. High-performance language models are widely used for language generation tasks because they are able to produce fluent and meaningful sentences. These models are already being used to create fake news. They can also be exploited to generate biased news, which can then be used to attack news aggregators to change their reader’s behavior and influence their bias. In this paper, we use a threat model to demonstrate that the publicly available language models can reliably generate biased news content based on an input original news. We also show that a large number of high-quality biased news articles can be generated using controllable text generation. A subjective evaluation with 80 participants demonstrated that the generated biased news is generally fluent, and a bias evaluation with 24 participants demonstrated that the bias (left or right) is usually evident in the generated articles and can be easily identified.

2018

pdf bib
Identifying Computer-Translated Paragraphs using Coherence Features
Hoang-Quoc Nguyen-Son | Huy H. Nguyen | Ngoc-Dung T. Tieu | Junichi Yamagishi | Isao Echizen
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2015

pdf bib
Paraphrase Detection Based on Identical Phrase and Similar Word Matching
Hoang-Quoc Nguyen-Son | Yusuke Miyao | Isao Echizen
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation