Hui Song


2022

pdf bib
Different Data, Different Modalities! Reinforced Data Splitting for Effective Multimodal Information Extraction from Social Media Posts
Bo Xu | Shizhou Huang | Ming Du | Hongya Wang | Hui Song | Chaofeng Sha | Yanghua Xiao
Proceedings of the 29th International Conference on Computational Linguistics

Recently, multimodal information extraction from social media posts has gained increasing attention in the natural language processing community. Despite their success, current approaches overestimate the significance of images. In this paper, we argue that different social media posts should consider different modalities for multimodal information extraction. Multimodal models cannot always outperform unimodal models. Some posts are more suitable for the multimodal model, while others are more suitable for the unimodal model. Therefore, we propose a general data splitting strategy to divide the social media posts into two sets so that these two sets can achieve better performance under the information extraction models of the corresponding modalities. Specifically, for an information extraction task, we first propose a data discriminator that divides social media posts into a multimodal and a unimodal set. Then we feed these sets into the corresponding models. Finally, we combine the results of these two models to obtain the final extraction results. Due to the lack of explicit knowledge, we use reinforcement learning to train the data discriminator. Experiments on two different multimodal information extraction tasks demonstrate the effectiveness of our method. The source code of this paper can be found in https://github.com/xubodhu/RDS.

2020

pdf bib
基于BERT的端到端中文篇章事件抽取(A BERT-based End-to-End Model for Chinese Document-level Event Extraction)
Hongkuan Zhang (张洪宽) | Hui Song (宋晖) | Shuyi Wang (王舒怡) | Bo Xu (徐波)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

篇章级事件抽取研究从整篇文档中检测事件,识别出事件包含的元素并赋予每个元素特定的角色。本文针对限定领域的中文文档提出了基于BERT的端到端模型,在模型的元素和角色识别中依次引入前序层输出的事件类型以及实体嵌入表示,增强文本的事件、元素和角色关联表示,提高篇章中各事件所属元素的识别精度。在此基础上利用标题信息和事件五元组的嵌入式表示,实现主从事件的划分及元素融合。实验证明本文的方法与现有工作相比具有明显的提升。