Zhengtao Yu


2023

pdf bib
Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints
Ran Song | Shizhu He | Shengxiang Gao | Li Cai | Kang Liu | Zhengtao Yu | Jun Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Multilingual Knowledge Graph Completion (mKGC) aim at solving queries in different languages by reasoning a tail entity thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, its pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome previous problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities , while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, which indicates that our proposed method has significant enhancement on mKGC.

pdf bib
Non-parallel Accent Transfer based on Fine-grained Controllable Accent Modelling
Linqin Wang | Zhengtao Yu | Yuanzhang Yang | Shengxiang Gao | Cunli Mao | Yuxin Huang
Findings of the Association for Computational Linguistics: EMNLP 2023

Existing accent transfer works rely on parallel data or speech recognition models. This paper focuses on the practical application of accent transfer and aims to implement accent transfer using non-parallel datasets. The study has encountered the challenge of speech representation disentanglement and modeling accents. In our accent modeling transfer framework, we manage to solve these problems by two proposed methods. First, we learn the suprasegmental information associated with tone to finely model the accents in terms of tone and rhythm. Second, we propose to use mutual information learning to disentangle the accent features and control the accent of the generated speech during the inference time. Experiments show that the proposed framework attains superior performance to the baseline models in terms of accentedness and audio quality.

pdf bib
基于语音文本跨模态表征对齐的端到端语音翻译(End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text)
Ling Zhou, Guojiang ang Dong | Zhengtao Yu | Shengxiang Gao | Wenjun Wang | Houli Ma | 国江 周 | 凌 董 | 正涛 余 | 盛祥 高 | 文君 王 | 候丽 马
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“端到端语音翻译需要解决源语言语音到目标语言文本的跨语言和跨模态映射,有限标注数据条件下,建立语音文本表征间的统一映射,缓解跨模态差异是提升语音翻译性能的关键。本文提出语音文本跨模态表征对齐方法,对语音文本表征进行多粒度对齐并进行混合作为并行输入,基于多模态表征的一致性约束进行多任务融合训练。在MuST-C数据集上的实验表明,本文所提方法优于现有端到端语音翻译跨模态表征相关方法,有效提升了语音翻译模型跨模态映射能力和翻译性能。”

pdf bib
基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language)
Zijian Feng (冯子健) | Linqin Wang (王琳钦) | Shengxaing Gao (高盛祥) | Zhengtao Yu (余正涛) | Ling Dong (董凌)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“老挝语的语音合成对中老两国合作与交流意义重大,但老挝语语音发音复杂,存在声调、音节及音素等发音特性,现有语音合成方法在老挝语上效果不尽人意。基于注意力机制建模的自回归模型难以拟合复杂的老挝语语音,模型泛化能力差,容易出现漏字、跳字等灾难性错误,合成音频缺乏自然性和流畅性。本文提出基于离散化自监督表征增强的老挝语非自回归语音合成方法,结合老挝语的语言语音特点,使用老挝语音素粒度的标注时长信息构建非自回归架构声学模型,通过自监督学习的预训练语音模型来提取语音内容和声调信息的离散化表征,融入到声学模型中增强模型的语音生成能力,增强合成音频的流畅性和自然性。实验证明,本文方法合成音频达到了4.03的MOS评分,基于离散化自监督表征增强的非自回归建模方法,能更好的在声调、音素时长、音高等细粒度层面刻画老挝语的语音特性。”

pdf bib
融合多粒度特征的缅甸语文本图像识别方法(Burmese Language Recognition Method Fused with Multi-Granularity Features)
Enyu He (何恩宇) | Rui Chen (陈蕊) | Cunli Mao (毛存礼) | Yuxin Huang (黄于欣) | Shengxaing Gao (高盛祥) | Zhengtao Yu (余正涛)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

“缅甸语属于东南亚低资源语言,缅甸语文本图像识别对开展缅甸语机器翻译等任务具有重要意义。由于缅甸语属于典型的字符组合型语言,一个感受野内存在多个字符嵌套,现有缅甸语识别方法主要是从字符粒度进行识别,在解码时会出现某些字符未能正确识别而导致局部乱码。考虑到缅甸语存在特殊的字符组合规则,本文提出了一种融合多粒度特征的缅甸语文本图像识别方法,将较细粒度的字符粒度和较粗粒度的字符簇粒度进行序列建模,然后将两种粒度特征序列进行融合后利用解码器进行解码。实验结果表明,该方法能够有效缓解识别结果乱码的现象,并且在人工构建的数据集上相比“VGG16+BiLSTM+Transformer”的基线模型识别准确率提高2.4%,达到97.35%。 "

2022

pdf bib
基于图文细粒度对齐语义引导的多模态神经机器翻译方法(Based on Semantic Guidance of Fine-grained Alignment of Image-Text for Multi-modal Neural Machine Translation)
Junjie Ye (叶俊杰) | Junjun Guo (郭军军) | Kaiwen Tan (谭凯文) | Yan Xiang (相艳) | Zhengtao Yu (余正涛)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“多模态神经机器翻译旨在利用视觉信息来提高文本翻译质量。传统多模态机器翻译将图像的全局语义信息融入到翻译模型,而忽略了图像的细粒度信息对翻译质量的影响。对此,该文提出一种基于图文细粒度对齐语义引导的多模态神经机器翻译方法,该方法首先跨模态交互图文信息,以提取图文细粒度对齐语义信息,然后以图文细粒度对齐语义信息为枢纽,采用门控机制将多模态细粒度信息对齐到文本信息上,实现图文多模态特征融合。在多模态机器翻译基准数据集Multi30K 英语→德语、英语→法语以及英语→捷克语翻译任务上的实验结果表明,论文提出方法的有效性,并且优于大多数最先进的多模态机器翻译方法。”

pdf bib
多特征融合的越英端到端语音翻译方法(A Vietnamese-English end-to-end speech translation method based on multi-feature fusion)
Houli Ma (马候丽) | Ling Dong (董凌) | Wenjun Wang (王文君) | Jian Wang (王剑) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“语音翻译的编码器需要同时编码语音中的声学和语义信息,单一的Fbank或Wav2vec2语音特征表征能力存在不足。本文通过分析人工的Fbank特征与自监督的Wav2vec2特征间的差异性,提出基于交叉注意力机制的声学特征融合方法,并探究了不同的自监督特征和融合方式,加强模型对语音中声学和语义信息的学习。结合越南语语音特点,以Fbank特征为主、Pitch特征为辅混合编码Fbank表征,构建多特征融合的越-英语音翻译模型。实验表明,使用多特征的语音翻译模型相比单特征翻译效果更优,与简单的特征拼接方法相比更有效,所提的多特征融合方法在越-英语音翻译任务上提升了1.97个BLEU值。”

pdf bib
融入音素特征的英-泰-老多语言神经机器翻译方法(English-Thai-Lao multilingual neural machine translation fused with phonemic features)
Zheng Shen (沈政) | Cunli Mao (毛存礼) | Zhengtao Yu (余正涛) | Shengxiang Gao (高盛祥) | Linqin Wang (王琳钦) | Yuxin Huang (黄于欣)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“多语言神经机器翻译是提升低资源语言翻译质量的有效手段。由于不同语言之间字符差异较大,现有方法难以得到统一的词表征形式。泰语和老挝语属于具有音素相似性的低资源语言,考虑到利用语言相似性能够拉近语义距离,提出一种融入音素特征的多语言词表征学习方法:(1)设计音素特征表示模块和泰老文本表示模块,基于交叉注意力机制得到融合音素特征后的泰老文本表示,拉近泰老之间的语义距离;(2)在微调阶段,基于参数分化得到不同语言对特定的训练参数,缓解联合训练造成模型过度泛化的问题。实验结果表明在ALT数据集上,提出方法在泰-英和老-英两个翻译方向上,相比基线模型提升0.97和0.99个BLEU值。”

pdf bib
融合双重注意力机制的缅甸语图像文本识别方法(Burmese image text recognition method with dual attention mechanism)
Fengxiao Wang (王奉孝) | Cunli Mao (毛存礼) | Zhengtao Yu (余正涛) | Shengxiang Gao (高盛祥) | Huang Yuxin (黄于欣) | Fuhao Liu (刘福浩)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“由于缅甸语字符具有独特的语言编码结构以及字符组合规则,现有图像文本识别方法在缅甸语图像识别任务中无法充分关注文字边缘的特征,会导致缅甸语字符上下标丢失的问题。因此,本文基于Transformer框架的图像文本识别方法做出改进,提出一种融合通道和空间注意力机制的视觉关注模块,旨在捕获像素级成对关系和通道依赖关系,降低缅甸语图像中噪声干扰从而获得语义更完整的特征图。此外,在解码过程中,将基于多头注意力的解码单元组合为解码器,用于将特征序列转化为缅甸语文字。实验结果表明,该方法在自构的缅甸语图像文本识别数据集上相比Transformer识别准确率提高0.5%,达到95.3%。”

pdf bib
融合外部语言知识的流式越南语语音识别(Streaming Vietnamese Speech Recognition Based on Fusing External Vietnamese Language Knowledge)
Junqiang Wang (王俊强) | Zhengtao Yu (余正涛) | Ling Dong (董凌) | Shengxiang Gao (高盛祥) | Wenjun Wang (王文君)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

“越南语为低资源语言,训练语料难以获取;流式端到端模型在训练过程中难以学习到外部大量文本中的语言知识,这些问题在一定程度上都限制了流式越南语语音识别模型的性能。因此,本文以越南语音节作为语言模型和流式越南语语音识别模型的建模单元,提出了一种将预训练越南语语言模型在训练阶段融合到流式语音识别模型的方法。在训练阶段,通过最小化预训练越南语语言模型和解码器的输出计算一个新的损失函数LAE D−LM ,帮助流式越南语语音识别模型学习一些越南语语言知识从而优化其模型参数;在解码阶段,使用孓孨孡孬孬孯孷 孆孵孳孩孯孮或者字孆孓孔技术再次融合预训练语言模型进一步提升模型识别率。实验结果表明,在孖孉孖孏孓数据集上,相比基线模型,在训练阶段融合语言模型可以将流式越南语语音识别模型的词错率提升嬲嬮嬴嬵嬥;在解码阶段使用孓孨孡孬孬孯孷 孆孵孳孩孯孮或字孆孓孔再次融合语言模型,还可以将模型词错率分别提升嬱嬮嬳嬵嬥和嬴嬮嬷嬵嬥。”

pdf bib
Decoupling Mixture-of-Graphs: Unseen Relational Learning for Knowledge Graph Completion by Fusing Ontology and Textual Experts
Ran Song | Shizhu He | Suncong Zheng | Shengxiang Gao | Kang Liu | Zhengtao Yu | Jun Zhao
Proceedings of the 29th International Conference on Computational Linguistics

Knowledge Graph Embedding (KGE) has been proposed and successfully utilized to knowledge Graph Completion (KGC). But classic KGE paradigm often fail in unseen relation representations. Previous studies mainly utilize the textual descriptions of relations and its neighbor relations to represent unseen relations. In fact, the semantics of a relation can be expressed by three kinds of graphs: factual graph, ontology graph, textual description graph, and they can complement each other. A more common scenario in the real world is that seen and unseen relations appear at the same time. In this setting, the training set (only seen relations) and testing set (both seen and unseen relations) own different distributions. And the train-test inconsistency problem will make KGE methods easiy overfit on seen relations and under-performance on unseen relations. In this paper, we propose decoupling mixture-of-graph experts (DMoG) for unseen relations learning, which could represent the unseen relations in the factual graph by fusing ontology and textual graphs, and decouple fusing space and reasoning space to alleviate overfitting for seen relations. The experiments on two unseen only public datasets and a mixture dataset verify the effectiveness of the proposed method, which improves the state-of-the-art methods by 6.84% in Hits@10 on average.

pdf bib
Noise-robust Cross-modal Interactive Learning with Text2Image Mask for Multi-modal Neural Machine Translation
Junjie Ye | Junjun Guo | Yan Xiang | Kaiwen Tan | Zhengtao Yu
Proceedings of the 29th International Conference on Computational Linguistics

Multi-modal neural machine translation (MNMT) aims to improve textual level machine translation performance in the presence of text-related images. Most of the previous works on MNMT focus on multi-modal fusion methods with full visual features. However, text and its corresponding image may not match exactly, visual noise is generally inevitable. The irrelevant image regions may mislead or distract the textual attention and cause model performance degradation. This paper proposes a noise-robust multi-modal interactive fusion approach with cross-modal relation-aware mask mechanism for MNMT. A text-image relation-aware attention module is constructed through the cross-modal interaction mask mechanism, and visual features are extracted based on the text-image interaction mask knowledge. Then a noise-robust multi-modal adaptive fusion approach is presented by fusion the relevant visual and textual features for machine translation. We validate our method on the Multi30K dataset. The experimental results show the superiority of our proposed model, and achieve the state-of-the-art scores in all En-De, En-Fr and En-Cs translation tasks.

2021

pdf bib
基于模型不确定性约束的半监督汉缅神经机器翻译(Semi-Supervised Chinese-Myanmar Neural Machine Translation based Model-Uncertainty)
Linqin Wang (王琳钦) | Zhengtao Yu (余正涛) | Cunli Mao (毛存礼) | Chengxiang Gao (高盛祥) | Zhibo Man (满志博) | Zhenhan Wang (王振晗)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

基于回译的半监督神经机器翻译方法在低资源神经机器翻译取得了明显的效果,然而,由于汉缅双语资源稀缺、结构差异较大,传统基于Transformer的回译方法中编码端的Self-attention机制不能有效区别回译中产生的伪平行数据的噪声对句子编码的影响,致使译文出现漏译,多译,错译等问题。为此,该文提出基于模型不确定性为约束的半监督汉缅神经机器翻译方法,在Transformer网络中利用基于变分推断的蒙特卡洛Dropout构建模型不确定性注意力机制,获取到能够区分噪声数据的句子向量表征,在此基础上与Self-attention机制得到的句子编码向量进行融合,以此得到句子有效编码表征。实验证明,本文方法相比传统基于Transformer的回译方法在汉语-缅甸语和缅甸语-汉语两个翻译方向BLEU值分别提升了4.01和1.88个点,充分验证了该方法在汉缅神经翻译任务的有效性。

pdf bib
基于中文信息与越南语句法指导的越南语事件检测(Vietnamese event detection based on Chinese information and Vietnamese syntax guidance)
Long Chen (陈龙) | Junjun Guo (郭军军) | Yafei Zhang (张亚飞) | Chengxiang Gao (高盛祥) | Zhengtao Yu (余正涛)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

当前基于深度学习的事件检测模型都依赖足够数量的标注数据,而标注数据的稀缺及事件类型歧义为越南语事件检测带来了极大的挑战。根据“表达相同观点但语言不同的句子通常有相同或相似的语义成分”这一多语言一致性特征,本文提出了一种基于中文信息与越南语句法指导的越南语事件检测框架。首先通过共享编码器策略和交叉注意力网络将中文信息融入到越南语中,然后使用图卷积网络融入越南语依存句法信息,最后在中文事件类型指导下实现越南语事件检测。实验结果表明,在中文信息和越南语句法的指导下越南语事件检测取得了较好的效果。

pdf bib
融合多层语义特征图的缅甸语图像文本识别方法(Burmese Image Text Recognition Method Fused with Multi-layer Semantic Feature Maps)
Fuhao Liu (刘福浩) | Cunli Mao (毛存礼) | Zhengtao Yu (余正涛) | Chengxiang Gao (高盛祥) | Linqin Wang (王琳钦) | Xuyang Xie (谢旭阳)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

由于缅甸语存在特殊的字符组合结构,在图像文本识别研究方面存在较大的困难,直接利用现有的图像文本识别方法识别缅甸语图片存在字符缺失和复杂背景下识别效果不佳的问题。因此,本文提出一种融合多层语义特征图的缅甸语图像文本识别方法,利用深度卷积网络获得多层图像特征并对其融合获取多层语义信息,缓解缅甸语图像中由于字符嵌套导致特征丢失的问题。另外,在训练阶段采用MIX UP的策略进行网络参数优化,提高模型的泛化能力,降低模型在测试阶段对训练样本产生的依赖。实验结果表明,提出方法相比基线模型准确率提升了2.2%。

pdf bib
基于阅读理解的汉越跨语言新闻事件要素抽取方法(News Events Element Extraction of Chinese-Vietnamese Cross-language Using Reading Comprehension)
Enchang Zhu (朱恩昌) | Zhengtao Yu (余正涛) | Chengxiang Gao (高盛祥) | Yuxin Huang (黄宇欣) | Junjun Guo (郭军军)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

新闻事件要素抽取旨在抽取新闻文本中描述主题事件的事件要素,如时间、地点、人物和组织机构名等。传统的事件要素抽取方法在资源稀缺型语言上性能欠佳,且对长文本语义建模困难。对此,本文提出了基于阅读理解的汉越跨语言新闻事件要素抽取方法。该方法首先利用新闻长文本关键句检索模块过滤含噪声的句子。然后利用跨语言阅读理解模型将富资源语言知识迁移到越南语,提高越南语新闻事件要素抽取的性能。在自建的汉越双语新闻事件要素抽取数据集上的实验证明了本文方法的有效性。

pdf bib
融合多粒度特征的低资源语言词性标记和依存分析联合模型(A Joint Model with Multi-Granularity Features of Low-resource Language POS Tagging and Dependency Parsing)
Sha Lu (陆杉) | Cunli Mao (毛存礼) | Zhengtao Yu (余正涛) | Chengxiang Gao (高盛祥) | Yuxin Huang (黄于欣) | Zhenhan Wang (王振晗)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

研究低资源语言的词性标记和依存分析对推动低资源自然语言处理任务有着重要的作用。针对低资源语言词嵌入表示,已有工作并没有充分利用字符、子词层面信息编码,导致模型无法利用不同粒度的特征,对此,提出融合多粒度特征的词嵌入表示,利用不同的语言模型分别获得字符、子词以及词语层面的语义信息,将三种粒度的词嵌入进行拼接,达到丰富语义信息的目的,缓解由于标注数据稀缺导致的依存分析模型性能不佳的问题。进一步将词性标记和依存分析模型进行联合训练,使模型之间能相互共享知识,降低词性标记错误在依存分析任务上的线性传递。以泰语、越南语为研究对象,在宾州树库数据集上,提出方法相比于基线模型的UAS、LAS、POS均有明显提升。

pdf bib
Semantic Relation-aware Difference Representation Learning for Change Captioning
Yunbin Tu | Tingting Yao | Liang Li | Jiedong Lou | Shengxiang Gao | Zhengtao Yu | Chenggang Yan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Rˆ3Net:Relation-embedded Representation Reconstruction Network for Change Captioning
Yunbin Tu | Liang Li | Chenggang Yan | Shengxiang Gao | Zhengtao Yu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Change captioning is to use a natural language sentence to describe the fine-grained disagreement between two similar images. Viewpoint change is the most typical distractor in this task, because it changes the scale and location of the objects and overwhelms the representation of real change. In this paper, we propose a Relation-embedded Representation Reconstruction Network (Rˆ3Net) to explicitly distinguish the real change from the large amount of clutter and irrelevant changes. Specifically, a relation-embedded module is first devised to explore potential changed objects in the large amount of clutter. Then, based on the semantic similarities of corresponding locations in the two images, a representation reconstruction module (RRM) is designed to learn the reconstruction representation and further model the difference representation. Besides, we introduce a syntactic skeleton predictor (SSP) to enhance the semantic interaction between change localization and caption generation. Extensive experiments show that the proposed method achieves the state-of-the-art results on two public datasets.

2020

pdf bib
基于多语言联合训练的汉-英-缅神经机器翻译方法(Chinese-English-Burmese Neural Machine Translation Method Based on Multilingual Joint Training)
Zhibo Man (满志博) | Cunli Mao (毛存礼) | Zhengtao Yu (余正涛) | Xunyu Li (李训宇) | Shengxiang Gao (高盛祥) | Junguo Zhu (朱俊国)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

多语言神经机器翻译是解决低资源神经机器翻译的有效方法,现有方法通常依靠共享词表的方式解决英语、法语以及德语相似语言之间的多语言翻译问题。缅甸语属于一种典型的低资源语言,汉语、英语以及缅甸语之间的语言结构差异性较大,为了缓解由于差异性引起的共享词表大小受限制的问题,提出一种基于多语言联合训练的汉英缅神经机器翻译方法。在Transformer框架下将丰富的汉英平行语料与汉缅、英缅的语料进行联合训练,模型训练过程中分别在编码端和解码端将汉英缅映射在同一语义空间降低汉英缅语言结构差异性对共享词表的影响,通过共享汉英语料训练参数来弥补汉缅数据缺失的问题。实验表明在一对多、多对多的翻译场景下,提出方法相比基线模型的汉-英、英-缅以及汉-缅的BLEU值有明显的提升。

pdf bib
基于跨语言双语预训练及Bi-LSTM的汉-越平行句对抽取方法(Chinese-Vietnamese Parallel Sentence Pair Extraction Method Based on Cross-lingual Bilingual Pre-training and Bi-LSTM)
Chang Liu (刘畅) | Shengxiang Gao (高盛祥) | Zhengtao Yu (余正涛) | Yuxin Huang (黄于欣) | Congcong You (尤丛丛)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

汉越平行句对抽取是缓解汉越平行语料库数据稀缺的重要方法。平行句对抽取可转换为同一语义空间下的句子相似性分类任务,其核心在于双语语义空间对齐。传统语义空间对齐方法依赖于大规模的双语平行语料,越南语作为低资源语言获取大规模平行语料相对困难。针对这个问题本文提出一种利用种子词典进行跨语言双语预训练及Bi-LSTM(Bi-directional Long Short-Term Memory)的汉-越平行句对抽取方法。预训练中仅需要大量的汉越单语和一个汉越种子词典,通过利用汉越种子词典将汉越双语映射到公共语义空间进行词对齐。再利用Bi-LSTM和CNN(Convolutional Neural Networks)分别提取句子的全局特征和局部特征从而最大化表示汉-越句对之间的语义相关性。实验结果表明,本文模型在F1得分上提升7.1%,优于基线模型。

pdf bib
基于拼音约束联合学习的汉语语音识别(Chinese Speech Recognition Based on Pinyin Constraint Joint Learning)
Renfeng Liang (梁仁凤) | Zhengtao Yu (余正涛) | Shengxiang Gao (高盛祥) | Yuxin Huang (黄于欣) | Junjun Guo (郭军军) | Shuli Xu (许树理)
Proceedings of the 19th Chinese National Conference on Computational Linguistics

当前的语音识别模型在英语、法语等表音文字中已经取得很好的效果。然而,汉语是 一种典型的表意文字,汉字与语音没有直接的对应关系,但拼音作为汉字读音的标注 符号,与汉字存在相互转换的内在联系。因此,在汉语语音识别中利用拼音作为解码 约束,引入一种更接近语音的归纳偏置。基于多任务学习框架,提出一种基于拼音约 束联合学习的汉语语音识别方法,以端到端的汉字语音识别为主任务,以拼音语音识 别为辅助任务,通过共享编码器,同时利用汉字与拼音识别结果作为监督信号,增强 编码器对汉语语音的表达能力。实验结果表明,相比基线模型,提出方法取得更优的 识别效果,词错误率WER降低了2.24个百分点

2012

pdf bib
Chinese Name Disambiguation Based on Adaptive Clustering with the Attribute Features
Wei Tian | Xiao Pan | Zhengtao Yu | Yantuan Xian | Xiuzhen Yang
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing