基于语音文本跨模态表征对齐的端到端语音翻译(End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text)

Guojiang Zhou (周国江), Ling Dong (董凌), Zhengtao Yu (余正涛), Shengxiang Gao (高盛祥), Wenjun Wang (王文君), Houli Ma (马候丽)


Abstract
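End-to-end speech translation must learn a mapping from source-language speech to target-language text that is both cross-lingual and cross-modal. With limited labeled data, establishing a unified mapping between speech and text representations and reducing the cross-modal gap is key to improving speech translation performance. This paper proposes a cross-modal representation alignment method for speech and text: speech and text representations are aligned at multiple granularities and mixed to serve as parallel input, and multi-task fusion training is performed under a consistency constraint on the multimodal representations. Experiments on the MuST-C dataset show that the proposed method outperforms existing cross-modal representation methods for end-to-end speech translation, effectively improving the model's cross-modal mapping ability and translation performance.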
Anthology ID:
2023.ccl-1.7
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
78–89
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-1.7
Cite (ACL):
Guojiang Zhou, Ling Dong, Zhengtao Yu, Shengxiang Gao, Wenjun Wang, and Houli Ma. 2023. 基于语音文本跨模态表征对齐的端到端语音翻译 (End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 78–89, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
基于语音文本跨模态表征对齐的端到端语音翻译 (End-to-end Speech Translation Based on Cross-modal Representation Alignment of Speech and Text) (Zhou et al., CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-1.7.pdf