基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language)

Zijian Feng (冯子健), Linqin Wang (王琳钦), Shengxaing Gao (高盛祥), Zhengtao Yu (余正涛), Ling Dong (董凌)


Abstract
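Speech synthesis for Lao is of great significance to cooperation and exchange between China and Laos. However, Lao pronunciation is complex, with tonal, syllabic, and phonemic characteristics, and existing speech synthesis methods perform unsatisfactorily on Lao. Autoregressive models built on attention mechanisms struggle to fit complex Lao speech and generalize poorly. They are prone to catastrophic errors such as omitted and skipped words, and the synthesized audio lacks naturalness and fluency. This paper proposes a non-autoregressive speech synthesis method for Lao enhanced with discretized self-supervised representations. Drawing on the linguistic and phonetic characteristics of Lao, we build a non-autoregressive acoustic model from annotated phoneme-level duration information. A self-supervised pre-trained speech model extracts discretized representations of speech content and tone information, which are incorporated into the acoustic model to strengthen its speech generation ability and improve the fluency and naturalness of the synthesized audio. Experiments show that audio synthesized by our method achieves a MOS of 4.03, and that non-autoregressive modeling enhanced with discretized self-supervised representations better captures the characteristics of Lao speech at fine-grained levels such as tone, phoneme duration, and pitch.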
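The abstract names two mechanisms without detailing them: expanding phoneme-level encodings to frame level using annotated durations, and turning continuous self-supervised speech features into discrete units. The sketch below illustrates both under stated assumptions. It assumes a FastSpeech-style length regulator (repeat each phoneme state by its duration in frames) and nearest-centroid quantization against a k-means-style codebook, as is common for self-supervised speech units. The function names, toy vectors, and codebook are hypothetical and not taken from the paper, and the abstract does not say exactly how the units are fused into the acoustic model.

```python
from typing import List, Sequence

Vector = List[float]


def length_regulate(phoneme_states: Sequence[Vector],
                    durations: Sequence[int]) -> List[Vector]:
    """Expand phoneme-level states to frame level (hypothetical helper).

    Assumes a FastSpeech-style length regulator: state i is repeated
    durations[i] times, so the output has sum(durations) frames.
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("exactly one duration per phoneme is required")
    frames: List[Vector] = []
    for state, d in zip(phoneme_states, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so later in-place edits to one frame do not leak.
        frames.extend(list(state) for _ in range(d))
    return frames


def discretize(features: Sequence[Vector],
               centroids: Sequence[Vector]) -> List[int]:
    """Map each frame feature to the index of its nearest centroid.

    Assumes a codebook learned offline (e.g. by k-means) on features
    from a self-supervised speech model; distance is squared Euclidean.
    """
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(range(len(centroids)), key=lambda k: sq_dist(f, centroids[k]))
            for f in features]


# Toy example: two phonemes lasting 2 and 3 frames (illustrative values).
phonemes = [[0.1, 0.2], [0.5, 0.9]]
durations = [2, 3]
frames = length_regulate(phonemes, durations)

# Two-entry codebook and five frames of made-up "SSL" features.
centroids = [[0.0, 0.0], [1.0, 1.0]]
ssl_features = [[0.1, 0.1], [0.2, 0.0], [0.9, 0.8], [1.0, 1.1], [0.6, 0.7]]
units = discretize(ssl_features, centroids)
print(len(frames), units)
```

Because both sequences are frame-aligned, each discrete unit can be paired with the expanded phoneme state at the same frame, for instance as an auxiliary input or prediction target. Which of these the paper uses is not stated in the abstract.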
Anthology ID:
2023.ccl-1.8
Volume:
Proceedings of the 22nd Chinese National Conference on Computational Linguistics
Month:
August
Year:
2023
Address:
Harbin, China
Editors:
Maosong Sun, Bing Qin, Xipeng Qiu, Jing Jiang, Xianpei Han
Venue:
CCL
Publisher:
Chinese Information Processing Society of China
Pages:
90–101
Language:
Chinese
URL:
https://aclanthology.org/2023.ccl-1.8
Cite (ACL):
Zijian Feng, Linqin Wang, Shengxaing Gao, Zhengtao Yu, and Ling Dong. 2023. 基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language). In Proceedings of the 22nd Chinese National Conference on Computational Linguistics, pages 90–101, Harbin, China. Chinese Information Processing Society of China.
Cite (Informal):
基于离散化自监督表征增强的老挝语非自回归语音合成方法(A Discretized Self-Supervised Representation Enhancement based Non-Autoregressive Speech Synthesis Method for Lao Language) (Feng et al., CCL 2023)
PDF:
https://aclanthology.org/2023.ccl-1.8.pdf