Hongping Shu


2023

中医临床切诊信息抽取与词法分析语料构建及联合建模方法(On Corpus Construction and Joint Modeling for Clinical Pulse Feeling and Palpation Information Extraction and Lexical Analysis of Traditional Chinese Medicine)
Yaqiang Wang (王亚强) | Wen Jiang (蒋文) | Yongguang Jiang (蒋永光) | Hongping Shu (舒红平)
Proceedings of the 22nd Chinese National Conference on Computational Linguistics

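“Pulse feeling and palpation is one of the four diagnostic methods of Traditional Chinese Medicine (TCM) clinical practice and among the most distinctive to TCM. It provides an important basis for clinical syndrome differentiation and treatment, so research on extracting and lexically analyzing pulse feeling and palpation information has significant clinical value. This paper is the first to study corpus construction and joint modeling for clinical pulse feeling and palpation information extraction and lexical analysis in TCM. Taking more than ten thousand TCM clinical records as the object of study, it proposes a corpus construction framework and defines separate annotation guidelines for pulse feeling and palpation information extraction, Chinese word segmentation, and part-of-speech tagging, producing a corpus that can support multi-task joint modeling. The final annotation agreement of the corpus is above 0.94. Based on a model in which same-level tasks share encoder parameters, the paper explores joint modeling of pulse feeling and palpation information extraction and lexical analysis, and verifies the effectiveness of the approach.”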

2022

一种非结构化数据表征增强的术后风险预测模型(An Unstructured Data Representation Enhanced Model for Postoperative Risk Prediction)
Yaqiang Wang (王亚强) | Xiao Yang (杨潇) | Xuechao Hao (郝学超) | Hongping Shu (舒红平) | Guo Chen (陈果) | Tao Zhu (朱涛)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

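“Accurate postoperative risk prediction benefits clinical resource planning and contingency preparation, and helps reduce patients' postoperative risk and mortality. Postoperative risk prediction currently relies mainly on structured preoperative and intraoperative data such as basic patient information, laboratory tests, and vital signs, while the value of unstructured preoperative diagnoses, which carry rich semantic information, remains to be verified. To address this, the paper proposes a postoperative risk prediction model enhanced with unstructured data representations, which uses a self-attention mechanism to fuse structured data and preoperative diagnosis data through information weighting. On clinical data, the method is compared with statistical machine learning models commonly used for postoperative risk prediction and with recent deep neural networks. It not only improves postoperative risk prediction performance but also gives the prediction model good interpretability.”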

基于批数据过采样的中医临床记录四诊描述抽取方法(Four Diagnostic Description Extraction in Clinical Records of Traditional Chinese Medicine with Batch Data Oversampling)
Yaqiang Wang (王亚强) | Kailun Li (李凯伦) | Yongguang Jiang (蒋永光) | Hongping Shu (舒红平)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

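“Extracting four diagnostic descriptions from Traditional Chinese Medicine (TCM) clinical records has important practical value for improving the quality and efficiency of TCM clinical syndrome differentiation and treatment. The extraction task remains underexplored, however, and class imbalance is one of its key challenges. This paper studies the task: it builds a corpus for four diagnostic description extraction from TCM clinical records; fine-tunes a general-purpose pretrained language model on unlabeled TCM clinical records for domain adaptation; and trains the extraction model on a small amount of labeled data using a batch data oversampling algorithm. Experimental results show that the overall performance of the proposed method is better than that of the comparison methods. Relative to the best result among the comparison methods, the proposed method improves the F1 score on rare classes by 2.13% on average.”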

2018

On Learning Better Embeddings from Chinese Clinical Records: Study on Combining In-Domain and Out-Domain Data
Yaqiang Wang | Yunhui Chen | Hongping Shu | Yongguang Jiang
Proceedings of the BioNLP 2018 workshop

High-quality word embeddings are of great significance for advancing applications of biomedical natural language processing. In recent years, a surge of interest in how to learn good embeddings and evaluate embedding quality based on English medical text has become increasingly evident; however, only a limited number of studies have been performed on Chinese medical text, particularly Chinese clinical records. Herein, we propose a novel approach to improving the quality of learned embeddings by using out-domain data as a supplement when Chinese clinical records are limited. Moreover, embedding quality is evaluated based on the Medical Conceptual Similarity Property. The experimental results reveal that selecting good training samples is necessary, and that collecting the right amount of out-domain data and trading off embedding quality against training time are essential factors for better embeddings.