Construction and Application of Medical Knowledge Graphs

Abstract

In recent years, Internet technologies have developed rapidly, and people's lifestyles have changed along with them. Medical and health issues have always been among people's greatest concerns. On the one hand, the number of medical and health websites keeps growing, and medical information is becoming ever more abundant. On the other hand, with the development of electronic devices, many hospitals have moved from traditional paper medical records to electronic medical records stored in computer systems. The resulting information is voluminous and complex, and it is difficult for people to find what they really need in it. Knowledge graphs provide an excellent solution for managing knowledge. Knowledge in the medical field is specialized, complicated, and easily confused. If information in the medical field can be organized in the form of a knowledge graph, it will greatly help the further application of medical knowledge in people's lives.

Constructing knowledge graphs for the medical field is a problem in urgent need of a solution. However, the specialized nature of medical knowledge makes construction difficult. First, little labeled data containing medical expertise exists, so not enough data is available for direct use; unlabeled data contains more medical expertise but is not effectively utilized. Second, some terms carry a different meaning in the medical field than in the general domain, so methods for constructing general-domain knowledge graphs cannot be applied directly to the medical field. In addition, most knowledge graph research focuses on the relationship between one entity and another. In medical treatment, however, attributes and their values play an important role in the analysis of medical cases, so it is better if these triples are well organized in the knowledge graph. This paper discusses a method for extracting attributes and attribute values in the medical domain from unlabeled data, and addresses the difficulty of processing complicated medical data automatically.

To overcome the difficulties of applying natural language processing in the medical domain, this paper explores a method of constructing knowledge graphs in the medical field, and studies and implements a semi-supervised method for mining knowledge from medical information. Using the Bootstrapping algorithm and a conditional random field (CRF) model, we constructed a dictionary of medical entities and then designed a method to mine the data needed for constructing knowledge graphs; tried to improve information extraction performance by training medical-domain word vectors; studied a method of combining data from different knowledge graphs; and tried to apply the knowledge graphs to practical problems.

Keywords: medical domain, knowledge graph, attribute extraction, word vector, combination of knowledge graphs

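Contents

Abstract (Chinese)

Abstract (English)

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Status in China and Abroad

1.2.1 Research Progress on Knowledge Graphs

1.2.2 Knowledge Graph Products Already in Use

1.2.3 Related Research in Natural Language Processing

1.2.4 Analysis of the Current State of Research

1.3 Research Content and Chapter Organization

Chapter 2 Methods for Constructing Medical Knowledge Graphs

2.1 Introduction

2.2 Knowledge Graph Construction Methods

2.2.1 Knowledge Graphs

2.2.2 The Knowledge Graph Construction Process

2.3 Domain-Specific Characteristics of Medical Knowledge Graphs

2.4 Storage Strategies for Medical Knowledge Graphs

2.4.1 Knowledge Graphs and the Resource Description Framework

2.4.2 Knowledge Graphs and Databases

2.4.3 Storage Strategies for Medical Knowledge Graphs

2.5 Chapter Summary

Chapter 3 Metadata Extraction Methods for Medical Knowledge Graphs

3.1 Introduction

3.2 Extracting Knowledge Metadata from Unstructured Medical Data

3.2.1 Extraction Process for Knowledge Metadata from Unstructured Medical Data

3.2.2 Expanding the Vocabulary with the Bootstrapping Algorithm

3.2.3 Named Entity Recognition Based on the Conditional Random Field Model

3.2.4 Engineering Details of Medical Knowledge Graph Construction

3.2.5 Attribute Recognition Based on Feature Engineering

3.3 Metadata Extraction Experiments and Analysis of Results

3.3.1 Experimental Data

3.3.2 Building and Expanding the Medical Entity Vocabulary

3.3.3 Metadata Extraction Results for the Medical Knowledge Graph

3.4 Chapter Summary

Chapter 4 Methods for Extending and Merging Knowledge Graphs

4.1 Introduction

4.2 Word Vector Models

4.2.1 Vector Representations of Words

4.2.2 Training Word Vectors

4.2.3 The word2vector Model

4.3 Extended Metadata Extraction Methods for Medical Knowledge Graphs

4.3.1 Extended Extraction Based on Feature Engineering

4.3.2 Extended Extraction Based on a Synonym Thesaurus

4.3.3 Extended Extraction Based on Medical-Domain Word Vectors

4.4 Methods for Merging Knowledge Graph Data

4.5 Experiments and Analysis of Results

4.6 Chapter Summary

Chapter 5 Preliminary Applications of Medical Knowledge Graphs

5.1 Introduction

5.2 Search and Query Methods for Medical Knowledge Graphs

5.2.1 Overview of Knowledge Graph Search Tasks

5.2.2 The SPARQL Language

5.2.3 A SPARQL-Based Search Tool

5.3 Knowledge Graph Construction Tools

5.4 Chapter Summary

Conclusion

References

Papers Published During the Master's Program

Harbin Institute of Technology Declaration of Thesis Originality and Usage Rights

Acknowledgements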
