Construction of a Multi-Attribute Domain Frontier Hotspot Knowledge Map

Master's Thesis, Southwest University of Science and Technology

Abstract

With the explosive growth of information, the network now holds a large amount of unstructured and heterogeneous data. These data are diverse in type, large in volume, rich in content, highly dynamic and highly disordered. This makes it very difficult to give users accurate and rapid access to the hot topics, development history, frontier areas and overall knowledge structure of a discipline, and it easily leads to the problem of "information loss".

At present, most work on building hotspot knowledge maps for a discipline uses bibliometric methods to study the development of scientific knowledge maps. Such work mainly describes the current state of the literature, highly productive authors, major institutions and regions, analyzes the knowledge base of the field, and explores its research hotspots and frontier issues. However, the development trajectory and origin attribute relations of hotspot knowledge are of great significance for researchers who want to learn and understand that knowledge, sort out its development trend, trace the source of hotspots in the field, and find the key historical figures in each subfield. In the literature surveyed for this thesis, no prior study addressing this research direction was found. Extracting the origin attribute relations of domain hotspot knowledge therefore has considerable research significance and practical value. This thesis carries out the following work on constructing a hotspot knowledge map for multi-attribute domains:

First, considering how widely literature is now disseminated through social networks, a novel method for mining hot topics is proposed. Literature attributes are divided into traditional attributes and social attributes; we construct a literature evaluation model for the social network environment, calculate the attention each publication receives, and mine academic papers that combine social communication influence with frontier knowledge in the subject field;

Second, to better understand the evolutionary trends and research priorities of the discipline, and to sort out the frontier hot topics in the field, we propose a method for extracting the origin attribute relations of hot topics in the field;

Third, this thesis proposes a definition of the origin relation of domain hotspot knowledge. Based on this definition, it establishes a reasoning model for the origin of domain hotspot knowledge, constructs an accurate syntactic analysis mechanism for the ways in which the origin of hotspot knowledge is expressed in different domains and with different semantics, and uses the origin feature words of hotspot knowledge concepts to design the origin attribute relation model;

Fourth, building on the definition and the reasoning model, this thesis proposes a nearest syntactic-dependency verb extraction method. Different methods are used to mine the origin of domain concept knowledge for the different origin relation models. Experiments show that this method outperforms comparable relation extraction models and can effectively find the origin attribute relations of hot domain knowledge.

Finally, we design and draw a top-level frontier hotspot knowledge map for the field of "artificial intelligence". The map can effectively uncover the evolutionary system of the body of knowledge within a discipline, identify important and difficult knowledge points, and clearly show the development trajectories and trends of frontier hotspot knowledge and frontier knowledge concepts in the field, as well as the migration of important historical figures across sub-fields. It thereby reveals the dynamic laws of knowledge development in the field and provides a practical and valuable reference for subject research.

Keywords: frontier hotspots; text classification; origin attribute relations; knowledge map; development trajectory; figure migration

Type of dissertation: Theoretical Research


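Contents

1 Introduction

1.1 Research Background

1.2 Research Significance

1.3 Research Status at Home and Abroad

1.3.1 Research Status of Discipline Frontier Hotspot Mining

1.3.2 Research Status of Domain Frontier Hotspot Knowledge Maps

1.3.3 Analysis and Summary

1.4 About This Thesis

1.4.1 Research Objectives

1.4.2 Research Content

1.4.3 Research Methods

1.4.4 Key Points and Difficulties

1.4.5 Innovations

1.5 Organization of the Thesis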

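2 Theories and Technologies Related to Domain Frontier Hotspot Map Construction

2.1 Discipline Frontier Hotspot Mining

2.1.1 Bibliometrics and Content Analysis

2.1.2 Topic Models

2.2 Research on Domain Frontier Hotspot Knowledge Maps

2.2.1 Named Entity Recognition

2.2.2 Relation Extraction

2.3 Chapter Summary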

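3 Domain Frontier Hotspot Mining Based on Social Attention

3.1 Overview

3.2 Research Framework

3.3 Construction of the Literature Popularity Model

3.3.1 Data Acquisition

3.3.2 Correlation Analysis

3.3.3 Topic Mining for Popularity Evaluation Indicators

3.3.4 Construction of the Literature Popularity Evaluation Model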


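3.4 Experiments

3.4.1 Literature Popularity Evaluation Experiment

3.4.2 Analysis of Topic Model Results

3.4.3 Comparison of Hotspots Mined from Traditional and Social Network Media

3.4.4 Comparison of Hotspots Mined by Topic Models and Co-word Analysis

3.5 Chapter Summary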

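4 Extraction of Origin Attribute Relations of Domain Frontier Hotspot Knowledge

4.1 Overview

4.2 Extraction of Origin Attribute Relations of Domain Frontier Hotspot Knowledge

4.2.1 Definition of the Origin of Domain Frontier Hotspot Knowledge

4.2.2 Origin Attribute Relation Extraction Framework

4.2.3 Origin Attribute Relation Models of Domain Frontier Hotspot Knowledge

4.2.4 Nearest Syntactic-Dependency Verb Extraction Method

4.2.5 Probabilistic Soft Logic Model Mining Method

4.3 Experiments

4.3.1 Experimental Setup and Evaluation Criteria

4.3.2 Evaluation of Origin Attribute Relation Model Classification

4.3.3 Feature Word Selection for the Probabilistic Soft Logic Model

4.3.4 Origin Attribute Relation Extraction Results

4.4 Chapter Summary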

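5 Construction of the Multi-Attribute Domain Frontier Hotspot Knowledge Map

5.1 Overview

5.2 Presentation of the Domain Frontier Hotspot Knowledge Map

5.3 Chapter Summary

Conclusion

Acknowledgements

References

Academic Papers and Research Results Published During the Master's Program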

1 Introduction

1.1 Research Background

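With the rapid growth in the quantity and variety of information resources, knowledge data such as scientific literature and Wikipedia data are expanding quickly. These data are large in volume, rich in content, complex and disordered, which makes it very hard for researchers to grasp the frontier hotspots of a discipline and their evolution in an orderly and accurate way, and easily leads to "information loss" [1][2]. How to quickly and accurately grasp the development history of a discipline, mine its hot research topics and understand its development trends, so as to help researchers find innovative breakthroughs in a large body of references and uncover potential research space, has become an important new problem facing discipline construction. Against this background, domain frontier hotspot knowledge maps emerged, with the goal of improving users' ability to discover, obtain and use information resources accurately and effectively. Since then, researchers have explored various methods to better understand the development trajectory of frontier hotspot knowledge in a discipline, grasp its frontier hotspots, trace the origin of those hotspots and discover emerging research fields. This work has in turn further advanced research on the theory and application of knowledge maps.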

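Scientific literature is an important part of information resources and one of the forms in which researchers present the results of their scientific work. The rapid growth of scientific literature brings both opportunities and challenges. The traditional indicators used to assess the academic influence of a paper, such as the impact factor and citation indices, do not capture the influence of scientific literature in social networks. As a result, information resources are not integrated and presented in a timely manner, efficient knowledge-oriented and discipline-oriented services are hard to provide, and users are to a large extent prevented from discovering and using scientific literature resources comprehensively. In addition, traditional assessment of academic influence takes electronic journals, master's and doctoral theses, funded projects or citation information as data sources, mines frontier hotspots with methods such as word frequency analysis [3][4], co-word analysis [5][6], multidimensional scaling [7] and social network analysis [8], and analyzes the hotspots with existing or self-developed knowledge map visualization tools [9]. These methods mine discipline hotspots mainly by analyzing the academic dissemination popularity of the literature [10-12]. They consider only the influence of frontier hotspot knowledge on professional academic platforms and ignore the popularity and spread of scientific literature as an object of dissemination in social networks, so the mined hotspots lag noticeably and are weakly forward-looking. Compared with professional academic platforms, information in a social network environment usually spreads faster and more widely and reflects in closer to real time how hot or cold a disseminated object is, which helps keep discipline hotspots at the frontier. Analyzing the dissemination influence of scientific literature in social networks is therefore of significant practical value for mining discipline frontier hotspots.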
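
To make the traditional methods named above more concrete, the following is a minimal Python sketch of word frequency analysis and co-word analysis over paper keywords. It is an illustration only and not the procedure used in this thesis: the function names `keyword_frequencies` and `coword_counts` and the toy keyword lists are assumptions introduced here, and a real study would take its keywords from a bibliographic database.

```python
from collections import Counter
from itertools import combinations


def keyword_frequencies(papers):
    """Count the number of papers in which each keyword appears."""
    counts = Counter()
    for keywords in papers:
        # set() ensures a keyword repeated within one paper is counted once
        counts.update(set(keywords))
    return counts


def coword_counts(papers):
    """Count how often each unordered keyword pair appears in the same paper."""
    pairs = Counter()
    for keywords in papers:
        # sorted() gives every pair a canonical (a, b) order with a < b
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical toy corpus: one keyword list per paper (illustrative data only)
papers = [
    ["knowledge graph", "relation extraction", "deep learning"],
    ["knowledge graph", "relation extraction"],
    ["topic model", "deep learning"],
]

freq = keyword_frequencies(papers)
cowords = coword_counts(papers)

# Most frequent co-occurring pair in the toy corpus
top_pair, top_count = cowords.most_common(1)[0]
print(top_pair, top_count)
```

In this toy corpus the pair that co-occurs most often is taken as the candidate hotspot. Because the counts come only from the papers themselves, the sketch also shows the limitation discussed above: attention that a paper receives in social networks is not reflected at all.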
