基于Fast-Newman算法的网络社团结构分解

合集下载

G-N算法解读

G-N算法解读
10
11
移除具有最高边界数的边
12
四、GN算法存在的问题
• 从上面的算法流程中我们可以看到:在不知道社区数 目的情况下,G-N 算法也不知道这种分解要进行到哪 一步为止。 • 为解决这个问题,Newman等人引进了一个衡量网络 划分质量的标准——模块度。模块度函数Q:
Q (eii ai )
3
二、G-N 算法的思想
• 流程如下:
1、计算网络中所有边的边介数。 2、找到边介数最高的边并将它从网络中移除。 3、重复步骤1,2,直到每个节点就是一个退化的社 区为止。
4
三、边介数定义和计算
• 最短路径边介数方法是一种最简单的边介数度量方法, 一条边的边介数(betweenness)是指从某个源节点 S 出发通过该边的最短路径的数目,对所有可能的源节 点,重复做同样的计算,并将得到的相对于各个不同 的源节点的边介数(betweenness)相加,所得的累加 和为该边相对于所有源节点的边介数。
22
• 步骤2:依次合并有边相连的社区对,并计算合并后的 模块度增量∆Q=eij+eji-2aij=2(eij-aiaj)。根据贪婪 算法的原理,每次合并应该沿着Q增大最多或者减小 最少的方向进行。每次合并以后,对相应的元素eij更 新,并将与i、j社区相关的行和列相加。 • 步骤3重复执行步骤2,不断合并社区,直到整个网络 都合并成为一个社区。最多要执行n-1次合并。整个算 法完成后可以得到一个社区结构分解的树状图。再通 过选择在不同位置断开可以得到不同的网络社区结构。 在这些社区结构中,选择一个对应着局部最大Q值的, 就得到最好的网络社区结构。
29
9
(0,1) 25/6 (1,1)
(1+1/3+1)*1

基于节点相似性度量的社团结构划分方法

基于节点相似性度量的社团结构划分方法

基于节点相似性度量的社团结构划分方法梁宗文;杨帆;李建平【摘要】针对复杂网络结构划分过程复杂、准确性差的问题,定义了节点全局和局部相似性衡量指标,并构建节点的相似性矩阵,提出一种基于节点相似性度量的社团结构划分算法.其基本思路是将节点(或社团)按相似性合并条件划分到同一个社团中,如果合并后的节点(或社团)仍然满足相似性合并条件,则继续合并,直到所有节点都得到准确的社团划分.实验结果表明,所提算法能成功正确地划分出真实网络中的社团结构,性能比标签传播算法(LPA)、GN (Girvan-Newman)、CNM (Clauset-Newman-Moore)等算法优秀,能有效提高结果的准确性和鲁棒性.【期刊名称】《计算机应用》【年(卷),期】2015(035)005【总页数】6页(P1213-1217,1223)【关键词】节点相似性;社团划分;社团结构;复杂网络【作者】梁宗文;杨帆;李建平【作者单位】电子科技大学计算机科学与工程学院,成都611731;Department of Electrical and Computer Engineering, Rutgers, The State University of New Jersey, Piscataway 08854, USA;电子科技大学计算机科学与工程学院,成都611731;电子科技大学计算机科学与工程学院,成都611731【正文语种】中文【中图分类】TP393;N945.1社团划分是研究复杂网络结构和功能属性最基本的一项工作。

社团可以看作是在网络中拥有相似属性或者扮演相似角色的节点集合,通常这些社团内部的节点之间连接紧密, 而社团之间的节点则连接稀疏。

学术界提出许多进行社团划分的算法,但大多数算法都是基于模块度指标来划分的,这种划分方法不但计算复杂而且还可能出现误差,所以出现了大量的改进方法,如GN(Girvan-Newman)算法[1]、谱方法[2]、标签传播算法(Label Propagation Algorithm, LPA)[3]、Kernighan-Lin 算法[4]等。

一种基于改进的Newman快速算法的文本聚类方法

一种基于改进的Newman快速算法的文本聚类方法
度, 实例验 证 了此方 法 的可行 性 。
根据 贪婪算 法 的原理 , 每次 合 并应 沿 着使 Q增
大最 多或者减 少最 小 的方 向进行 。该 步 的算 法 复杂
21 0 0年 8月 4 日收 到
度为 o m) ( 。每次 合 并 以后 , 应 的元 素 e更 新 , 对 q 并 将与 √社 团相关 的行和列相加 。该步 的算法 复杂度 为 0 n 。因此 , 步的算法复 杂度为 0 m+ ) () 第二 ( n。

2 1 SiT e. nn. 0 0 e eh E gg .

种基 于 改 进 的 N w n快 速 算 法 e ma 的文 本 聚类 方 法
安 娜 赵 继 广 刘 绍 海
( 装备指挥技术学院, 北京 1 11 ; 0 4 6 武警沈阳指挥学院 沈 阳 10 1 ) , 113
的算 法复 杂度 还是 比较 大 , 因此仅 仅 局 限于研 究 中
种凝 聚算 法 。算 法 如下 :
① 初 始化 网络 为 个社 团 , 即每 个节 点 就是一
个 独立 社 团 。初 始 的 e和 a 满 足 0 其 他 ,
等规 模 的 复杂 网 络 。文 本 聚类 中 的 网络 通 常 都 包 含几 百万个 以上 的节 点 , 在这 种 情 况 下 , 统 的 G 传 N
1 3 1 文本 向量 的 空 间模 型 ( S . . V M)

向量空间模型是由 Sln等人_ 在 2 ao t 4 0世纪 6 0
年代 提 出来 的 , 在 著 名 的 S r系 统 中实 现 。在 并 mat
向量空 间模 型 中 , 一 篇 文 档被 表 示 为规 范 化 正 交 每 特征 词矢 量 所 组 成 的空 间 中 的 一 个 点 。一 般 采 用 I F Iv r ou e t rq e c ) D (n es D c m n Fe u ny 来表 示 V M, : e S 即

用于社团发现的Girvan-Newman改进算法

用于社团发现的Girvan-Newman改进算法

用于社团发现的Girvan-Newman改进算法
朱小虎;宋文军;王崇骏;谢俊元
【期刊名称】《计算机科学与探索》
【年(卷),期】2010(004)012
【摘要】为了克服Girvan-Newman算法运行效率的不足,提出了一个基于modularity极值近似的社团发现算法MEA.该算法采用modularity增量作为社团结构的度量,使用贪心策略获得最优社团分划的近似解.通过理论分析,并在实际的数据集上进行实验验证,结果表明MEA算法是快速、有效的.
【总页数】8页(P1101-1108)
【作者】朱小虎;宋文军;王崇骏;谢俊元
【作者单位】南京大学,计算机软件新技术国家重点实验室,南京,210093;南京大学,计算机科学与技术系,南京,210093;南京大学,计算机软件新技术国家重点实验室,南京,210093;南京大学,计算机科学与技术系,南京,210093;南京大学,计算机软件新技术国家重点实验室,南京,210093;南京大学,计算机科学与技术系,南京,210093;南京大学,计算机软件新技术国家重点实验室,南京,210093;南京大学,计算机科学与技术系,南京,210093
【正文语种】中文
【中图分类】TP311
【相关文献】
1.利用改进遗传算法进行复杂网络社团发现 [J], 邓琨;张健沛;杨静
2.基于聚类算法的社团发现算法研究 [J], 王唐云
3.一种改进的基于社团发现的贝叶斯众包模型 [J], 吴昌钱;洪欣
4.无线城市社团发现的研究——在Spark上利用改进关联规则实现社团发现的算法 [J], 王永贵;徐山珊;肖成龙
5.复杂网络中的社团发现算法综述 [J], 李辉;陈福才;张建朋;吴铮;李邵梅;黄瑞阳因版权原因,仅展示原文概要,查看原文内容请购买。

基于中心节点和局部优化的复杂网络社团划分方法

基于中心节点和局部优化的复杂网络社团划分方法

基于中心节点和局部优化的复杂网络社团划分方法王建玺;黄淼【摘要】复杂网络的社团划分是复杂网络研究的重要分支,常用的社团挖掘算法主要分为全局社团挖掘算法和局部社团挖掘算法.针对全局社团挖掘算法难以适应规模和动态性愈来愈大的复杂网络的问题,提出一种基于中心节点和局部优化的复杂网络社团划分方法.首先综合判断节点的度中心性、介数中心性、接近中心性和相似度来确定社团的中心节点集,然后根据局部模块优度将节点划归至合适社团,再对孤立节点和重叠社团节点进行处理,最终实现复杂网络的社团划分.通过实验并与典型局部社团挖掘算法进行对比,所提方法可以实现社团的有效和准确划分.【期刊名称】《微处理机》【年(卷),期】2018(039)004【总页数】5页(P35-39)【关键词】中心节点;局部优化;社团划分;复杂网络;多属性判别【作者】王建玺;黄淼【作者单位】平顶山学院计算机学院,平顶山467000;平顶山学院计算机学院,平顶山467000【正文语种】中文【中图分类】TP3121 引言自然界与现实世界中的许多系统都可以用由节点和边组成的网络抽象地来表示,如社会关系网、流行病传播网、蛋白质交互网、Internet网络等,这些网络因其具有高复杂性被称为“复杂网络”。

复杂网络通常具有小世界效应和无标度特性这两个统计特性。

其中,“小世界效应”是指复杂网络具有短路径长度和高聚类系数的特点[1];“无标度特性”则是指复杂网络中的结点的度服从幂率分布特征[2]。

“社团结构”是复杂网络的一种重要结构特性,指的是社团中的点相互连接紧密,而这些点同社团外的点连接较为松散[3]。

社团的发现和划分对于分析网络的组织结构、功能划分和成员关系等有着重要的理论意义和实际价值。

近年来,随着社团结构研究不断深入,社团挖掘算法被主要分为全局社团挖掘算法和局部社团挖掘算法。

全局社团挖掘算法基于全网信息进行分析,如谱聚类算法[4]、GN 算法[5]、快速 Newman 算法[6],但目前复杂网络的规模愈来愈大、动态性愈来愈强,导致算法的普适性、复杂度和效率等有待改进。

复杂网络中社区发现算法的研究

复杂网络中社区发现算法的研究

复杂网络中社区发现算法的研究随着互联网的发展,人们对于网络的依赖越来越高,网络已经成为人们生活中不可或缺的一部分。

大量的信息被不断地产生和分享,网络中的各种社交网络也得以形成。

但是,随着网络的扩张,网络结构的复杂性也不断增加,这给社区发现带来了不小的挑战。

社区发现,即在网络中寻找具有相似特征的节点组成的集合,是研究网络结构的热门领域之一。

它广泛应用于社交网络、生物网络、互联网等各种领域。

社区发现的算法种类繁多,每种算法都有其独特的优点和不足。

目前,主流的社区发现算法主要分为以下几类:1. 基于连边的社区发现算法这种算法的基本思想是将网络中的边分成不同的子群,然后将同一子群中的节点分为同一社区。

以Girvan-Newman算法为例,其首先计算网络中所有边的介数(Betweenness)值,将介数值最大的边删除,再重新计算介数值,重复操作直到所有边都被删除,最终得到多个社区。

2. 基于聚类的社区发现算法这种算法将网络中的节点聚类成不同的组,要求同一组节点的相似度高于不同组节点的相似度。

常用方法有K-Means、DBSCAN、OPTICS等。

3. 基于模块度的社区发现算法这种算法的基本思想是通过计算网络中节点的聚集程度,将节点划分为不同的社区。

模块度算法是目前最为流行的基于模块度的社区发现算法。

虽然目前已有很多社区发现算法被广泛应用于各个领域,但是社区发现仍然存在很多挑战。

首先,网络结构的复杂性增加了社区发现算法的难度,使得一些算法只适用于特定类型的网络。

其次,现有的算法在处理较大规模的网络时计算效率较低。

最后,社区发现结果的可解释性似乎仍然不够理想。

为了解决这些问题,社区发现算法的研究需要深入探索以下方面:1. 改进社区发现算法一方面,需要对现有的算法进行改进,提高其适应各种类型网络的能力。

另一方面,还需要发展出更有效的算法,提高计算效率和社区可解释性。

2. 融合多种算法社区发现算法的精度往往与算法的类型有关。

一种基于节点重要度的社团划分算法

一种基于节点重要度的社团划分算法吴卫江;周静;李国和【摘要】This paper points out that through mining the society existed in complex networks, the topological structure and function of complex networks can be analyzed, and the hidden rules can be found either. In order to get the optimal community structure, node importance matrix and clustering matrix are defined, combined the spectral bisection method based on graph and modularity function, an community partition algorithm ( CDNIM ) based on node importance is proposed. This algorithm is applied in karate club, dolphin networks, and other classical data sets, the result of experiment shows that this algorithm can effectively improve the accuracy of discovering community structure.%指出了通过挖掘复杂网络中存在的社团结构,可以分析整个复杂网络的拓扑结构和功能,还可以发现网络中隐藏的规律。

为了得到最佳社团划分结构,定义了网络的节点重要度矩阵和聚类矩阵,结合图的特征谱平分法和模块度函数,提出了一种基于节点重要度的社团划分算法( CDNIM)。

复杂网络中的社团发现算法研究

复杂网络中的社团发现算法研究社群是指一个网络系统中相互有联系并有共同特征的节点集合。

在复杂网络中,社群发现算法是一种有助于理解和分析网络结构、挖掘隐藏关系的重要工具。

本文将探讨当前在复杂网络中的社群发现算法研究的最新进展和应用。

社群发现算法是通过识别节点之间的紧密关系和相似性,将网络分为若干相互连接紧密且内部联系紧密的社群。

这些社群可以代表特定的兴趣群体、组织结构或功能模块。

在真实世界的复杂网络中,如社交网络、生物网络、互联网等,社群发现对于发现隐含的社交圈、发现基因调控网络中的功能模块、发现互联网中的关键网页等具有重要意义。

最近,关于复杂网络中的社群发现算法的研究已经取得了重大进展。

不同的算法被开发出来,以应对不同类型的网络和不同的社群结构。

下面将介绍一些常见的社群发现算法。

1. 基于模块度的算法模块度是用来评估社群结构优劣的指标。

基于模块度的算法通过最大化网络内部联系的权重和最小化网络之间联系的权重,从而划分网络中的社群。

其中最著名的算法是Newman-Girvan算法,该算法通过逐步删除网络中的边缘连接来划分社群。

2. 谱聚类算法谱聚类算法是一种基于图论的聚类方法,通过将网络转化为图拉普拉斯矩阵,并应用特征值分解来划分社群。

谱聚类算法具有较强的鲁棒性和可扩展性,适用于大规模网络。

3. 层次聚类算法层次聚类是一种自底向上或自顶向下的聚类方法,通过合并或分割社群来构建层次关系。

层次聚类算法可以视网络为多个细分的子图,在每个层次上划分社群。

这些子图可以按照不同的社群结构进行划分,并且可以通过层次聚类的方法逐步合并。

除了以上列举的算法外,还有很多其他的社群发现算法,如基于密度的算法、基于标签传播的算法等。

这些算法各有特点,适用于不同类型的网络和不同的分析需求。

社群发现算法在许多领域具有广泛的应用。

在社交网络分析中,社群发现算法可以用于识别用户群体和社交圈子,推荐朋友、商品等。

在生物网络中,社群发现算法可以用于发现在基因调控中具有相似功能的基因模块,推动生物学研究。

复杂网络的一种快速局部社团划分算法

复杂网络的一种快速局部社团划分算法
解;汪小帆
【期刊名称】《计算机仿真》
【年(卷),期】2007(024)011
【摘要】为了快速准确地寻找大规模复杂网络的社团结构,文中基于节点度优先的思想,提出了一种新的寻找复杂网络中的局部社团结构的启发式算法.该算法的基本思想是从待求节点出发,基于节点的度有选择性的进行广度优先搜索,从而得到该节点所在的局部社团结构.由于该算法仅需要利用到节点的局部信息,因此时间复杂度很低,达到了线性的时间复杂度.将该算法应用于社会学中经典的Zachary网络,获得了满意的结果.最后,还分析了如何对该算法加以改进以进一步提高准确度.
【总页数】5页(P82-85,230)
【作者】解;汪小帆
【作者单位】上海交通大学自动化系,上海,200240;上海交通大学自动化系,上海,200240
【正文语种】中文
【中图分类】N94;TP393
【相关文献】
1.一种基于聚集系数的局部社团划分算法 [J], 李孔文;顾庆;张尧;陈道蓄
2.一种基于Newman快速算法改进的社团划分算法 [J], 付常雷
3.一种基于节点相似性的局部社团划分算法 [J], 王伟欣;刘发升
4.一种基于四元加权消减的复杂网络社团划分算法 [J], 邢雪;马杰良;安莉莉
5.基于节点向量表达的复杂网络社团划分算法 [J], 韩忠明;刘雯;李梦琪;郑晨烨;谭旭升;段大高
因版权原因,仅展示原文概要,查看原文内容请购买。

(2003)Fast_algorithm_for_detecting_community_structure_in_networks(FN算法)

a r X i v :c o n d -m a t /0309508v 1 [c o n d -m a t .s t a t -m e c h ] 22 S e p 2003Fast algorithm for detecting community structure in networksM. E.J.NewmanDepartment of Physics and Center for the Study of Complex Systems,University of Michigan,Ann Arbor,MI 48109–1120It has been found that many networks display community structure—groups of vertices within which connections are dense but between which they are sparser—and highly sensitive computer algorithms have in recent years been developed for detecting such structure.These algorithms however are computationally demanding,which limits their application to small networks.Here we describe a new algorithm which gives excellent results when tested on both computer-generated and real-world networks and is much faster,typically thousands of times faster than previous algorithms.We give several example applications,including one to a collaboration network of more than 50000physicists.I.INTRODUCTIONThere has in recent years been a surge of interest within the physics community in the properties of net-works of many kinds,including the Internet,the world wide web,citation networks,transportation networks,software call graphs,email networks,food webs,and so-cial and biochemical networks [1,2,3,4].One prop-erty that has attracted particular attention is that of “community structure”:the vertices in networks are of-ten found to cluster into tightly-knit groups with a high density of within-group edges and a lower density of between-group edges.Girvan and Newman [5,6]pro-posed a computer algorithm based on the iterative re-moval of edges with high “betweenness”scores that ap-pears to identify such structure with some sensitivity,and this algorithm has been employed by a number of authors in the study of such diverse systems as networks of email messages,social networks of animals,collabo-rations of jazz musicians,metabolic networks,and gene networks [5,6,7,8,9,10,11].As pointed out by New-man and Girvan [6],the principle disadvantage of their algorithm is the high computational demands it makes.In its simplest and fastest form it runs in worst-case time O(m 2n )on a network with m edges and n vertices,or O(n 3)on a sparse graph (one for which m scales with n in the limit of large n ,which covers essentially all networks of current scientific interest,with the possible exception of food webs).With typical computer resources available at the time of writing,this limits the algorithm’s use to networks of a few thousand vertices at most,and sub-stantially less than this for interactive applications.In-creasingly however,there is interest in the study of much larger networks;citation and collaboration networks can contain millions of vertices [12,13]for example,while the world wide web numbers in the billions [14].In this paper,therefore,we propose a new algorithm for detecting community structure.The algorithm oper-ates on different principles to that of Girvan and New-man (GN)but,as we will show,gives qualitatively similar results.The worst-case running time of the algorithm is O((m +n )n ),or O(n 2)on a sparse graph.In practice,it runs to completion on current computers in reason-able times for networks of up to a million or so vertices,bringing within reach the study of communities in many systems that would previously have been considered in-tractable.II.THE ALGORITHMQ =i(e ii −a 2i )(1)ing community structure.If a high value of Q represents a good community division,why not simply optimize Q over all possible divisions to find the best one?By do-ing this,we can avoid the iterative removal of edges and cut straight to the chase.The problem is that true op-timization of Q is very costly.The number of ways to divide n vertices into g non-empty groups is given bythe Stirling number of the second kind S (g )n ,and hencethe number of distinct community divisions is n g =1S (g )n .This sum is not known in closed form,but we observethat S (1)n +S (2)n =2n −1for all n >1,so that the sum must increase at least exponentially in n .To carry out2an exhaustive search of all possible divisions for the op-timal value of Q would therefore take at least an expo-nential amount of time,and is in practice infeasible for systems larger than twenty or thirty vertices.Various approximate optimization methods are available:simu-lated annealing,genetic algorithms,and so forth.Here we consider a scheme based on a standard “greedy”op-timization algorithm,which appears to perform well.Our algorithm falls in the generalcategoryof agglom-erative hierarchical clustering methods [15,16].Starting with a state in which each vertex is the sole member of one of n communities,we repeatedly join communities together in pairs,choosing at each step the join that re-sults in the greatest increase (or smallest decrease)in Q .The progress of the algorithm can be represented as a be at most m ,where m is again the number of edges in the graph.The change in Q upon joining two communi-ties is given by ∆Q =e ij +e ji −2a i a j =2(e ij −a i a j ),which can clearly be calculated in constant time.Fol-lowing a join,some of the matrix elements e ij must be updated by adding together the rows and columns corre-sponding to the joined communities,which takes worst-case time O(n ).Thus each step of the algorithm takes worst-case time O(m +n ).There are a maximum of n −1join operations necessary to construct the complete dendrogram and hence the entire algorithm runs in time O((m +n )n ),or O(n 2)on a sparse graph.The algorithm has the added advantage of calculating the value of Q as it goes along,making it especially simple to find the optimal community structure.It is worth noting that our algorithm can be general-ized trivially to weighted networks in which each edge has a numeric strength associated with it,by making the initial values of the matrix elements e ij equal to those strengths,rather than just zero or one;otherwise the algorithm is as above and has the same running time.The networks studied in this paper however are all un-weighted.III.APPLICATIONSAs a first example of the working of our algorithm,we have generated using a computer a large number of ran-dom graphs with known community structure,which we then run through the algorithm to quantify its perfor-mance.Each graph consists of n =128vertices divided into four groups of 32.Each vertex has on average z innumber of inter-community edges per vertex z outf r a c t i o n o f v e r t i c e s c l a s s i f i e d c o r r e c t l yFIG.1:The fraction of vertices correctly identified by our al-gorithms in the computer-generated graphs described in the text.The two curves show results for the new algorithm (circles)and for the algorithm of Girvan and Newman [5](squares).Each point is an average over 100graphs.edges connecting it to members of the same group and z out edges to members of other groups,with z in and z out chosen such that the total expected degree z in +z out =16,in this case.As z out is increased from small values,the resulting graphs pose greater and greater challenges to the community-finding algorithm.In Fig.1we show the fraction of vertices correctly assigned to the four commu-nities by the algorithm as a function of z out .As the figure shows,the algorithm performs well,correctly identifying more than 90%of vertices for values of z out <∼6.Only when z out approaches the value 8at which the number of within-and between-community edges per vertex is the same does the algorithm begin to fail.On the same plot we also show the performance of the GN algorithm and,as we can see,that algorithm performs slightly but mea-surably better for smaller values of z out .For example,for z out =5our new algorithm correctly identifies an av-erage of 97.4(2)%of vertices,while the older algorithm correctly identifies 98.9(1)%.Both,however,clearly per-form well.Interestingly for higher values of z out our new algo-rithm performs better than the older one,and we have come across a few real-world networks in which this is the case also.Normally,however,the GN algorithm seems to have the edge,and this should come as no great sur-prise.Our new algorithm bases its decisions on purely lo-cal information about individual communities,while the GN algorithm uses non-local information about the entire network—information derived from betweenness scores.Since community structure is itself fundamentally a non-local quantity,it seems reasonable that one can do a bet-ter job of finding that structure if one has non-local in-formation at one’s disposal.For systems small enough that the GN algorithm is computationally tractable,therefore,we see no reasonFIG.2:Dendrogram of the communities found by our algo-rithm in the“karate club”network of Zachary[5,17].The shapes of the vertices represent the two groups into which the club split as the result of an internal dispute.not to continue using it—it appears to give the best results.For systems too large to make use of this ap-proach,however,our new algorithm gives useful com-munity structure information with comparatively little effort.We have applied our algorithm to a variety of real-world networks also.We have looked,for example,at the“karate club”network studied in[5],which represents friendships between34members of a club at a US univer-sity,as recorded over a two-year period by Zachary[17]. During the course of the study,the club split into two groups as a result of a dispute within the organization, and the members of one group left to start their own club.In Fig.2we show the dendrogram derived by feed-ing the friendship network into our algorithm.The peak modularity is Q=0.381and corresponds to a split into two groups of17,as shown in thefigure.The shapes of the vertices represent the alignments of the club mem-bers following the split and,as we can see,the division found by the algorithm corresponds almost perfectly to these alignments;only one vertex,number10,is classified wrongly.The GN algorithm performs similarly on this task,but not better—it alsofinds the split but classifies one vertex wrongly(although a different one,vertex3). In other tests,wefind that our algorithm also success-fully detects the main two-way division of the dolphin social network of Lusseau[6,18],and the division be-tween black and white musicians in the jazz network of Gleiser and Danon[11].As a demonstration of how our algorithm can some-times miss some of the structure in a network,we take another example from Ref.5,a network representing the schedule of games between American college foot-ball teams in a single season.Because the teams are di-vided into groups or“conferences,”with intra-conference games being more frequent than inter-conference games, we have a reasonable idea ahead of time about what com-munities our algorithm shouldfind.The dendrogram generated by the algorithm is shown in Fig.3,and has an optimal modularity of Q=0.546,which is a little shy of the value0.601for the best split reported in[5].As the dendrogram reveals,the algorithmfinds six commu-nities.Some of them correspond to single conferences, but most correspond to two or more.The GN algorithm, by contrast,finds all eleven conferences,as well as accu-rately identifying independent teams that belong to no conference.Nonetheless,it is clear that the new algo-rithm is quite capable of picking out useful community structure from the network,and of course it is much the faster algorithm.On the author’s personal computer the algorithm ran to completion in an unmeasureably small time—less than a hundredth of a second.The algorithm of Girvan and Newman took a little over a second.A time difference of this magnitude will not present a big problem in most practical situations,but perfor-mance rapidly becomes an issue when we look at larger networks;we expect the ratio of running times to in-crease with the number of vertices.Thus,for example, in applying our algorithm to the1275-node network of jazz musician collaborations mentioned above,we found that it runs to completion in about one second of CPU time.The GN algorithm by contrast takes more than three hours to reach very similar results.As an example of an analysis made possible by the speed of the new algorithm,we have looked at a network of collaborations between physicists as documented by papers posted on the widely-used Physics E-print Archive at .The network is an updated version of the one described in Ref.13,in which scientists are consid-ered connected if they have coauthored one or more pa-pers posted on the archive.We analyze only the largest component of the network,which contains n=56276sci-entists in all branches of physics covered by the archive. Since two vertices that are unconnected by any path are never put in the same community by our algorithm,the small fraction of vertices that are not part of the largest component can safely be assumed to be in separate com-munities in the sense of our algorithm.Our algorithm takes42minutes tofind the full community structure. Our best estimates indicate that the GN algorithm would take somewhere between three andfive years to complete its version of the same calculation.The analysis reveals that the network in question con-sists of about600communities,with a high peak modu-larity of Q=0.713,indicating strong community struc-ture in the physics world.Four of the communities found are large,containing between them77%of all the ver-tices,while the others are small—see Fig.4,left panel. The four large communities correspond closely to subject subareas:one to astrophysics,one to high-energy physics, and two to condensed matter physics.Thus there ap-pears to be a strong correlation between the structure found by our algorithm and the community divisions per-ceived by human observers.It is precisely correlation of this kind that makes community structure analysis a useful tool in understanding the behavior of networked systems.We can repeat the analysis with any of the subcom-Boise StateIdaho Arkansas State New Mexico Baylor Texas Kansas Colorado Nebraska Iowa Utah State North Texas New Mexico State Texas Tech Oklahoma Texas A&M Kansas State Missouri Iowa State Texas El Paso Oklahoma State Wake Forest North Carolina State North Carolina Duke Navy Rutgers West Virginia Virginia Tech Syracuse Brigham Young ClemsonVirginia Georgia Tech Maryland Florida State Temple Miami Florida Pittsburgh Boston College Notre Dame Air Force Northern IllinoisBall State Eastern Michigan Ohio Akron Marshall Bowling Green State Toledo Western Michigan Central Michigan Kent Buffalo Miami Ohio Illinois Michigan Purdue Indiana Wisconsin Ohio StateMichigan State Penn State Northwestern Minnesota ArizonaWashington Washington State UCLA California Utah Colorado State Nevada Las Vegas Fresno State Hawaii Rice Kentucky Oregon State Stanford Oregon Arizona State San Diego State Wyoming Nevada San Jose State Texas Christian Tulsa Southern California Georgia Kentucky Mississippi Arkansas Louisiana State Cincinnati Army Tulane Southern Mississippi Alabama Birmingham Louisiana Tech Central Florida ConnecticutMiddle Tennessee State Louisiana Monroe Louisiana Lafayette Louisville East Carolina Memphis Houston Tennessee Alabama Mississippi State Auburn Florida Vanderbilt South Carolina FIG.3:Dendrogram of the communities found in the college football network described in the text.The real-worldcommunities—conferences—are denoted by the different shapes as indicated in the legend.munities to observe how they break up.For example,feeding the smaller of the two condensed-matter groups through the algorithm again,we find an even stronger peak modularity of Q =0.807—the strongest we have yet observed in any network—corresponding to a split into about a hundred communities of all sizes (Fig.4,center panel).The sizes appear roughly to have a power-law distribution with exponent about −2[20].Narrowing our focus still further to the particular one of these com-munities that contains the present author,we find the structure shown in the right panel of Fig.4.Feeding this one last time through the algorithm,it breaks apart into communities that correspond closely to individual insti-tutional research groups,the author’s group appearing in the corner of the figure,highlighted by the dashed box.One could pursue this line of analysis further,identify-ing individual groups,iteratively breaking them down,and looking for example at the patterns of collaboration between them,but we leave this for later studies.IV.CONCLUSIONSIn this paper we have described a new algorithm for ex-tracting community structure from networks,which hasa considerable speed advantage over previous algorithms,running to completion in time that scales as the square of the network size.This allows us to study much larger systems than has previously been possible.Among other examples,we have applied the algorithm to a network of collaborations between more than fifty thousand physi-cists,and found that the resulting community structure corresponds closely to the traditional divisions between specialties and research groups in the field.We believe that our method will not only allow for the extension of community structure analysis to some of the very large networks that are now being studied for the first time,but also provides a useful tool for visualizing and understand the structure of these networks,whose daunting size has hitherto made many of their structural properties obscure.AcknowledgmentsThe author thanks Leon Danon,Pablo Gleiser,David Lusseau,and Douglas White for providing network data used in the examples.This work was supported in part by the National Science Foundation under grant number DMS–0234188.FIG.4:Left panel:Community structure in the collaboration network of physicists.The graph breaks down into four large groups,each composed primarily to physicists of one specialty,as shown.Specialties are determined by the subsection(s) of the e-print archive in which individuals post papers:“C.M.”indicates condensed matter;“H.E.P.”high-energy physics including theory,phenomenology,and nuclear physics;“astro”indicates astrophysics.Middle panel:one of the condensed matter communities is further broken down by the algorithm,revealing an approximate power-law distribution of community sizes.Right panel:one of these smaller communities is further analyzed to reveal individual research groups(different shades), one of which(in dashed box)is the author’s own.[1]S.H.Strogatz,Exploring complex networks.Nature410,268–276(2001).[2]R.Albert and A.-L.Barab´a si,Statistical mechanics ofcomplex networks.Rev.Mod.Phys.74,47–97(2002). [3]S.N.Dorogovtsev and J.F.F.Mendes,Evolution of Net-works:From Biological Nets to the Internet and WWW.Oxford University Press,Oxford(2003).[4]M.E.J.Newman,The structure and function of complexnetworks.SIAM Review45,167–256(2003).[5]M.Girvan and M.E.J.Newman,Community structurein social and biological networks.Proc.Natl.Acad.Sci.USA99,7821–7826(2002).[6]M. E.J.Newman and M.Girvan,Finding andevaluating community structure in networks.Preprint cond-mat/0308217(2003).[7]D.Wilkinson and B.A.Huberman,Finding communitiesof related genes.Preprint cond-mat/0210147(2002). [8]P.Holme,M.Huss,and H.Jeong,Subnetwork hierar-chies of biochemical pathways.Bioinformatics19,532–538(2003).[9]R.Guimer`a,L.Danon,A.D´ıaz-Guilera,F.Giralt,andA.Arenas,Self-similar community structure in organisa-tions.Preprint cond-mat/0211498(2002).[10]J.R.Tyler, D.M.Wilkinson,and B. A.Huber-man,Email as spectroscopy:Automated discovery of community structure within organizations.Preprint cond-mat/0303264(2003).[11]P.Gleiser and L.Danon,Community structure in jazz.Preprint cond-mat/0307434(2003).[12]S.Redner,How popular is your paper?An empiricalstudy of the citation distribution.Eur.Phys.J.B4, 131–134(1998).[13]M.E.J.Newman,The structure of scientific collabora-tion A98,404–409 (2001).[14]J.Kleinberg and wrence,The structure of the Web.Science294,1849–1850(2001).[15]B.Everitt,Cluster Analysis.John Wiley,New York(1974).[16]J.Scott,Social Network Analysis:A Handbook.SagePublications,London,2nd edition(2000).[17]W.W.Zachary,An informationflow model for conflictandfission in small groups.Journal of Anthropological Research33,452–473(1977).[18]D.Lusseau,The emergent properties of a dolphin so-cial network.Biology Letters,Proc.R.Soc.London B (suppl.)(2003).DOI10.1098/rsbl.2003.0057.[19]As discussed in[6],each edge should contribute only toe ij once,either above or below the diagonal,but notboth.Alternatively,and more elegantly,one can split the contribution of each edge half-and-half between e ij ande ji,except for those edges that join a group to itself,whose contribution belongs entirely to the single diagonal element e ii for the group in question.[20]This power law is of a different kind to the one observedin an email network by Guimer`a et al.[9].They stud-ied the histogram of community sizes over all levels of the dendrogram;we are looking only at the single level corresponding to the maximum value of Q.。

  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。
相关文档
最新文档