POWER6 System Technical Analysis
Advanced POWER SWOT Analysis

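Why use advanced SWOT analysis? SWOT analysis audits a company's marketing by examining its strengths, weaknesses, opportunities, and threats.
Our introductory lesson covers basic SWOT analysis for readers who are meeting this marketing tool for the first time.
As you become more familiar with SWOT analysis, however, you will find that it has a number of limitations.
One problem you may run into when applying it is its adaptability.
SWOT analysis is used in so many settings that it has to be flexible.
That same flexibility can produce anomalies.
The more advanced POWER SWOT analysis addresses the problems of basic SWOT analysis.
POWER is an acronym for Personal experience, Order, Weighting, Emphasize detail, and Rank and prioritize; this is the so-called advanced SWOT analysis.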
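P = Personal experience. As a marketing manager, how do you approach a SWOT analysis? You bring your own experience, skills, knowledge, attitudes, and beliefs to it.
Your perception and gut feeling will influence the analysis.
O = Order: strengths or weaknesses, opportunities or threats. Marketing managers often inadvertently confuse opportunities with strengths, or weaknesses with threats.
This happens because the line between internal strengths and weaknesses and external opportunities and threats is hard to draw.
Take global warming and climate change as an example: one could mistake environmentalism for a threat rather than a potential opportunity.
W = Weighting. Too often the elements of a SWOT analysis are not weighted.
Some elements will inevitably be more contentious than others, so weight all of them to separate what is urgent from what is minor.
One way is to use percentages, for example Threat A = 10%, Threat B = 70%, Threat C = 20% (all threats total 100%).
E = Emphasize detail. SWOT analyses usually leave out detail, reasoning, and justification.
What one tends to find is a list containing only a few words.
For example, under opportunities one might simply find the word "technology".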
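The weighting step (W) and the later ranking step (R) can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `rank_by_weight` is hypothetical, and the figures are the Threat A/B/C percentages from the example above, not data from a real analysis.

```python
def rank_by_weight(weights):
    """Return (name, weight) pairs sorted from heaviest to lightest.

    `weights` maps each factor to an integer percentage. A ValueError is
    raised when the percentages do not add up to 100, mirroring the rule
    that all threats together total 100%.
    """
    total = sum(weights.values())
    if total != 100:
        raise ValueError(f"weights must total 100%, got {total}%")
    return sorted(weights.items(), key=lambda item: item[1], reverse=True)


# Percentages taken from the weighting example in the text.
threats = {"Threat A": 10, "Threat B": 70, "Threat C": 20}
for name, weight in rank_by_weight(threats):
    print(f"{name}: {weight}%")
```

Sorting by weight puts the factor that deserves attention first, which is the point of weighting before prioritizing.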
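IBM Server Categories

p Series server models
505 (rack-mount), 505Q (rack-mount), 510 (rack), 510+ (rack-mount), 510Q (rack-mount), 520 (rack/deskside), 520+ (rack/deskside), 520Q, 550+ (rack/deskside), 550Q (rack/deskside), 560Q (rack-mount), 570 (rack)
p Series server configurations and pricing
520Q
Processor: 4-core IBM POWER5+
Clock rate (min/max): 1.5 / 1.65 GHz
System memory (standard/max): 1 GB / 32 GB
Internal storage (standard/max): 73.4 GB / 16.8 TB (with optional I/O expansion drawers)
Performance (relative performance range): 18.75 / 20.25
550+ (rack/deskside)
560Q (rack-mount)
Processor: 4-core, 8-core, or 16-core POWER5+
Clock rate (min/max): 1.5 GHz / 1.8 GHz
System memory (standard/max): 2 GB / 128 GB
Internal storage (standard/max): 73.4 GB / 32.4 TB (with optional I/O drawers)
Performance (relative performance range): 18.75 - 75.58
570 (rack)
Processor: 64-bit POWER5
Clock rate: 1.50 GHz / 1.65 GHz / 1.90 GHz
System memory (standard/max): 2 GB / 512 GB
Internal storage (standard/max): 36.4 GB / 38.7 TB
Performance (relative performance range): 9.86 to 77.45
IBM x Series servers
For Windows® and Linux® systems, delivering outstanding performance through high-performance Intel Xeon and AMD processors.
Processor: 64-bit POWER5
Clock rate (min/max): 1.5 GHz / 1.65 GHz
System memory (standard/max): 1 GB / 32 GB
Internal storage (standard/max): 36.4 GB / 8.2 TB
Performance (relative performance range): 9.86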
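ForceControl (力控) pNetPower 6

ForceControl pNetPower 6.0 ECS: Overview of the Power Plant Electrical Monitoring System
An Electric Control System (ECS) takes the electrical portion of the original DCS and manages it as a dedicated system, providing integrated protection, measurement, control, and analysis for the plant's medium- and low-voltage auxiliary power systems.
It coordinates the parallel development of thermal control and electrical automation in the power plant, raises the plant's overall level of automation and of auxiliary power control and management, safeguards the safety and reliability of plant operation, and strengthens the plant's position and competitiveness in today's electricity market.
As a subset of the DCS, it supplies richer information to the SIS (plant-level supervisory information system) and the MIS (plant management information system).
ECS was proposed in response to the continuing rise in the level of power system automation.
1.1 System features
The pNetPower 6.0 ECS power plant electrical monitoring system is a new integrated automation system developed to advance the automation of plant auxiliary electrical systems.
It builds on years of technical experience with microprocessor-based protection, monitoring, and integrated automation products and uses a layered, distributed architecture. On a platform of modern computing and network communication technology, it provides monitoring, control, regulation, protection, and telecontrol for generator units, auxiliary power, and the network control section.
The system suits both small and medium thermal power plants with a main control room and large and medium thermal power plants with unit-based centralized control, and it can be configured flexibly to match the control scheme.
Its main functions are as follows:
1.1.1 For the auxiliary power system, it performs sequence control or soft manual control as required in the start-up/shutdown and normal operation phases, and carries out sequenced or soft manual transfer from the working supply to the standby supply or from standby back to working, ensuring safe unit operation and normal start-up and shutdown.
1.1.2 For the generator-transformer unit, it provides automatic sequence control or soft manual control: it can take the generator from standstill through speed and voltage ramp-up to synchronization, grid connection, and initial load, or shut the generator down automatically.
Generator excitation voltage regulation, generator-transformer synchronization, electrical equipment protection, and fast transfer of the 6 kV auxiliary supply are performed by independent devices; the DCS controls the start/stop and mode selection of these automatic devices and monitors their status.
Depending on actual operating practice and equipment reliability, manual hold points can be set in the sequenced grid connection so that it proceeds step by step.
1.1.3 It displays and records in real time (including sequence-of-events records, SOE) the data and states of the generator, the transformer (or generator-transformer unit) system, the auxiliary power system, the network control system, and the dedicated electrical automatic devices under normal operation, abnormal operation, and fault conditions. It automatically generates data reports and operation log reports and, through detailed fault analysis, quickly identifies the cause of an incident and provides operating guidance and emergency response measures.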
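The sequence-of-events (SOE) record mentioned in 1.1.3 is, at its core, a list of timestamped state changes that is ordered so the initiating event can be found. The sketch below illustrates that idea only; the `SoeEvent` fields, the millisecond timestamps, and the device names are hypothetical and are not taken from pNetPower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoeEvent:
    timestamp_ms: int   # hypothetical millisecond timestamp from the device
    source: str         # hypothetical device or signal name
    description: str


def order_events(events):
    """Return the events sorted by time, earliest first."""
    return sorted(events, key=lambda event: event.timestamp_ms)


def first_event(events):
    """Return the earliest event, a likely starting point for fault analysis."""
    return min(events, key=lambda event: event.timestamp_ms)


# Illustrative records only, deliberately listed out of order.
log = [
    SoeEvent(1_000_120, "6kV bus A breaker", "breaker tripped"),
    SoeEvent(1_000_095, "feeder protection relay", "overcurrent pickup"),
    SoeEvent(1_000_310, "fast transfer device", "standby supply closed"),
]
for event in order_events(log):
    print(event.timestamp_ms, event.source, event.description)
```

In this toy log the relay pickup precedes the breaker trip, so ordering by timestamp points the analysis at the relay first.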
POWER6 Server Technology Highlights

POWER Technology roadmap (© 2008 IBM Corporation)
• 2001 POWER4: 1+ GHz cores
• 2004 POWER5: 1.5+ GHz cores
• 2007 POWER6: 4-5 GHz, 2 cores, advanced system features
• POWER7*: advanced core design

Features listed on the slide
• Workload accelerators
• Highly threaded cores
• Very high frequencies (4-5 GHz)
• Enhanced virtualization
• Advanced memory subsystem
• AltiVec™ vector SIMD instructions
• Instruction retry
• Decimal floating-point
• Dynamic energy management
• Partition mobility
• Storage protection keys
• Alternate processor recovery

Speedup with hardware decimal floating-point (DFP*)
• Java BigDecimal w/ DEC number: 93.2% of execution time in decimal operations, speedup < 7X
• C, C# packages: 72-78% of execution time in decimal operations, speedup 4X
• Integer hand-tuned: 45% of execution time in decimal operations, speedup 2X

[Product positioning chart. Tier labels recovered: department level, entry level. Models shown: p5-550/550Q, p5-560Q, P6-570, P6-575, P6-595.]
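IBM POWER6 Series 570 Midrange Server Product Overview

A modular, scalable system that helps businesses improve business flexibility and IT efficiency
IBM Power 570 server
For mid-size to large transaction processing workloads, the IBM Power™ 570 server delivers outstanding performance, mainframe-class reliability, modular growth without business interruption, and innovative virtualization technology.
Together, these features help businesses simplify the management of growth, complexity, and risk.
For mid-size to large database serving, the Power 570 is designed to meet the demands of the most critical back-end workloads.
The 570 has been shown to deliver outstanding performance across multiple database solutions and operating systems, which makes it an important tool for meeting the challenges posed by a company's most important IT asset, its databases.
For server consolidation, the Power 570 offers great flexibility: a single system can run any combination of AIX®, IBM i, Linux for Power, and x86 Linux applications at the same time.
In addition, PowerVM™ virtualization technology lets businesses allocate resources dynamically across all of these environments to optimize performance and efficiency while minimizing energy use.
With the Power 570 you can readily control the entire business environment.
To meet the needs of the whole business system, the Power 570 uniquely combines performance and availability features for multiple workloads to keep your business running.
PowerVM virtualization also helps you maximize operating efficiency, and the nondisruptive growth options are designed to help you control business costs.
The 570 delivers all of these features in one integrated, energy-efficient product, which makes it a strong business solution.
In its most popular, latest version, the 570 supports up to 32 POWER6™ processor cores. Each building block supports twice as many processor cores as in the previous version while keeping the modular design, which improves per-system performance and space utilization and, most importantly, performance per watt.
This version is the most energy-efficient product in the 570 family and suits multi-application environments in which overall system throughput matters most.
The IBM Power 570 is a modular server comprising four building blocks in total, in a rack-mountable configuration.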
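Power Server Features Explained

Power server features:
▪ Power Systems technical advantages and product updates
▪ Power Systems performance
▪ Power Systems RAS features
▪ Power Systems dynamic infrastructure
▪ Power Systems overall advantages: summary

I. Power Systems technical advantages and product updates
(1) Overall technical advantages
1. Newer technology: cores no longer share the L2 cache; each core has its own L2 cache.
2. Technical advantages
Optimized overall architecture
• Fifth-generation multi-core design, clock rates up to 5 GHz, 2.5 GHz interconnect
• L2 cache per CPU chip is 4 times that of POWER5
• Higher memory and I/O bandwidth, suited to business applications with high data throughput
• Internal disks changed to SAS; added PCI-E slots; new I/O drawer
Strong performance with energy saving
• Supports green IT with several energy-saving features: Processor Nap Mode, Memory Power Down support, Power Save Mode
• Strong performance, holding nearly every UNIX benchmark record
Enhanced virtualization
• Live Partition Mobility: live migration of partitions across systems
• With AIX v6.1, adds WPAR for more flexible resource scheduling
• Multiple Shared Processor Pool, Active Memory Sharing, IVE, NPIV
Support for multiple operating systems
• Supports AIX 6.1, IBM i, and Linux
• Guaranteed binary and software compatibility
Enhanced RAS features
• Processor Instruction Retry, Alternate Processor Recovery
• Workload Mobility
• Hot Node Add / Hot Node Repair
3. Performance innovation: more performance from fewer processors
▪ World's highest-clocked chip: up to 5.0 GHz
▪ Best single-thread performance, more than double that of competitors
▪ Up to 8 execution units for superscalar execution: 2LS, 2FP, 2FX, 1VMX, 1DP
▪ Unique decimal floating-point accelerator (Decimal FP Unit), a world first
▪ Second-generation simultaneous multithreading (Enhanced SMT), which raises application performance
▪ 341 mm² chip integrating 790 million transistors in a high-density design
▪ 8 MB of L2 cache and 32 MB of L3 cache, more than competitors offer, which particularly suits data-intensive and high-performance computing applications
▪ Sixth-generation POWER processor with a dual-core chip design, optimized for performance, stable and reliable
▪ Chip manufacturing technology: CMOS 65nm lithography with copper interconnect and silicon-on-insulator, the most advanced chip manufacturing technology for midrange servers
▪ High-speed processor interconnect running at up to half the processor clock, industry leading and particularly suited to I/O-intensive applications such as databases
▪ Mainframe-class reliability features such as the recovery unit (RU), 16 new memory keys, dual system clocks*, concurrent upgrade*, and concurrent repair*
▪ Leading power management, including power saving, power capping, frequency reduction, and sleep
(2) Product updates
1. Volume server product updates for 2009, with examples: typical applications of IBM volume servers
2. The IBM POWER6 blade server family
3. Highlighted product: 1) IBM Power 560. Key point: Power = performance + price/performance. Basic specifications and comparison with Sun servers: supports up to 16-way 3.6 GHz POWER6 chips and up to 384 GB of memory; a server with good price/performance.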
POWER6: Latest Technology and Product Outlook

Agenda
• IBM's leading technology and the future
• On demand, innovation first, client success
• System p product family and related updates
• Summary

Only UNIX platform to grow over last five years
[Chart: five-year revenue share change. Vendor labels recovered: HP, Sun. Values recovered: +10.4%, -5.3%, -1.4%.]
Source: IDC Server Tracker Q406 and FY06 Server Tracker, 02/24/07, rolling four quarter average

IBM: market leader
• Maintains price/performance leadership
• Heavy investment in technology research and development
• Builds on mainframe design experience
[Chart: Reliability, Availability, Serviceability. Systems compared: p5-570, p5-570+, HP rx8620, p5-590, p5-595, HP SD, Fujitsu P.]
[Chart: patents awarded. Companies: IBM, HP, Intel, Matsushita, Samsung, Hitachi, Toshiba, Fujitsu, Micron, Canon.]
U.S. National Medal of Technology: multi-core processors, dynamic random-access memory (DRAM), copper chip wiring, silicon-on-insulator (SOI), high-speed silicon-germanium (SiGe) chips
Third-party view: Gartner Magic Quadrant 2006

Ten years ago… Deep Blue changed the world's perception of what a computer can do
May 11, 1997, Equitable Center, New York City
IBM POWER technology: 10 years of innovation
Each POWER6 core exceeds the performance of Deep Blue
• Deep Blue (1.4 tons / 1,270 kg): 11.38 GFLOPS
• POWER6 core: 15.53 GFLOPS
All results current as of 5/21/07. Source: /list/1997/11/300, IBM DEEP BLUE(R) 1.2 GHz, 32 NODE SP2 P2SC, Rpeak: 15.36 GFLOPS, Rmax: 11.38 GFLOPS. Source: /benchmark/performance.pdf, IBM System p 570, 4.7 GHz POWER6, 1 core, Rpeak: 18.8 GFLOPS, Rmax: 15.53 GFLOPS

POWER6 characteristics
4.7 / 4.2 / 3.5 GHz, >790M transistors, 65nm
• Ultra-high frequency dual-core chip: 4.7 GHz
  - 7-way superscalar, 2-way SMT core
  - Eight execution units: 2LS, 2FP, 2FX, 1VMX, 1DP
  - 790M transistors, 341 mm² die
  - 2x4MB on-chip L2, point of coherency
  - On-chip L3 directory and controller
• Technology: CMOS 65nm lithography, SOI Cu
• High-speed elastic bus interface at 2:1 frequency
• Full error checking and recovery
• Dynamic power saving: advanced clock gating
[Block diagram: POWER6 processor architecture. Two POWER6 cores with AltiVec, 4 MB L2 per core, L3 controller and L3, fabric bus controller, memory controller and Memory+, GX bus controller and GX+ bridge.]

POWER6 delivers improved system utilization through enhanced simultaneous multithreading
[Diagram: execution units FX0, FX1, LS0, LS1, FP0, FP1, DP, VMX, marked as Thread0 active, Thread1 active, or no thread active; system throughput compared for POWER5 ST and SMT and for POWER6 enhanced SMT.]
• Utilizes unused execution unit cycles
• Reuse of existing transistors vs. performance from additional transistors
• Presents symmetric multiprocessing (SMP) programming model to software
• Dispatch two threads per processor: "It's like doubling the number of processors."
• Net result: better performance and better processor utilization
• Appears as four CPUs per chip to the operating system (AIX 5L™ V5.3 and Linux®)

IBM POWER chip development roadmap
• 2001-4, POWER4 / 4+: 1+ GHz cores, shared L2, distributed switch; chip multiprocessing, dynamic LPARs (32)
• 2004-6, POWER5 / 5+: 1.5+ GHz to 1.9 GHz cores, 2.3 GHz POWER5+, shared L2, distributed switch; enhanced scaling, simultaneous multi-threading (SMT), enhanced distributed switch, enhanced core parallelism, improved FP performance, increased memory bandwidth, reduced memory latencies, virtualization
• 2007-9, POWER6 / 6+: 4-5 GHz, 2 cores, VMX, L2 cache, advanced system features; very high frequencies (4-5 GHz), enhanced virtualization, advanced memory subsystem, AltiVec vector SIMD instructions, instruction retry, decimal floating point, dynamic energy management, partition mobility, memory protection keys
• 2010-11, POWER7: advanced hybrid core design, high-frequency multi-core, workload accelerators, highly threaded cores, advanced system features
• Binary compatibility across generations
• Performance relative to POWER5 (axis values): 1x, 1.3x, 2x, 4x, 10x
• Just won $244M DARPA contract

[Diagram: an IBM pSeries HMC managing LPARs on an IBM pSeries server attached to IBM DS8000 storage. LPAR#1: AIX 5.2, 2 processors; LPAR#2: AIX 5.2, 0.5 processor; LPAR#3: AIX 5.3, 1.5 processors; LPAR#N: Linux, 2.5 processors. The remaining labels were lost in extraction.]

POWER6 virtualisation innovation: increased optimisation in the datacentre
• Non-disruptive move of partitions/applications across physical servers
• Memory and I/O state is preserved
[Diagram: relocating partitions (Partition1 moves between the hypervisors of Server 1 and Server 2) and relocating applications (App2 moves from Server 1 to Server 2), both using shared server resources.]
From single-server to multi-server virtualisation
[Timeline: IBM virtualization milestones for 1973, 1987, 1997, 2001, 2004, 2007. Labels recovered: LPAR, POWER4, POWER5, UNIX, Partition Mobility, Workload Partition.]
[Slide: TCO. Figures recovered: 69%-76%; System p5 65%-69%; CPU 31%-45%; IT 52%-61%. The row labels were lost in extraction.]

IBM System p product line
[Chart: POWER5+ systems and POWER6 systems from blades to high end. Models recovered: BladeCenter JS21, Mdl 285+, p5-505/505Q, p5-510+/510Q, p5-520+/520Q, p5-550+/550Q, p5-560Q, p5-570, p5-575, p5-590, p5-595, System p 570 (POWER6). Workload labels recovered: BI, ERP, CRM, Web server.]

The IBM POWER6 "Grand Slam" for major workloads
[Charts: relative performance per core on four benchmarks: transaction processing (TPC-C**, systems with 16 or more cores), throughput (SPECint_rate2006*, systems with 8 or more cores), Java (SPECjbb2005*, systems with 16 or more cores), and HPC (SPECfp_rate2006*, systems with 8 or more cores). Processors compared: POWER6, Itanium2, SPARC64, XEON, Opteron. Systems named: IBM p570, HP Superdome, HP rx6600, HP rx7640, HP DL585, Sun M8000, Unisys ES7000, Fujitsu PQ 580, Fujitsu RX300, Fujitsu RX800, Fabric7 Q80. The bar values were garbled in extraction.]
* Source: / IBM p570 POWER6 results to be submitted on 5/21/07; all other results as of 04/27/07. ** Source: / IBM p570 POWER6 result to be submitted on 5/21/07; all other results as of 04/27/07. See next page for full detail.

Availability
• IBM System p590: MTBF (UIRA) 27.290 years; availability (UIRA) 0.999987
• IBM System p590 with HACMP: availability of minimum configuration is 99.99999%. Unavailability is 1.900E-09. MTBF = 2121268000 hours.
Downtime per year at a given availability (1 year = 525,600 minutes):
• 99.999% = 5.256 minutes
• 99.99% = 52.56 minutes
• 99.95% = 262.8 minutes
• 99.9% = 525.6 minutes
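The downtime figures above follow from simple arithmetic: a year has 525,600 minutes, and the unavailable fraction of that is the annual downtime. The sketch below reproduces the calculation; the function name `downtime_minutes_per_year` is hypothetical, and a 365-day year is assumed, as the 525,600 figure implies.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Return the expected minutes of downtime per year.

    `availability_percent` is the availability as a percentage,
    e.g. 99.999 for "five nines".
    """
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for level in (99.999, 99.99, 99.95, 99.9):
    print(f"{level}% available: {downtime_minutes_per_year(level):.3f} minutes per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the step from 99.9% to 99.999% is so expensive to engineer.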
CPU On-Chip Cache Technology Explained

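The CPU on-chip cache is temporary storage that sits between the CPU and main memory. It is smaller than main memory but much faster to access.
The cache holds only a small part of what is in memory, but it is the part the CPU is about to use. When the CPU works through a large amount of data, it can read from the cache instead of going to memory, which speeds up access.
It is well known that improvements in CPU core size and bandwidth soon run into diminishing returns on investment.
So once the process technology has shrunk the core to a small enough scale, chip makers weighing cost usually have three options: build smaller chips, add a large amount of cache, or add more cores.
A larger on-chip cache is among the more cost-effective of these choices, which is why makers of RISC and x86 processors alike have kept increasing on-chip cache since the 1990s.
Cache has a large effect on CPU performance, mainly because of the order in which the CPU fetches data and the bandwidth between the CPU and the cache.
The cache works as follows. When the CPU needs to read a piece of data, it looks in the cache first. If the data is there, it is read at once and passed to the CPU. If it is not, the data is read from memory at a comparatively slow speed and passed to the CPU, and at the same time the block containing that data is loaded into the cache, so later reads from the same block are served by the cache without going back to memory.
This mechanism gives the CPU a very high cache hit rate (above 90% on most current CPUs), so only a small share of reads, under 10%, have to go to memory.
That saves much of the time the CPU would spend reading memory directly, and the CPU rarely has to wait for data.
In short, the CPU reads from the cache first and from memory second.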
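The read path described above (check the cache, and on a miss load the whole block) can be sketched with a toy simulation. This is an illustration under stated assumptions, not a model of any real CPU: the block size, the number of cached blocks, and the first-in-first-out eviction are all hypothetical choices, whereas real caches use set associativity and LRU-like replacement.

```python
def simulate_reads(addresses, block_size=8, max_blocks=4):
    """Count cache hits and misses for a sequence of memory addresses.

    On a miss, the whole block containing the address is loaded into the
    cache; when the cache is full, the oldest block is evicted (FIFO).
    """
    cached_blocks = []  # block numbers currently cached, oldest first
    hits = misses = 0
    for address in addresses:
        block = address // block_size
        if block in cached_blocks:
            hits += 1
        else:
            misses += 1
            if len(cached_blocks) == max_blocks:
                cached_blocks.pop(0)  # evict the oldest block
            cached_blocks.append(block)
    return hits, misses


# Sequential reads: only the first access to each block misses.
hits, misses = simulate_reads(range(64))
print(f"hits={hits}, misses={misses}, hit rate={hits / (hits + misses):.1%}")
```

Because a miss brings in the neighbouring addresses too, sequential access patterns hit far more often than they miss, which is the effect the text attributes to block loading.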
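In most x86 multi-core CPUs, each core has its own L1 cache, while the cores share the L2 cache, the memory subsystem, the interrupt subsystem, and the peripherals.
System designers therefore need to let each core access a given resource independently and make sure that applications running on other cores do not contend for it.
In the 1990s, most x86 CPUs had only 4 KB to 32 KB of on-chip L1 cache and 128 KB to 256 KB of L2 cache.
Only with the Pentium II Xeon 400 in 1998 did the x86 architecture get 512 KB of L2 cache (1 MB of L2 came later still). By comparison, the PA-8500 that HP announced in 1997 carried 1.5 MB of on-chip cache, and its process technology and other specifications were far ahead of the Xeon.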
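POWER6 Technology Overview
IBM POWER™ SYSTEMS
• POWER processor technology has developed and advanced continuously for nearly 20 years
• POWER6 is the sixth-generation POWER processor and the fifth dual-core chip
• For more than a decade IBM has kept its commitments and shipped POWER on schedule: POWER4™ (2001), POWER4+™ (2003), POWER5™ (2004), POWER5+™ (2006), POWER6™ (2007)

Processor history 2001 - 2007
• POWER4: 414 mm², 1.1 - 1.3 GHz
• POWER4+™: 267 mm², 1.5 - 1.9 GHz
• POWER5: 389 mm², 1.65 - 1.9 GHz
• POWER5+™: 245 mm², 1.9 - 2.3 GHz
• POWER6: 341 mm², 3.5 - 4.7 GHz
Milestones noted on the slide, in order: dual-core, 1 GHz, distributed switch; multithreading, memory controller on chip; enhanced multithreading, memory controller on chip, >4 GHz.

IBM POWER processors: setting new industry standards
• 1997: world's first processor built with copper interconnect manufacturing technology
• 1999: world's first processor built with silicon-on-insulator (SOI) manufacturing technology
• 2001: world's first processor with dual-core technology
• 2001: world's first RISC processor with 8-core multi-chip module (MCM) technology
• 2001: world's first RISC processor to exceed a 1 GHz clock
• 2001: world's first RISC processor to support logical partitioning (LPAR)
• 2002: world's first RISC processor to support dynamic logical partitioning (DLPAR)
• 2004: world's first RISC processor to support micro-partitioning (Micro-Partition)
• 2004: world's first RISC processor to support simultaneous multithreading (SMT)
• 2005: world's first processor widely used across the major video game console platforms
• 2006: world's first processor built with Dual Stress manufacturing technology
• 2007: first vendor in the world to announce High-K (Hafnium) Metal Gate semiconductor technology
• 2007: world's first processor to exceed a 4 GHz clock

Price/performance trend of POWER systems (tpmC and $/tpmC)
• S70 (1997): 18,666 tpmC, $109.00/tpmC
• S7A (1998): 34,139 tpmC, $89.00/tpmC
• S80 (1999): 135,815 tpmC, $52.70/tpmC
• S85 (2000): 220,807 tpmC, $43.00/tpmC
• p690 (2001-2002): 403,255 tpmC, $17.80/tpmC
• p690+ (2002-2003): 763,898 tpmC, $8.31/tpmC
• p690++ (2004): 1,025,486 tpmC, $5.42/tpmC
• p5-595 (2005): 3,210,541 tpmC, $5.19/tpmC

Evolution of the IBM POWER chip architecture
[Diagrams: POWER4™ (two POWER4 cores, shared L2, distributed switch, L3 controller, memory controller, GX bus), POWER5™ (two POWER5 cores, L2, L3 controller, enhanced distributed switch, on-chip memory controller, GX bus), POWER6 (two POWER6 cores with AltiVec, 4 MB L2 per core, L3 controller, fabric bus controller, memory controller, GX bus controller, GX+ bridge, Memory+).]

IBM to launch the next generation of POWER processor technology in 2007 (May 22, 2007)
• IBM POWER5+ processor technology still leads the UNIX market
• Introducing the System p5 590/595
• First POWER6 16-way p570 system
• NEW! The absolute best midrange performance in the industry!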
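• Compatible with existing software applications
[Product line graphic: System p5 505Q Express, System p5 510 & 510Q Express, System p5 520 Express & 520Q Express, System p5 550 & 550Q, System p5 560Q Express, System p5 570, System p5 575, POWER6 p570, IBM BladeCenter™, IBM IntelliStation™ POWER 185 and 285 Express.]

POWER5 / 5+ design review
• POWER5+ design: 1.65, 1.9, 2.2, 2.3 GHz; 0.09 micron
• POWER5 design: 1.5, 1.65, and 1.9 GHz; 0.13 micron
• Simultaneous multithreading (SMT)
• Micro-partitioning: sub-processor allocation
• Enhanced distributed switch
• Enhanced memory subsystem: larger L3 cache (36 MB), memory controller on-chip
• Further improved performance for compute-intensive work
• Dynamic power-saving mode: clock gating
[Diagram: two POWER5 / 5+ chips, each with two processor cores (P), L2, L3 controller and L3, distributed switch, fabric controller, memory controller, GX bus, and memory.]

The three packaging options for POWER5 / 5+
• Dual-core module (DCM): one POWER5+ dual-core chip plus one L3 cache chip (505, 510, 520, 550, 570)
• Quad-core module (QCM): two POWER5+ dual-core chips plus two L3 cache chips (510Q, 520Q, 550Q, 560Q)
• Multi-chip module (MCM): four dual-core POWER5 chips plus four L3 cache chips, for 8 processor cores (590, 595)

Simultaneous multithreading (SMT, logical CPUs)
[Diagram: POWER4 (single threaded) vs. POWER5 (simultaneous multi-threading). Execution units FX0, FX1, LS0, LS1, FP0, FP1, BRZ, CRL over clock cycles, marked as Thread0 active, Thread1 active, or no thread active. Logical CPU0 and Logical CPU1 map to Thread0 (purr0) and Thread1 (purr1); the physical CPU has the timebase register.]
• Raises the utilization of the CPU's execution units
• Presents symmetric multiprocessing (SMP) programming model to software
• Adapts automatically to the out-of-order nature of superscalar instruction execution
• Each processor dispatches two threads: "to applications and the operating system, the number of CPUs has doubled."
• With SMT enabled at the same clock rate, POWER5 servers deliver on average 15-40% more performance than earlier servers on high-throughput commercial workloads such as database, web, and Java applications.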
SMT test case: SMT raises Oracle 9i database throughput by 42%
[Three CPU utilization charts for host eval-dom1 (User%, Sys%, Wait% over time), dated 3/9/2004 and 10/9/2004. Captions recovered:]
• AIX 5.2, 8-way, 64 GB p570: 700 users, 180 transactions per second, < 0.05 sec response time
• AIX 5.3 (SMT), cached (low wait): 1000 users, 280 transactions per second, < 0.05 sec response time
• AIX 5.3 (SMT): 700 users, ~180 transactions per second, < 0.05 sec response time

Gartner Magic Quadrant market analysis
[Chart: quadrants Challengers, Leaders, Niche Players, Visionaries; axes Completeness of Vision and Ability to Execute. Products plotted: HP Integrity, Stratus ftServer, IBM System p5, Unisys ES7000, HP 9000, Sun Fire U/SPARC IV+, IBM System z, Bull NovaScale, HP Integrity NonStop, Fujitsu Primequest, Fujitsu Primergy, Sun Fire T2000, Sun Fire x86, HP BladeSystem, IBM BladeCenter, Dell PowerEdge, IBM System x, IBM System i, HP ProLiant, Fujitsu Primepower.]
Gartner, Magic Quadrant For Enterprise Servers 2006, August 10, 2006; Philip Dawson, Jonathon Hardcastle, Andrew Butler, Donald Feinberg, Paul McGuckin. ID Number: G0*******
The Magic Quadrant is copyrighted 2006 by Gartner, Inc. and is reused with permission, which permission should not be deemed to be an endorsement of any company or product depicted in the quadrant. The Magic Quadrant is Gartner, Inc.'s opinion and is an analytical representation of a marketplace at and for a specific time period. It measures vendors against Gartner defined criteria for a marketplace.
The positioning of vendors within a Magic Quadrant is based on the complex interplay of many factors. Gartner does not advise enterprises to select only those firms in the "Leaders" quadrant. In some situations, firms in the Visionary, Challenger, or Niche Player quadrants may be the right matches for an enterprise's requirements. Well-informed vendor selection decisions should rely on more than a Magic Quadrant. Gartner research is intended to be one of many information sources including other published information and direct analyst interaction. Gartner, Inc. expressly disclaims all warranties, express or implied, of fitness of this research for a particular purpose.
Leverage our Leadership for the price premium it deserves

Bottlenecks in processor development
1. Power wall (the barrier of power consumption)
2. Frequency wall (the barrier of clock frequency)
3. Memory wall (the barrier of memory throughput)
Responses listed on the slide: reduce transistor power; reduce switching per function; more, slower threads; specialized function; asynchronous loads.
Directions: increase concurrency; increase specialization.

Innovative chip manufacturing technology: Dual Stress & High-K Metal Gate (Hafnium)
What is Dual Stress technology?
• Technology incorporated into the POWER5+ processor
• Strained silicon on silicon-on-insulator technology
• Stretches and compresses transistors using a stress film layer
• Provides more efficient flow of electrons
• No special materials required
Benefit:
• Increases transistor speeds by up to 20% without increasing power consumption
• Reduces electric current leaks
• Greater performance without increasing power and heat
What is High-K Metal Gate technology?
January 27, 2007: IBM and Intel had developed a new "high-k" material, based on the element Hafnium, that could be substituted for the traditional gate oxide used in the transistors that make up microchips using 45nm technology.

POWER6 characteristics
4.7 / 4.2 / 3.5 GHz, >750M transistors, 65nm
POWER6 processor architecture
• Ultra-high frequency dual-core chip: 4.7 GHz
• 7-way superscalar, 2-way SMT core
• Eight execution units: 2LS, 2FP, 2FX, 1VMX, 1DP
• 790M transistors, 341 mm² die
• 2x4MB on-chip L2, point of coherency
• On-chip L3 directory and controller
• Technology: CMOS 65nm lithography, SOI Cu
• High-speed elastic bus interface at 2:1 frequency
• Full error checking and recovery
• Dynamic power saving: advanced clock gating
[Block diagram: two POWER6 cores with AltiVec, 4 MB L2 per core, L3 controller and L3, fabric bus controller, memory controller and Memory+, GX bus controller and GX+ bridge.]

POWER6 chip architecture
• Dual-core chip, 2-way SMT core
• Ultra-high frequency: 4.7 / 4.2 / 3.5 GHz
• >750M transistors
• Superscalar
• Large on-chip L2
• On-chip L3 directory and controller
• Two memory controllers on-chip
• Technology: CMOS 65nm lithography, SOI
• Error checking and recovery
[Die photo labels: IFU, SDU, FXU, RU, AltiVec, FPU, DFU, LSU, L2 cache.]

POWER5+ and POWER6 processors compared
• L1 ICache capacity, associativity: POWER5+ 64 KB, 2-way; POWER6 64 KB, 4-way
• L1 DCache capacity, associativity: POWER5+ 32 KB, 4-way; POWER6 64 KB, 8-way
• L2 cache capacity, line size: POWER5+ 1.9 MB, 128 B line; POWER6 2 x 4 MB, 128 B line
• L2 cache associativity, replacement: POWER5+ 10-way, LRU; POWER6 8-way, LRU
• Off-chip L3 cache capacity, line size: POWER5+ 36 MB, 256 B line; POWER6 32 MB, 128 B line
• Off-chip L3 cache associativity, replacement: POWER5+ 12-way, LRU; POWER6 16-way, LRU
• Memory: POWER5+ 4 TB maximum; POWER6 8 TB maximum
• Memory bus: POWER5+ 2x DRAM frequency; POWER6 4x DRAM frequency
• Style: POWER5+ general out-of-order execution; POWER6 mostly in-order with special case out-of-order execution
• Units: POWER5+ 2FX, 2LS, 2FP, 1BR, 1CR; POWER6 2FX, 2LS, 2FP, 1VMX, 1DP
• Threading: POWER5+ two SMT threads, alternate ifetch, alternate dispatch (up to five instructions); POWER6 two SMT threads, priority-based dispatch, simultaneous dispatch from two threads (up to seven instructions)

POWER6* core
• POWER6 processor is ~2X the frequency of POWER5 (4-5 GHz)
• POWER6 instruction pipeline depth is equivalent to POWER5: minimizes power, scales performance with frequency
• Pipeline stages: instruction fetch, instruction buffer/decode, instruction dispatch/issue, data fetch/execute; FXU-dependent and load-dependent execution shown at ~6 ns/instr and ~3 ns/instr
• POWER6 extends the functionality of the POWER5 core: 64K I cache, 64K D cache, 2 FXU, 2 FPU, 1 branch execution unit
• Two-way SMT with 7-instruction dispatch from 2 threads (maximum of 5 instructions per thread)
• Decimal unit, VMX unit, recovery unit

POWER6 delivers improved system utilization through enhanced simultaneous multithreading
[Diagram: execution units FX0, FX1, LS0, LS1, FP0, FP1, DP, VMX, marked as Thread0 active, Thread1 active, or no thread active; system throughput compared for POWER5 ST and SMT and for POWER6 enhanced SMT.]
• Utilizes unused execution unit cycles
• Reuse of existing transistors vs. performance from additional transistors
• Presents symmetric multiprocessing (SMP) programming model to software
• Dispatch two threads per processor: "It's like doubling the number of processors."
• Net result: better performance and better processor utilization
• Appears as four CPUs per chip to the operating system (AIX 5L™ V5.3 and Linux®)

POWER6* scales chip capabilities with core performance
Cache highlights
• 4MB private L2 cache per core
• 32MB non-sectored L3 cache per chip
Fabric highlights
• Three intra-node SMP buses for an 8-way node
• Two inter-node SMP buses for up to 8 nodes
• Multiplexed address/data SMP buses
New prefetching capabilities
• Coherent multi-cacheline data prefetch operations
• Prefetching on stores
[Chip interconnect diagram: two cores (SMT+, VMX), L2 cache (8.0 MB), L3 cache 32 MB with L3 directory and controller, two memory controllers, GX+ controller and GX+ bus, fabric, intra-node 8W SMP buses. Bandwidth labels recovered: 80 GB/s, 20 GB/s, 50 GB/s, 75 GB/s, total = 300 GB/s.]

Flex system designed to help optimize low-end to high-end server designs
• SMP busses can be configured in two modes for cost/performance trade-offs: on-node busses are 8B or 2B; off-node busses are 8B or 4B
• Numerous memory controller bandwidth options: 1 or 2 memory controllers are available, and each can be configured to full width or ½ width
• L3 cache is supported in three configurations: on-module high-bandwidth configuration, optional off-module configuration, and no L3
• Fully interconnected two-tier SMP fabric with reduced latencies vs. POWER5
• New two-tier memory coherency protocol
[Figures: 2-socket, 4-socket, and 32-socket configurations; POWER5 and POWER6 nodal topology; POWER5 and POWER6 system topology.]

IBM p6 p570 modular architecture enhanced with new flex cables
• The POWER6 fabric bus connects processors in separate drawers using a new SMP flex cable
• The flex cable attaches directly to the CPU cards at the front of the drawer
• Growing from 2 to 3 to 4 drawers only requires cable additions, with no "parts on the floor"

Designed for bullet-proof computing: recovery capability
• Array error: error correction (ECC); for arrays with parity, the processor restarts
• Instruction flow and data flow error: processor restarts
• Control error: processor restarts
[Diagram: core with instruction fetch, decode, execution units, and load/store; core error collection feeds the recovery unit, which triggers a core restart.]
System resiliency
• Processor states are checkpointed and protected with ECC
• Processor states can be moved from one processor to another upon an unsuccessful recovery restart

Designed for bullet-proof computing: system RAS with recovery unit
• Extensive measures taken to preserve application execution: retry soft errors, change hardware for hard errors
• Processor architected state is checkpointed every cycle
• ECC and non-ECC protected circuitry is checked every cycle
• Soft error case: when an error is found, the processor restarts from the last saved checkpoint
• Hard error case: when the error is found again after the restart, the processor workload is moved to another CPU

Decimal floating-point
Description: a new, more precise mechanism for decimal arithmetic
Markets: DB servers, financial sector applications, etc.
Binary floating-point problem:
• Add 5% sales tax to a $0.70 telephone call, rounded to the nearest cent
• 1.05 x 0.70 using binary double is exactly 0.73499999999999998667732370449812151491641998291015625 (it should have been 0.735)
• The result rounds to $0.73 instead of $0.74
Decimal floating-point benefit:
• True decimal processing: 0.735
• Benefits: performance and accuracy
• Requirement: AIX 5.3 TL06

Decimal floating-point accelerators
• Binary floating-point is unsuitable for commercial or human-centric applications
• A survey of commercial databases shows that numeric data is largely decimal: 55% of numeric data in databases is BCD data, and the next 43% is integers, often held as decimal integers
Example: performance improvement of decimal hardware vs. decimal software. Telco billing application, 1 million calls (2 minutes) read from file, priced, taxed, and printed:
• Java BigDecimal w/ DEC number: 93.2% of execution time in decimal operations, speedup with hardware DFP* < 7X
• C, C# packages: 72-78% of execution time in decimal operations, speedup 4X
• Integer hand-tuned: 45% of execution time in decimal operations, speedup 2X
* IBM projection

Standards and software rallying around DFP standardization
Numerous software and standards activities are in flight inside and outside IBM:
• Java BigDecimal (compatible with 754r)
• C# and .Net ECMA and ISO standards: arithmetic changed to match 754r decimal128
• XML Schema 1.1 draft now has pDecimal, compatible with 754r
• ISO C and C++ are jointly adding decimal floating-point as first-class primitive types; GCC support is almost complete
• COBOL is adding a datatype to support 754r
• ANSI/ISO SQL: new types accepted in principle (draft about to be submitted)
• Strong support expressed by Microsoft, SHARE, academia, SAP, and many others

POWER6 AltiVec vector technology
Dramatic application performance gains
• SIMD (Single Instruction, Multiple Data) extension to the PowerPC Architecture™, jointly developed by Apple, Motorola, and IBM
• Targets high performance computing and deep computing applications
Benefit to ISVs / clients:
• Provides highly parallel operations
• Dramatically better performance for highly "vectorized" code
Development / test environment:
• Current support: IBM BladeCenter® JS21 or IBM IntelliStation® POWER™ 185 Express
• Supported by AIX® and Linux releases
• IBM XL C/C++ Enterprise Edition V8.0 for AIX (October 2005) provides support for the AltiVec instruction set and for the AltiVec programming model and APIs
• IBM XL Fortran Enterprise Edition V10.1 for AIX (October 2005) can automatically enable SIMD vectorization at higher levels of optimization
• Additional compiler support for AltiVec™ vectorization extensions will be available in XL C/C++ V9.0, with automatic SIMD vectorization
• Redbook: /abstracts/redp3890.html

POWER6 system enhancements
PowerExecutive extensions for POWER6: energy management policies
Example energy management policies:
• Energy cost management: monitor system workloads and power consumption
  - If system utilization falls, reduce system power/performance
  - If multiple systems go below the utilization threshold, consolidate workloads
  - If system power budgets exceed the allocation, cap power
• PowerExecutive acoustic enhancement: monitor system temperature
  - If system temperatures go below the threshold, reduce fan speeds
• Performance enhancement: monitor system temperature and power consumption
  - If temperatures or power consumption go below the threshold, increase performance
Energy management policies are designed to enable clients to maximize the compute capability of their data center or minimize energy costs.

EMPATH system control planned for POWER6*: extended system functions for PowerExecutive policies
[Diagram: IBM Director PowerExecutive, hardware management module, IBM POWER6 service processor, power control firmware, AME API, power management policies, EMPATH controller, power modules, power measurement circuits.]
• Thermal / power measurement: read thermal data from processor chip thermal sensors, measure power data from system-level sensors, report data via PowerExecutive
• Power capping: use of hardware controls to keep system power under a specified limit
• Power saving: operation at reduced power when workload and policy allow; can be a static policy (e.g. overnight reduction) or dynamic (when absolute maximum performance is not always required)
• System health monitoring: use of hardware sensors to help ensure the system is operating within safe predefined bounds
• Performance-aware power management: use of dedicated performance counters to guide power and thermal management tradeoffs

Summary and conclusions
• POWER6* doubles the frequency and bandwidth of POWER5 with the same pipe depth and the same power envelope
• POWER6* scales chip/system performance with core performance
• POWER6* provides new capabilities: decimal floating point, processor recovery
• System p will begin delivery of system power management with POWER6*
• POWER6* is on track to deliver high frequency capabilities
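The sales-tax example from the decimal floating-point slides above can be reproduced in a few lines. The sketch uses Python's standard `decimal` module, which performs decimal arithmetic in software; it illustrates the rounding problem and is not the POWER6 hardware DFP unit. The function names are hypothetical, and half-up rounding to the cent is an assumption about the billing rule.

```python
from decimal import Decimal, ROUND_HALF_UP


def price_with_tax_binary(price, rate):
    """Multiply two binary doubles and round the product to cents."""
    return round(price * rate, 2)


def price_with_tax_decimal(price, rate):
    """Multiply two decimal strings exactly and round half-up to cents."""
    exact = Decimal(price) * Decimal(rate)
    return exact.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The binary product is stored slightly below 0.735, so it rounds down.
print("binary double :", price_with_tax_binary(0.70, 1.05))
# The decimal product is exactly 0.7350, so it rounds up to the next cent.
print("decimal       :", price_with_tax_decimal("0.70", "1.05"))
# Exact value held by the binary double, for comparison with the slide.
print("stored double :", Decimal(0.70 * 1.05))
```

Passing the amounts as strings matters: `Decimal(0.70)` would inherit the binary approximation, whereas `Decimal("0.70")` holds the decimal value exactly.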