A Parallel Scheduler for Block Iterative Solvers in Heterogeneous Computing Environments
A Parallel Acceleration Method for a Sea Ice Model Based on the Sunway Many-Core Processor

communications, GPU-based acceleration can generally achieve a speedup of one order of magnitude. Different from the projects mentioned above, our work focuses on CICE 5.1 and uses the Sunway many-core processor as our hardware platform. Since the CICE sea ice model interacts with many other climate components, optimizing communication between Sunway many-core processors becomes an important issue we need to address. Besides, tuning the original CICE algorithms to fit the new computing elements is another problem we need to resolve. To address these issues, in this work we implement a new parallelization of the sea ice model algorithm on the Sunway many-core processor. The Sunway many-core processor architecture is very different from the CPU architectures in common use: each Sunway many-core processor contains 4 CGs (Core Groups), and each CG contains 1 MPE (Management Processing Element) and 64 CPEs (Computing Processing Elements). To exploit the massive parallelism offered by the Sunway many-core processor, we propose a parallel algorithm for the sea ice model that covers data transmission, data partitioning, and the calculation method. The main purpose of our work is to reduce data transfer time and exploit the parallelism offered by the platform, and the methods we use to achieve this goal are easy to understand.
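The abstract mentions data partitioning across the computing elements but gives no details. Purely as an illustration (not the paper's method), the sketch below splits an nx-by-ny grid into roughly equal tiles, one per CPE of a core group; all sizes and names are made up for the example.

% Illustrative tiling of a 2-D grid among 64 CPEs (8 x 8 tiles); not from the paper.
nx = 320; ny = 384;                      % example global grid size
px = 8;  py = 8;                         % 8 x 8 = 64 tiles, one per CPE
xsplit = round(linspace(0, nx, px + 1)); % tile boundaries along x
ysplit = round(linspace(0, ny, py + 1)); % tile boundaries along y
tiles = cell(px, py);
for i = 1:px
    for j = 1:py
        tiles{i, j} = struct('i0', xsplit(i) + 1, 'i1', xsplit(i + 1), ...
                             'j0', ysplit(j) + 1, 'j1', ysplit(j + 1));
    end
end
% Each tile's index range could then drive the transfer of that block into a CPE's local memory.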
An Implementation Method for Parallel Finite Element Computation Based on Smoothed Aggregation Algebraic Multigrid

An Implementation Method for Parallel Finite Element Computation Based on Smoothed Aggregation Algebraic Multigrid
Authors: WU Liwei; ZHANG Jianfei; ZHANG Qian (College of Mechanics and Materials, Hohai University, Nanjing 211100, China)
Abstract: A preconditioned conjugate gradient solver for parallel structural finite element computation is implemented on the basis of the smoothed aggregation algebraic multigrid method. The computational domain is partitioned uniformly and the subdomains are assigned to the individual processes, which compute the element stiffness matrices concurrently and assemble them into a distributed global equilibrium equation system. The global system is then solved in parallel with the smoothed aggregation AMG preconditioned conjugate gradient method. Numerical experiments on the Tianhe-2 supercomputer analyze how the main algebraic multigrid parameters affect the performance of the algorithm and test the parallel performance of the program. The results show that the method has good parallel performance and scalability and is suitable for large-scale practical applications.
Journal: Computer Aided Engineering, 2017, 26(6), pp. 16-22. CLC classification: TB121.
Keywords: finite element method; smoothed aggregation; algebraic multigrid; conjugate gradient method; scalability
The finite element method is an important numerical method for engineering structural analysis.
As engineering projects keep growing in scale and complexity and the required computational accuracy keeps rising, the problem sizes and speeds achievable with traditional serial finite element programs no longer meet the demand, and there is an urgent need for scalable parallel finite element algorithms and programs that run efficiently on supercomputers.
At present, the parallel algorithms commonly used in traditional parallel finite element computation mainly include the parallel substructure method [1], the multifrontal method [2], and the preconditioned conjugate gradient method [3-4].
Among these, the direct methods require large amounts of computation and storage and offer limited parallelism, while the preconditioners used in the preconditioned conjugate gradient method often sacrifice convergence to gain parallelism, so they too fail to scale to large applications.
The algebraic multigrid method [5-6] requires no geometric mesh information; it constructs virtual coarse and fine grids purely from the algebraic structure of the equation system in order to accelerate convergence.
The method has the advantages of low storage, fast convergence, and good scalability. Applying it to large-scale parallel structural finite element computation can further increase the problem size and the computation speed, meeting the requirements of modern engineering structural analysis and design.
Based on the smoothed aggregation algebraic multigrid method [7], this paper implements a preconditioned conjugate gradient method for large-scale parallel structural finite element computation, analyzes on the Tianhe-2 supercomputer how different aggregation strategies, smoothing iterations, and coarse-grid solvers affect the algorithm, and tests and analyzes the parallel performance of the program.
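The solver described above is a preconditioned conjugate gradient (PCG) iteration with a smoothed aggregation AMG preconditioner. As a minimal sketch of the PCG building block only, the following uses MATLAB's pcg with a Poisson test matrix standing in for an assembled stiffness matrix and a plain Jacobi preconditioner standing in for the AMG hierarchy; these substitutions are illustrative, not the paper's setup.

% PCG sketch: SPD test matrix and simple Jacobi preconditioner (stand-ins only).
m = 60;
A = gallery('poisson', m);                        % sparse SPD matrix of order m^2
b = ones(size(A, 1), 1);                          % right-hand side (e.g. a load vector)
M = spdiags(diag(A), 0, size(A, 1), size(A, 1));  % Jacobi preconditioner
[x, flag, relres, iter] = pcg(A, b, 1e-8, 1000, M);
fprintf('flag = %d, iterations = %d, relative residual = %.2e\n', flag, iter, relres);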
Introduction to the MATLAB Parallel Computing Toolbox and MDCE

3.1 A Brief History of Parallel Computing in MATLAB
The MATLAB technical language and development environment is used in many different fields, such as image and signal processing, control systems, financial modeling, and computational biology.
MATLAB provides domain-specific routines through add-ons called toolboxes, and it offers a clean user interface to high-performance libraries such as BLAS (Basic Linear Algebra Subprograms, the standard building blocks for basic vector and matrix operations), FFTW (the Fastest Fourier Transform in the West), and LAPACK (Linear Algebra PACKage). These features attract experts from many fields and, compared with low-level languages such as C, let them move much more quickly from exploring different candidate designs to a working functional design.
Advances in computer processing power have made it easy to exploit multiple processors, whether multicore processors, commodity clusters, or a combination of the two, and this has created demand for desktop applications such as MATLAB to find mechanisms for exploiting such architectures.
There have been several attempts to produce MATLAB-based parallel programming products, the best known being pMATLAB and MatlabMPI from the MIT Lincoln Laboratory, MultiMATLAB from Cornell University, and bcMPI from the Ohio Supercomputer Center.
Even early versions of MATLAB experimented with parallel computing: in the late 1980s Cleve Moler, the original author of MATLAB and co-founder of MathWorks, personally developed MATLAB for the Intel hypercube and for Ardent Computer's Titan supercomputer.
In his 1995 article "Why there isn't a parallel MATLAB" [**], Moler described three main obstacles to developing a parallel MATLAB: the memory model, computational granularity, and the market situation.
Some Basic Concepts of the Preconditioned Subspace Iteration Method

Preconditioning techniques for the CG algorithm.
Why precondition A? Because the convergence rate of CG depends on the eigenvalue distribution of the symmetric positive definite matrix A.
How do the eigenvalues affect convergence? If the eigenvalues are clustered in a small range, the convergence of CG is accelerated.
What are the definitions of eigenvalues and eigenvectors? (See the notebook and the bookmarked web pages.)
Methods for computing eigenvalues and eigenvectors — the Davidson method: the Davidson method generates its subspace with the matrix (D − θI)⁻¹(A − θI), where D is the diagonal matrix formed from the diagonal entries of A.
Here θ is the approximate eigenvalue of A obtained from a Rayleigh-Ritz procedure.
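A hedged sketch of the Davidson iteration just described, targeting the smallest eigenvalue of a symmetric matrix: at each step a Rayleigh-Ritz projection onto the current subspace V gives a Ritz pair (θ, u), and the subspace is expanded with (D − θI)⁻¹(A − θI)u, i.e. the residual preconditioned by the diagonal D. The test matrix, subspace size, and tolerances are illustrative.

% Davidson sketch for the smallest eigenpair of a symmetric matrix A.
n = 200;
A = spdiags((1:n)', 0, n, n) + sprandsym(n, 0.01);   % symmetric test matrix
D = spdiags(diag(A), 0, n, n);                       % diagonal of A
V = orth(rand(n, 2));                                % initial subspace
for k = 1:30
    H = V' * (A * V);                                % Rayleigh-Ritz projection
    [Y, Theta] = eig(full(H), 'vector');
    [theta, idx] = min(Theta);                       % smallest Ritz value
    u = V * Y(:, idx);                               % corresponding Ritz vector
    r = A*u - theta*u;                               % residual (A - theta*I)*u
    if norm(r) < 1e-8, break, end
    t = (D - theta*speye(n)) \ r;                    % Davidson correction vector
    V = orth([V, t]);                                % expand and re-orthonormalize
end
fprintf('Approximate smallest eigenvalue %.6f after %d expansions\n', theta, k);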
What is a subspace method? Krylov subspace iteration methods are used to solve equations of the form Ax = b, where A is an n×n matrix. When n is sufficiently large, direct computation becomes very difficult, and the Krylov approach cleverly converts the problem into the iterative form K x_{i+1} = K x_i + b − A x_i.
Here K (said to come from the initial of the Russian author Nikolai Krylov's surname) is a constructed matrix that approximates A, and the beauty of the iterative form is that it reduces a complex problem to a sequence of easily computed sub-steps.
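A minimal sketch of the iteration K x_{i+1} = K x_i + b − A x_i written above, equivalently x_{i+1} = x_i + K⁻¹(b − A x_i). Choosing K = diag(A), as done here, gives the Jacobi iteration, while other easily inverted approximations of A give other stationary schemes; the matrix and tolerance are illustrative.

% Stationary iteration K*x_{i+1} = K*x_i + (b - A*x_i) with K = diag(A) (Jacobi).
A = [4 -1 0; -1 4 -1; 0 -1 4];
b = [3; 2; 3];
K = diag(diag(A));              % easy-to-invert approximation of A
x = zeros(size(b));
for i = 1:200
    r = b - A*x;                % current residual
    if norm(r, inf) < 1e-12, break, end
    x = x + K \ r;              % one sweep of the iteration
end
disp(x)                         % approaches A\b when the iteration converges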
How should the positive definite matrix M_k be chosen?
What is a span? Let x_1, …, x_m ∈ V. The set of their linear combinations { Σ_{i=1}^{m} k_i x_i | k_i ∈ K, i = 1, 2, …, m } is called the subspace generated by x_1, …, x_m, also called the subspace spanned by x_1, …, x_m.
It is written L(x_1, …, x_m), or equivalently Span(x_1, …, x_m).
What is the Jacobi iteration? What is the Gauss-Seidel iteration? What is the SOR iteration? What is the convergence rate? See the slides "Iterative Methods for Solving Linear Systems".
What are reducible and irreducible matrices? Irreducible matrix and reducible matrix are two complementary concepts.
Definition 1: An n×n matrix A is called reducible if there exists a permutation matrix P such that PᵀAP is block upper triangular; otherwise A is called irreducible.
Definition 2: An n×n matrix A = (a_jk) is called reducible if the index set {1, 2, …, n} can be partitioned into two disjoint non-empty index sets J and K such that a_jk = 0 for every j ∈ J and every k ∈ K; otherwise A is called irreducible.
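Reducibility can be checked mechanically: a square matrix is irreducible exactly when the directed graph with an edge i → j for every nonzero a_ij is strongly connected. A small MATLAB sketch (the example matrix is illustrative):

% Irreducibility test via strong connectivity of the adjacency graph of A.
A = [4 -1 0; -1 4 -1; 0 -1 4];
G = digraph(A ~= 0);                      % edge i -> j wherever a_ij is nonzero
comp = conncomp(G, 'Type', 'strong');     % strongly connected components
isIrreducible = all(comp == comp(1))      % true iff there is a single component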
NVIDIA Maximus Technology for ANSYS Mechanical (Detailed Guide)

DU-06467-001_v01 | User Guide | September 2011

DOCUMENT CHANGE HISTORY
DU-06467-001_v01
Version | Date | Authors | Description of Change
01 | August 3, 2012 | | Initial release
02 | September 10, 2012 | EK, SP | Minor edits, clarity regarding solver optimizations

TABLE OF CONTENTS
Maximus Technology for ANSYS Mechanical (4)
  Prerequisite Skills (4)
  What This Document Contains (5)
  Benefits of Maximus Technology (5)
  OEMs for Maximus Technology (5)
NVIDIA Maximus Technology (6)
  Maximus Computing Advantage (6)
  Elements of Maximus Technology (6)
  Basic Maximus Configuration (8)
Maximus for ANSYS Mechanical (9)
  Enabling ANSYS Mechanical for Maximus (9)
  Monitoring GPU Activity with Maximus Configuration Utility (11)
  Supported Solver Types (12)
  Model Considerations (12)
  Troubleshooting (13)
  References (13)

This document describes the basic settings, configuration, and monitoring of an NVIDIA® Maximus-enabled workstation for ANSYS Mechanical solvers.
This document does not replace any documentation provided by ANSYS for software offerings specific to CAE software. Refer to the documentation provided by ANSYS for ANSYS software configurations.
This document does not explain the fundamentals of ANSYS usage or the discipline of CAE.

PREREQUISITE SKILLS
This document is intended for persons responsible for optimizing a Maximus-enabled workstation for ANSYS Mechanical. It is assumed the audience is familiar with, or has skilled experience with, the following:
• ANSYS Mechanical CAE software
• Computer Aided Engineering (CAE)
• Graphics Processing Unit (GPU) functionality
• Modern workstation terminology
• Hardware connectivity
• Physical system building skills
• Thermal and electrical workstation system internals
• Microsoft Windows configuration

WHAT THIS DOCUMENT CONTAINS
This document provides an introduction to NVIDIA Maximus technology and how to enable your workstation and ANSYS Mechanical to use Maximus.
• NVIDIA Maximus Technology, starting on page 6, describes the benefits of NVIDIA Maximus technology, its key elements, and the basic system requirements for enabling Maximus on your workstation.
• Maximus for ANSYS Mechanical, starting on page 9, focuses on using Maximus with ANSYS Mechanical. It explains how to enable Maximus for ANSYS Mechanical and monitor GPU activity. It identifies the solvers that are enabled for Maximus, describes model considerations, and gives troubleshooting tips for when a solver workload does not perform to your expectations.
• References, on page 13, provides useful references to related documentation.

BENEFITS OF MAXIMUS TECHNOLOGY
For a comprehensive overview of Maximus technology, its benefits, and how it is being used, go to /maximus.

OEMS FOR MAXIMUS TECHNOLOGY
A list of OEMs that carry Maximus platforms is available at /maximus.

NVIDIA MAXIMUS TECHNOLOGY
This section describes the benefits of Maximus-enabled workstations and applications, the key elements of Maximus technology, and the basic configuration for a Maximus-enabled workstation.

MAXIMUS COMPUTING ADVANTAGE
In the past, workstation architectures forced professionals to do graphics-intensive and compute-intensive work serially, often offline. NVIDIA Maximus technology represents a revolution for these professionals by enabling both tasks to be performed concurrently without experiencing any drop in performance.
For example, a designer can work on design iteration B while running a simulation on design iteration A. Because these tasks are performed concurrently, it is possible to explore ideas faster and converge more quickly on the best possible answers.
ELEMENTS OF MAXIMUS TECHNOLOGY
Maximus is an enabling technology that brings together the professional 3D graphics capability of NVIDIA Quadro® GPUs with the massive parallel computing capabilities of the NVIDIA Tesla™ C2075 companion processors. Figure 1 illustrates the advantages of the Tesla processor.
[Figure 1: Tesla performance.]
Figure 2 shows the performance improvements possible with ANSYS Mechanical on a Maximus-enabled workstation. The Maximus configuration can consist of any of the Quadro cards in the performance scaling chart plus one Tesla card.
[Figure 2: SolidWorks scaling chart.]

BASIC MAXIMUS CONFIGURATION
This section describes hardware and software requirements for a Maximus-enabled workstation running ANSYS Mechanical. Only the basic requirements are covered in this document. For further details about upgrading an eligible workstation to a Maximus configuration, refer to the NVIDIA Maximus System Builders' Guide for Microsoft Windows 7-64 document.
Check that your system satisfies the following software and hardware requirements:
• Microsoft Windows 7 64-bit operating system.
• ANSYS Mechanical 14 with HPC License Pack. At the release date of this document, one HPC pack enables eight CPU cores and one entire GPU.
• One NVIDIA Quadro card installed in the first x16 (x16 electrical) PCIe slot of the host computer.
• One NVIDIA Tesla card installed in the second x16 (x16 electrical) PCIe slot of the host computer.
  After ANSYS Mechanical becomes multi-GPU aware, and if it is physically possible, you can have more than one Tesla processor in your system. At the release date of this document, ANSYS Mechanical is still single-GPU aware. Follow future announcements from ANSYS and NVIDIA regarding multi-GPU awareness.
  At the release date of this document, the Tesla C2075 is the only supported compute card for ANSYS Mechanical.
  NVIDIA recommends that no display device be connected to the Tesla C2075 DVI display output.
• NVIDIA Quadro/Tesla driver 275.89, or a newer ANSYS-certified driver, correctly installed. Refer to … for a list of drivers for download.
• Correct installation and cabling with power connectors (as needed) of all NVIDIA graphics cards. If you purchased your system from an OEM with the NVIDIA cards pre-installed, no action is needed.
  No NVIDIA SLI (Scalable Link Interface) ribbon cable is necessary or required for a Maximus configuration.

MAXIMUS FOR ANSYS MECHANICAL
This section explains how to enable Maximus for ANSYS Mechanical and monitor GPU activity. The solvers that can use Maximus are listed, and characteristics of models that need to be considered are identified. The section also contains troubleshooting tips for a solver workload that does not perform to your expectations.

ENABLING ANSYS MECHANICAL FOR MAXIMUS
Use the following procedure to enable Maximus for ANSYS Mechanical:
1. Select Tools from the main menu.
2. Select Solve Process Settings… to display the Solve Process Settings menu.
3. Check that My Computer is selected on the Solve Process Settings menu.
4. Click Advanced… to display the Advanced Properties dialog.
5. Select NVIDIA from the drop-down list of the Use GPU acceleration (if possible) field.
6. Click OK.
7. Open the ANSYS Mechanical solve.out file. Search for the text GPU ACCELERATOR OPTION ENABLED. If this text string is not displayed, Maximus is not enabled. The relevant part of the file looks like the following:
   ***** ANSYS COMMAND LINE ARGUMENTS *****
   BATCH MODE REQUESTED (-b) = NOLIST
   INPUT FILE COPY MODE (-c) = COPY
   6 PARALLEL CPUS REQUESTED
   START-UP FILE MODE = NOREAD
   STOP FILE MODE = NOREAD
   GPU ACCELERATOR OPTION ENABLED
   RELEASE= 13.0   VERSION=WINDOWS x64
   CURRENT JOBNAME=file  14:22:46  OCT 24, 2011  CP= 0.811
Working together with ANSYS Mechanical software, the Maximus driver automatically ensures that ANSYS Mechanical runs on the Tesla GPU. This setting provides the best performance.

MONITORING GPU ACTIVITY WITH MAXIMUS CONFIGURATION UTILITY
The NVIDIA Maximus Configuration Utility (MCU) is a separate graphical software utility that provides convenient GPU processing controls. The MCU provides GPU memory and utilization monitors for all supported GPUs in a Maximus-enabled system. Typically, the MCU is used to ensure that ANSYS Mechanical is using the system GPUs correctly. Figure 3 shows the MCU menu page.
The MCU is supported for Maximus-enabled workstations only. Download the MCU from http://www.nvidia.com/maximus. The MCU is accessible from within the NVIDIA Control Panel starting with the 304 release of the Quadro drivers.
[Figure 3: NVIDIA Maximus Configuration Utility.]
In typical ANSYS Mechanical workflows, you have the option to turn ECC (memory error correction) off. Note that when ECC is on, available memory on the Tesla board is reduced by 13 percent.
The MCU provides simple controls to enable or disable computational processing on an installed Quadro GPU. Use this feature to better tune the system for a particular workflow need.

SUPPORTED SOLVER TYPES
ANSYS Mechanical uses a companion Tesla GPU with a single job per GPU. The following ANSYS Mechanical solvers are Maximus-enabled:
• Direct sparse solver for SMP and distributed ANSYS
• PCG and JCG iterative solvers for SMP and distributed ANSYS

MODEL CONSIDERATIONS
Simulation models exist in a wide variety of sizes, normally measured in degrees of freedom (DOF). Though all models may exhibit some level of GPU acceleration benefit, consideration should be given to the following factors to achieve maximum acceleration in ANSYS Mechanical:
• Models that deploy the direct sparse solver:
  ● Models with approximately 500 thousand to 8 million DOF typically yield the most accelerated performance.
  ● All model sizes are supported, but for very large models beyond 8 million DOF, some work might exceed the 6 GB GPU memory and therefore stay on the CPU.
  ● Models should always run in-core (system memory) to eliminate I/O.
• Models that deploy the iterative PCG or JCG solver:
  ● Models with approximately 500 thousand to 5 million DOF typically yield the most accelerated performance.
  ● Model size must not exceed the 6 GB memory; otherwise the model will not run and will need to be restarted on CPUs only (ANSYS Mechanical will provide a message in this case).
  ● There is no out-of-core option (models always run in-core for iterative solvers).
  ● The MSAVE option should be turned off, otherwise the GPU is deactivated.
ANSYS Workbench will automatically set MSAVE for models over 100,000 nodes. Solid structures always provide better performance than shell structures.

TROUBLESHOOTING
There may be times when a solver workload does not perform to your expectations. Following is a list of common items that typically hinder optimal performance of a solver on a Maximus-enabled workstation:
• The Tesla C2075 is not set to handle compute tasks.
• ECC is turned ON for the Tesla C2075, or the Quadro 6000, or both. While enabling ECC improves accuracy, it slows down performance.
• The disk subsystem I/O rate is too slow or bandwidth is constrained.
• There is not enough scratch disk space in the system.
• The job unexpectedly runs out of core memory.
• There is not enough memory in the system.
• The ANSYS license does not provide support for Maximus.
• The simulation job is too small. See "Model Considerations" on page 10.
• The MSAVE option is set to ON.
• Shell structure models are being used.
• More CPUs are being used than necessary for the simulation job. More CPUs do not necessarily add linearly to overall wall clock time performance.

REFERENCES
• ANSYS Support Documentation: refer to /Support/Documentation
• NVIDIA Maximus Configuration Guide: refer to the NVIDIA Maximus System Builders' Guide for Microsoft Windows 7-64
An Adaptive Multi-Phase Parallel Surrogate Optimization Algorithm Based on the Kriging Model

Computer Integrated Manufacturing Systems, Vol. 27, No. 11, Nov. 2021. DOI: 10.13196/j.cims.2021.11.016
An Adaptive Multi-Phase Parallel Surrogate Optimization Algorithm Based on the Kriging Model
YUE Chunyu, MA Yizhong (School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract: To make full use of computing resources and reduce the number of iterations, a surrogate-based optimization algorithm that can add batch points is proposed. The expected improvement criterion and the WB2 (Watson and Barnes) criterion are used, respectively, to explore for the optimum solution and to exploit the region around it, and the constraint boundary is characterized using the probability of feasibility within a multi-objective optimization framework. Two corresponding multi-point infill algorithms are designed for the exploration and exploitation phases, and an adaptive strategy for switching between the two phases is designed according to the distance between new sample points and known sample points. The performance of the algorithm is verified on three numerical examples of different types and one engineering example. The results show that the proposed algorithm converges faster and that its solutions are more precise and robust.
Keywords: Kriging model; surrogate-based optimization; infill sampling criteria; probability of feasibility; multi-point infill
0 Introduction
In modern engineering design optimization, data are often obtained from high-fidelity simulation models such as finite element analysis and computational fluid dynamics. It is therefore particularly important to call the high-fidelity simulation model as few times as possible during the optimization, so as to improve optimization efficiency.
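The exploration phase above relies on the expected improvement (EI) criterion. A minimal sketch of evaluating EI from a Kriging prediction, where mu and s are the predicted mean and standard deviation at candidate points and fmin is the best feasible objective value found so far (the names are illustrative; normcdf and normpdf require the Statistics and Machine Learning Toolbox):

% Expected Improvement for minimization, given Kriging mean mu and std s.
function ei = expectedImprovement(mu, s, fmin)
    ei = zeros(size(mu));
    ok = s > 0;                              % EI is zero where the model is certain
    z = (fmin - mu(ok)) ./ s(ok);
    ei(ok) = (fmin - mu(ok)) .* normcdf(z) + s(ok) .* normpdf(z);
end

The WB2 criterion used in the exploitation phase is commonly formulated as EI augmented with the (negated) Kriging prediction, which biases point selection toward the current optimum; the exact form used here should be taken from the paper itself.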
Research on Parallel CFD Computation of Hypersonic Flow

Otherwise errors will occur in the grid.
Figure 2: Grid extension and data exchange scheme.
3.4 Boundary Conditions
(1) Far-field boundary condition.
The sign of the local boundary-normal velocity determines whether the flow is entering or leaving the domain: on an inflow boundary, all flow variables are set from the freestream; on an outflow boundary, all flow variables are extrapolated from the interior field (a minimal code sketch of this inflow/outflow test is given after this list).
(2) Wall boundary condition.
For viscous flow, the impermeable wall must satisfy the no-slip condition u_w = 0, v_w = 0, w_w = 0. The wall pressure can be computed from the simplified form of the normal momentum equation, ∂p/∂n = 0. The wall temperature is determined by either an isothermal-wall or an adiabatic-wall condition, i.e. T_w = T_wall (isothermal wall) or ∂T/∂n|_w = 0 (adiabatic wall).
(3) Symmetry boundary condition.
The symmetry boundary condition requires that, on the two sides of the symmetry plane, the tangential velocity components be equal, the normal components be opposite in sign, and all other quantities be equal.
(4) Block-interface boundary condition.
For a block-interface boundary, the corresponding flow-field points are located from the grid topology, and the cell quantities at interior grid points of the neighboring block are assigned, via message passing, to the exterior boundary points of the current block (i.e. the ghost cells obtained by extending the grid).
In this way, the transfer of boundary-point data guarantees message passing between the partitions throughout the entire iteration process.
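A minimal sketch of the inflow/outflow decision in the far-field boundary condition, item (1) above (the function and variable names are illustrative, not from the paper):

% Far-field boundary treatment: freestream on inflow, extrapolation on outflow.
function q_ghost = farfieldBC(q_interior, q_freestream, u_face, n_face)
    % q_*   : vectors of flow variables at the boundary face
    % u_face: velocity vector at the face; n_face: outward unit normal
    if dot(u_face, n_face) < 0
        q_ghost = q_freestream;   % inflow: all quantities from the freestream
    else
        q_ghost = q_interior;     % outflow: zeroth-order extrapolation from the interior
    end
end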
4 Test Case Results and Analysis
The algorithm is validated against the cylinder surface pressure coefficient results given in reference [9].
The cylinder radius is 0.038 m, and the hypersonic freestream conditions are Ma_∞ = 16.34, T_∞ = 52 K, p_∞ = 82.95 Pa, and T_w = 294.4 K.
The initial computational grid is a three-dimensional single-block grid with 301 × 101 × 11 grid points in total; the boundary conditions used are the far-field, wall, and symmetry conditions, as shown in Figure 3.
Figure 3: Initial grid and boundary condition setup.
Starting from the single-block initial grid, the domain is re-partitioned into 2 and 4 blocks along the i direction, and the flow field is computed numerically on different numbers of PCs; the results are shown in Figures 4 to 7.
First, the accuracy of the computed flow field is examined.
(Figure 4: 1-block grid and computed pressure contours. Figure 5: 2-block grid and computed pressure contours. Figure 6: 4-block grid and computed pressure contours. Figure 7: comparison of the computed Cp for different partitionings.)
From the pressure contours in Figures 4 to 6 it can be seen that the flow fields obtained with different partitionings are almost identical, and the contour lines show no discontinuity across the block interfaces, which guarantees the continuity of the physical quantities. The comparison of the pressure coefficients in Figure 7 shows that the values computed with different partitionings agree very well, fully verifying the reasonableness and correctness of the boundary data treatment.
Parallel Computing Toolbox

Parallel computing with MATLAB. You can use Parallel Computing Toolbox to run applications on a multicore desktop with local workers available in the toolbox, take advantage of GPUs, and scale up to a cluster (with MATLAB Distributed Computing Server).

Programming Parallel Applications
Parallel Computing Toolbox provides several high-level programming constructs that let you convert your applications to take advantage of computers equipped with multicore processors and GPUs. Constructs such as parallel for-loops (parfor) and special array types for distributed processing and for GPU computing simplify parallel code development by abstracting away the complexity of managing computations and data between your MATLAB session and the computing resource you are using.
You can run the same application on a variety of computing resources without reprogramming it. The parallel constructs function in the same way regardless of the resource on which your application runs: a multicore desktop (using the toolbox) or a larger resource such as a computer cluster (using the toolbox with MATLAB Distributed Computing Server).

Using Built-In Parallel Algorithms in Other MathWorks Products
Key functions in several MathWorks products have built-in parallel algorithms. In the presence of Parallel Computing Toolbox, these functions can distribute computations across available parallel computing resources, allowing you to speed up not just your MATLAB and Simulink based analysis or simulation tasks but also code generation for large Simulink models. You do not have to write any parallel code to take advantage of these functions.

Speeding Up Task-Parallel Applications
You can speed up some applications by organizing them into independent tasks (units of work) and executing multiple tasks concurrently. This class of task-parallel applications includes simulations for design optimization, BER testing, Monte Carlo simulations, and repetitive analysis on a large number of data files.
The toolbox offers parfor, a parallel for-loop construct that can automatically distribute independent tasks to multiple MATLAB workers (MATLAB computational engines running independently of your desktop MATLAB session). This construct automatically detects the presence of workers and reverts to serial behavior if none are present. You can also set up task execution using other methods, such as manipulating task objects in the toolbox. You can use parallel for-loops in MATLAB scripts and functions and execute them both interactively and offline.

Speeding Up MATLAB Computations with GPUs
Parallel Computing Toolbox provides GPUArray, a special array type with several associated functions that lets you perform computations on CUDA-enabled NVIDIA GPUs directly from MATLAB. Functions include fft, element-wise operations, and several linear algebra operations such as lu and mldivide, also known as the backslash operator (\). The toolbox also provides a mechanism that lets you use your existing CUDA-based GPU kernels directly from MATLAB.
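A minimal sketch of the gpuArray workflow just described (requires Parallel Computing Toolbox and a CUDA-capable NVIDIA GPU; sizes are illustrative):

% Move data to the GPU, compute with GPU-enabled functions, gather the result.
A = rand(2000, 'gpuArray');      % allocate the matrix directly on the GPU
B = fft(A);                      % fft runs on the GPU for gpuArray inputs
x = rand(2000, 1, 'gpuArray');
y = A \ x;                       % mldivide (backslash) also runs on the GPU
result = gather(y);              % bring the result back to host memory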
Using GPUArrays and GPU-enabled MATLAB functions helps speed up MATLAB operations without low-level CUDA programming.

Scaling Up to Clusters, Grids, and Clouds Using MATLAB Distributed Computing Server
Parallel Computing Toolbox provides the ability to run MATLAB workers locally on your multicore desktop to execute your parallel applications, allowing you to fully use the computational power of your desktop. Using the toolbox in conjunction with MATLAB Distributed Computing Server, you can run your applications on large-scale computing resources such as computer clusters or grid and cloud computing resources. The server enables applications developed using Parallel Computing Toolbox to harness computer clusters for large problems, for example running a gene regulation model on a cluster.

Implementing Data-Parallel Applications Using the Toolbox and MATLAB Distributed Computing Server
Distributed arrays in Parallel Computing Toolbox are special arrays that can hold several times the amount of data that your desktop computer's memory (RAM) can hold. Distributed arrays apportion the data across several MATLAB worker processes running on a computer cluster (using MATLAB Distributed Computing Server). As a result, with distributed arrays you can overcome the memory limits of your desktop computer and solve problems that require manipulating very large matrices.
With over 150 functions available for working with distributed arrays, you can interact with these arrays as you would with MATLAB arrays and manipulate data available remotely on workers without low-level MPI programming. Available functions include linear algebra routines based on ScaLAPACK, such as mldivide (also known as the backslash operator, \), lu, and chol, and functions for moving distributed data to and from MAT-files.
For fine-grained control over your parallelization scheme, the toolbox provides the single program multiple data (spmd) construct and several message-passing routines based on an MPI standard library (MPICH2). The spmd construct lets you designate sections of your code to run concurrently across workers participating in a parallel computation. During program execution, spmd automatically transfers data and code used within its body to the workers and, once the execution is complete, brings results back to the MATLAB client session. Message-passing functions for send, receive, broadcast, barrier, and probe operations are available. Distributed arrays and parallel algorithms let you create data-parallel MATLAB programs with minimal changes to your code and without programming in MPI.

Running Parallel Applications Interactively and as Batch Jobs
You can execute parallel applications interactively and in batch using Parallel Computing Toolbox.
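A small sketch of the distributed-array and spmd constructs described above (assumes no worker pool is already open; sizes are illustrative):

parpool('local');                    % start a pool of local workers
D = rand(4000, 'distributed');       % the array is apportioned across the workers
total = gather(sum(D(:)));           % built-in reductions work on distributed arrays
spmd
    local = getLocalPart(D);         % each worker sees its own portion of D
    localSum = sum(local(:));        % per-worker partial result (a Composite on the client)
end
partialSums = [localSum{:}];         % collect the per-worker values on the client
delete(gcp('nocreate'));             % shut the pool down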
Using the parpool command, you can connect your MATLAB session to a pool of MATLAB workers that can run either locally on your desktop (using the toolbox) or on a computer cluster (using MATLAB Distributed Computing Server) to set up a dedicated interactive parallel execution environment. You can execute parallel applications from the MATLAB prompt on these workers and retrieve results immediately as computations finish, just as you would in any MATLAB session.
Running applications interactively is suitable when execution time is relatively short. When your applications need to run for a long time, you can use the toolbox to set them up to run as batch jobs. This enables you to free your MATLAB session for other activities while you execute large MATLAB and Simulink applications.
While your application executes in batch, you can shut down your MATLAB session and retrieve results later. The toolbox provides several mechanisms to manage offline execution of parallel programs, such as the batch function and job and task objects. Both the batch function and the job and task objects can be used to offload the execution of serial MATLAB and Simulink applications from a desktop MATLAB session.
You can run applications on your workstation using the twelve workers available with the toolbox, or on a computer cluster using more workers available with MATLAB Distributed Computing Server.
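Two short sketches of the interactive and batch workflows described above (assumes no pool is already open; the function handles, sizes, and pool size are illustrative):

% Interactive use: open a pool and run a parfor loop on the workers.
parpool('local');
n = 100; r = zeros(n, 1);
parfor i = 1:n
    r(i) = max(abs(eig(rand(50))));       % independent iterations run on the workers
end
delete(gcp('nocreate'));

% Offline use: submit work as a batch job and fetch the results later.
job = batch(@() sum(rand(1e7, 1)), 1, {}, 'Pool', 3);   % batch task plus a pool of 3 workers
wait(job);                                % optional: you could instead close MATLAB and return later
out = fetchOutputs(job);
delete(job);                              % remove the finished job from the scheduler's storage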
[Figure: example heterogeneous configuration. Legend: SUN4 = Sun SPARC 10 (single processor); TC2K = BBN TC2000 (4 processors); FX80 = Alliant FX/80 (2 processors); RS6K = IBM RS6000. The configuration chains SUN4, several TC2K nodes, an RS6K, and a SUN4, with one processor acting as the monitor processor and the TC2K nodes forming a cluster of processors.]