Chapter 3 Parallel Algorithm Design


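Methods of Algorithm Design

Algorithm design is an important task in computer science and software engineering: it means creating efficient, correct, and feasible computational steps to solve a specific problem.

Algorithm design methods are a set of strategies and techniques that help programmers and researchers develop effective algorithms.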

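Some commonly used algorithm design methods are listed below.

1. Brute force: try every possible solution until the best one is found. This method is usually practical only when the problem size is small.

2. Greedy algorithm: at every step choose the locally optimal option, hoping to end with a globally optimal solution. Greedy algorithms are easy to implement but do not always produce an optimal solution.

3. Divide and conquer: split the problem into several smaller subproblems, solve the subproblems recursively, and combine their solutions into a solution of the original problem. Divide and conquer suits problems with a self-similar structure.

4. Dynamic programming: break the problem into overlapping subproblems, solve them one by one either bottom-up or top-down, and store each solved subproblem's result to avoid recomputation. Dynamic programming suits problems that have both optimal substructure and overlapping subproblems.

5. Backtracking: search the solution-space tree recursively, prune branches that violate the constraints, and back up one level to try other choices. Backtracking suits constraint satisfaction problems such as the eight queens problem and graph coloring.

6. Branch and bound: while searching the solution-space tree, prune branches by computing upper and lower bounds. Branch and bound suits integer programming and combinatorial optimization problems.

7. Randomized algorithm: look for a solution by making random choices among the elements of the solution space. Randomized algorithms are simple and easy to implement, but they may have to be run several times before they return an optimal solution.

8. Approximation algorithm: when an optimal solution is hard to find or too expensive to compute, return a solution that is close to optimal. An approximation algorithm can come with a performance guarantee, meaning that the gap between the quality of its solution and that of the optimal solution never exceeds a stated bound.

9. Parallel and distributed algorithms: distribute the computation across multiple processors or computers to improve speed and efficiency.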
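To make the contrast between the greedy method and dynamic programming in the list above concrete, here is a minimal Python sketch on the coin-change problem. The problem choice, the function names `greedy_coins` and `dp_coins`, and the coin set in the usage note are illustrative assumptions, not taken from the source.

```python
def greedy_coins(coins, amount):
    """Greedy: repeatedly take as many of the largest remaining coin as fit."""
    count = 0
    for c in sorted(coins, reverse=True):
        take, amount = divmod(amount, c)
        count += take
    # None signals that the greedy choices did not reach the exact amount.
    return count if amount == 0 else None


def dp_coins(coins, amount):
    """Bottom-up dynamic programming: best[v] is the fewest coins summing to v."""
    inf = float("inf")
    best = [0] + [inf] * amount
    for v in range(1, amount + 1):
        for c in coins:
            # Reuse the stored answer for the smaller subproblem v - c.
            if c <= v and best[v - c] + 1 < best[v]:
                best[v] = best[v - c] + 1
    return best[amount] if best[amount] != inf else None
```

With coins 1, 3, and 4 and a target of 6, the greedy version picks 4 + 1 + 1 (three coins) while the dynamic-programming version finds 3 + 3 (two coins), which illustrates why a locally optimal choice does not always lead to a global optimum.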

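Real-time Super-resolution and Stereoscopic View Generation with GPU/Multicore CPU Based Parallel Computing

School code: 10701
Classification number: TP37
Security level: Public
Student ID: 1102121253
Title: Real-time Super-resolution and Stereoscopic View Generation with GPU/Multicore CPU Based Parallel Computing
Author: 孙增增 (Sun Zengzeng)
Advisor: Prof. 郑喆坤 (Zheng Zhekun)
Discipline: Engineering
Major: Pattern Recognition and Intelligent Systems
Submission date: March 2014

Xidian University
Declaration of Originality (or Innovation) of the Degree Thesis

Upholding the university's rigorous academic style and high standards of scientific ethics, I declare that the thesis submitted here presents research work that I carried out, and research results that I obtained, under the guidance of my advisor.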

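To the best of my knowledge, except where specifically marked in the text and listed in the acknowledgments, the thesis contains no research results that others have published or written, nor any material that has been used to obtain a degree or certificate from Xidian University or any other educational institution.

Every contribution that colleagues working with me made to this research is clearly stated in the thesis, with thanks.

If anything in the thesis or the application materials is untrue, I bear all legal responsibility.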

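Signature:          Date:

Xidian University
Statement on Authorization for Use of the Thesis

I fully understand Xidian University's regulations on retaining and using degree theses, namely: the intellectual property in thesis work done while a graduate student is enrolled belongs to Xidian University.

The university has the right to retain copies of the submitted thesis and to allow it to be consulted and borrowed; it may publish all or part of the thesis and may preserve it by photocopying, reduced-size printing, or other means of reproduction.

I also guarantee that any article I write after graduation based on the thesis research topic will list Xidian University as the affiliation.

(A classified thesis is subject to these rules once it is declassified.) I authorize the Xidian University Library to keep this thesis. The thesis is classified as (security level), this authorization applies after declassification in year ____, and I agree to the thesis being published on the Internet.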

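Signature:          Date:
Advisor's signature:          Date:

Abstract

In recent years, many factors have pushed the computing industry toward parallelism.

In the process, driven by market demand for real-time, high-definition 3D graphics rendering, the programmable graphics processing unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational power and very high memory bandwidth.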

A Teaching Approach to Parallel Algorithm Design Based on a Workflow Model

Abstract: Parallel Algorithm Design is one of the core courses in advanced computer programming. Its main difficulty is translating a particular parallel solution model into a concrete programming language.

Traditional teaching pursues the course goals mainly by teaching parallel programming languages, and existing teaching experience shows that this approach has many shortcomings.

This paper therefore proposes a model-driven teaching method. Its core idea is to take parallel problem-solving models as the main thread of the course and to convey the key ideas and techniques of parallel algorithms by analyzing and explaining the basic features of those models and the similarities and differences among them.

The main advantages of the method are that it makes algorithm design ideas independent of any specific programming language, guides students effectively toward mastering the key ideas and techniques of parallel problem solving, and stimulates their interest in solving complex problems with simple models.

Keywords: parallel computing; algorithm design; workflow; teaching reform

High-performance computers are a comprehensive reflection of a nation's economic and technological strength, and an important foundation for economic and technological development, social progress, and national defense security.

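Introduction to Algorithms, Third Edition: New Chapter 27 (Chinese Edition)

Multithreaded Algorithms (complete version): the new Chapter 27 of Introduction to Algorithms, 3rd edition
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
Translated by 邓辉 (Deng Hui)
Original: /sites/products/documentation/cilk/book_chapter.pdf

The main algorithms in this book are all serial algorithms, suited to running on a uniprocessor computer that executes only one instruction at a time.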

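In this chapter we extend our algorithmic model to parallel algorithms, which can run on a multiprocessor computer that executes multiple instructions at the same time.

We focus on the elegant model of dynamic multithreaded algorithms, which lends itself to algorithm design and analysis and is also easy to implement efficiently.

Parallel computers (computers with multiple processing units) have become increasingly common, and they span a wide range of prices and performance.

At the relatively inexpensive end are desktop and laptop chip multiprocessors, which contain a single multicore integrated circuit housing several processing "cores", each a full-fledged processor that can access a common memory.

In the middle of the price and performance range are clusters built from many individual computers (often just PC-class machines) connected by a dedicated network.

The most expensive machines are supercomputers, which often use custom architectures and custom networks to deliver the highest performance (in instructions executed per second).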

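Multiprocessor computers have existed in one form or another for decades.

The computing community settled on the random-access machine model for serial computing early in the history of computer science, but no comparable model has won general acceptance for parallel computing.

A major reason is that vendors have not agreed on a single architectural model for parallel computers.

For example, some parallel computers use shared memory, in which every processor can directly access any memory location.

Other parallel computers use distributed memory, in which each processor's memory is private and a processor must send an explicit message to another processor in order to access that processor's memory.

With the advent of multicore technology, however, every new laptop and desktop is now a shared-memory parallel computer, and the trend appears to be toward shared-memory multiprocessing.

Although only time will tell, that is the approach we take in this chapter.

For chip multiprocessors and other shared-memory parallel computers, one common programming approach is static threading, which provides a software abstraction of "virtual processors", or threads, that share a common memory.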
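The static-threading style described in the last paragraph can be sketched as follows: a fixed number of threads is created up front, each thread is handed a fixed slice of a shared array, and the threads communicate only through shared memory. This is an illustrative Python sketch, not code from the chapter; the function name `static_thread_sum` and the default of four threads are assumptions. In CPython the global interpreter lock keeps these threads from speeding up CPU-bound work, so the example shows the program structure only.

```python
import threading


def static_thread_sum(data, num_threads=4):
    """Sum `data` using a fixed set of threads, one contiguous slice per thread."""
    partial = [0] * num_threads  # shared memory: one result slot per thread

    def worker(t):
        # The work assignment is static: thread t always owns the same slice.
        lo = t * len(data) // num_threads
        hi = (t + 1) * len(data) // num_threads
        partial[t] = sum(data[lo:hi])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait for every thread before combining the results
    return sum(partial)
```

Because each thread writes only its own slot of `partial`, no locking is needed here; the price of the static assignment is that the programmer, not a scheduler, is responsible for balancing the load across threads.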

3DEC-UDEC Tutorial 3

*not yet available
3DEC OPERATION
Command-driven operation versus Menu-driven operation
3DEC COMMAND SUMMARY
1. Specify Program Control
2. Specify Special Calculation Modes
3. Input Problem Geometry
Discontinuous medium modeled as an assemblage of polyhedral blocks; blocks may be rigid or deformable.
Statistically based joint-set generator and tunnel generator. Discontinuities treated as boundary conditions between blocks. Motion along discontinuities governed by linear and non-linear force-displacement relations for movements in both the normal and shear direction. Many built-in block and joint constitutive models that are representative of geologic, or similar, materials; optional user-written models.
- Menu-driven versus command-driven operation
- Simple tutorial
10:15-10:30 Break
10:30-12:00 3DEC Theoretical Background
- DEM in three dimensions
Practical Exercise
- Failure of a jointed rock slope

Design and Analysis of Parallel Algorithms

[Figure residue: labels Process 2, Process 4, Process 5; values 17, 30, 45]
USTC
2019/1/11
Y.Xu Copyright
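5.1.4 Time analysis of the asynchronous enumeration sort algorithm
1. Assumptions: no process starts before step (1); read conflicts can be resolved in constant time; scheduling time between processes is ignored.
2. Running time of the MIMD asynchronous enumeration sort: there are n processes, each taking O(n) time, so on p processors
$$t(n) = \frac{n}{p} \cdot O(n) = O\left(\frac{n^2}{p}\right), \qquad p(n) = p, \qquad c(n) = t(n) \cdot p(n) = O(n^2)$$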
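As a rough sequential illustration of the algorithm whose cost is analyzed above: enumeration sort places each element at the position given by the number of elements that must precede it. The Python sketch below is an assumption-laden model rather than the slides' code; the name `enumeration_sort` is hypothetical, and the index tie-break for equal keys is added here so that duplicates receive distinct ranks.

```python
def enumeration_sort(a):
    """Sort by ranking: out[rank(i)] = a[i], where rank(i) counts predecessors."""
    n = len(a)
    out = [None] * n
    # In the parallel algorithm each value of i would be handled by its own
    # process; the O(n) counting loop below is that process's work.
    for i in range(n):
        rank = sum(
            1
            for j in range(n)
            if a[j] < a[i] or (a[j] == a[i] and j < i)  # index breaks ties
        )
        out[rank] = a[i]
    return out
```

Each of the n ranking loops costs O(n), so the total work is O(n^2) regardless of how the n loops are shared among p processors, which matches the cost c(n) stated in the analysis.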
Parallel Algorithms
Chapter 5 Sorting and Selecting in Asynchronous
Outline

5.1 Asynchronous enumeration sort on the MIMD-CREW model
5.2.2 Quicksort on the SIMD-CRCW model
2. Algorithm for constructing the quicksort binary tree on SIMD-CRCW
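Input: A[1..n] in SM (shared memory); n processors, with A[i] also held in the LM (local memory) of Pi
Output: the binary sort tree root, Lc[1..n], Rc[1..n] in SM
begin
  (1) for each Pi par-do
        (1.1) root = i            // Pi concurrently writes its processor number i to the SM variable root; the resulting value of root is nondeterministic
        (1.2) fi = root           // Pi concurrently reads root into its LM variable fi
        (1.3) Lci = Rci = n+1     // initialize Lci and Rci so that they point to no processor
      end for
  (2) repeat for each Pi, i <> fi par-do
        if (Ai < Afi) or (Ai = Afi and i < fi) then   // Ai is an LM variable, Afi an SM variable; the test (Ai = Afi and i < fi) keeps the sort stable
          (2.1) Lcfi = i          // Pi concurrently writes i to the SM variable Lcfi, competing to become the left child of fi
          (2.2) if i = Lcfi then exit else fi = Lcfi end if
        else
          (2.3) Rcfi = i          // Pi concurrently writes i to the SM variable Rcfi, competing to become the right child of fi
          (2.4) if i = Rcfi then exit else fi = Rcfi end if
        end if
      end repeat
end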
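The tree construction above can be modeled sequentially to see how the competition for child slots plays out. The Python sketch below is a hypothetical single-threaded simulation, not the slides' code: indices are 0-based, the value n stands in for the "no processor" marker n+1, every concurrent write is resolved by letting the last writer in index order win (one possible arbitrary-CRCW outcome), and the names `build_tree_crcw` and `inorder` are illustrative.

```python
def build_tree_crcw(a):
    """Simulate the CRCW tree construction; returns (root, lc, rc)."""
    n = len(a)
    nil = n  # marker for "points to no processor"
    if n == 0:
        return nil, [], []
    root = n - 1  # step (1.1): the last concurrent writer wins
    f = [root] * n  # step (1.2): every processor reads root
    lc, rc = [nil] * n, [nil] * n  # step (1.3)
    active = [i for i in range(n) if i != root]

    def goes_left(i, p):
        # Same test as the pseudocode; the index comparison keeps the sort stable.
        return a[i] < a[p] or (a[i] == a[p] and i < p)

    while active:
        # Write phase: every active processor competes for a child slot of f[i].
        for i in active:
            if goes_left(i, f[i]):
                lc[f[i]] = i
            else:
                rc[f[i]] = i
        # Read phase: winners exit, losers descend to the winner of their slot.
        losers = []
        for i in active:
            winner = lc[f[i]] if goes_left(i, f[i]) else rc[f[i]]
            if winner != i:
                f[i] = winner
                losers.append(i)
        active = losers
    return root, lc, rc


def inorder(root, lc, rc):
    """Iterative in-order traversal; yields processor indices in sorted key order."""
    nil = len(lc)
    out, stack, node = [], [], root
    while stack or node != nil:
        while node != nil:
            stack.append(node)
            node = lc[node]
        node = stack.pop()
        out.append(node)
        node = rc[node]
    return out
```

Reading the keys in the order returned by `inorder` gives the sorted sequence. Each round removes at least one processor from `active`, so the loop terminates; the number of rounds equals the height of the resulting tree, which depends on which writers happen to win.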

Computer English Terminology (English-Chinese Glossary)

Chapter 1 (Overview of Computer Systems)
digital computer: 数字计算机
decimal digits: 十进制数字
binary: 二进制
bit: 位
ASCII: 美国国家信息交换标准代码
computer system: 计算机系统
hardware system: 硬件系统
software system: 软件系统
I/O devices: 输入输出设备
central processing unit (CPU): 中央处理器
memory: 存储器
application software: 应用软件
video game: 计算机游戏
system software: 系统软件
register: 寄存器
floating point data: 浮点数据
Boolean: 布尔值
character data: 字符数据
EBCDIC: 扩充的二-十进制交换代码
punched cards: 穿孔卡片
magnetic tape: 磁带
main memory: 主存
vacuum tubes: 电子管
magnetic drum: 磁鼓
transistors: 晶体管
solid-state devices: 固体器件
magnetic cores: 磁芯
integrated circuit (IC): 集成电路
silicon chip: 硅芯片
multiprogramming: 多道程序设计
timesharing: 分时技术
minicomputers: 小型计算机
mainframe: 大型计算机
large-scale integrated (LSI): 大规模集成
very-large-scale integrated (VLSI): 超大规模集成
word processing: 文字处理
electronic spreadsheets: 电子表格
database management programs: 数据库管理程序
desktop publishing: 桌面印刷
personal computer (PC): 个人计算机
microcomputer: 微型计算机
storage capacities: 存储容量
stand-alone computer: 独立计算机
local area network (LAN): 局域网
peripheral devices: 外部设备
assembly line: 流水线
supercomputer: 巨型计算机

Chapter 2 (Computer System Organization)
memory subsystem: 存储子系统
I/O subsystem: 输入输出子系统
bus: 总线
system bus: 系统总线
chip: 芯片
address bus: 地址总线
instructions: 指令
memory location: 存储单元
data bus: 数据总线
control bus: 控制总线
local bus: 局部总线
microprocessor: 微处理器
register set: 寄存器组
arithmetic logic unit (ALU): 运算器
clock cycle: 时钟周期
control unit: 控制器
computer architecture: 计算机体系结构
instruction format: 指令格式
addressing modes: 寻址方式
instruction set: 指令集
internal memory: 内存
main memory: 主存
Random Access Memory (RAM): 随机存取存储器
Read Only Memory (ROM): 只读存储器
secondary storage: 辅助存储器
virtual memory: 虚拟存储器
Dynamic RAM (DRAM): 动态存储器
refresh circuitry: 刷新电路
Static RAM (SRAM): 静态RAM
cache memory: 高速缓冲存储器
masked ROM: 掩膜ROM
PROM: 可编程ROM
EPROM: 可擦写PROM
ultraviolet light: 紫外线
EEPROM: 电擦写PROM
basic input/output system (BIOS): 基本输入输出系统
flash EEPROM: 快闪存储器
memory hierarchy: 存储器层次结构
auxiliary memory: 辅助存储器
storage memory: 存储容量
keyboard: 键盘
alphanumeric key: 字母数字键
function key: 功能键
cursor key: 光标键
numeric keypad: 数字小键盘
mouse: 鼠标
touch screen: 触屏
infrared ray: 红外线
monitor: 监视器
display screen: 显示屏
laser printer: 激光打印机
ink-jet printer: 喷墨打印机
dot-matrix printer: 点阵式打印机
modem: 调制解调器
input-output interface (I/O interface): 输入输出接口
peripheral: 外部设备，外设
interrupt: 中断
program counter: 程序计数器
vectored interrupt: 向量中断
nonvectored interrupt: 非向量中断
interrupt vector: 中断向量
Direct Memory Access (DMA): 直接存储器存取
timeout: 超时

Unit 3 (Computer Architecture)
parallel processing: 并行处理
serial operations: 串行操作
instruction stream: 指令流
data stream: 数据流
SISD: 单指令单数据流
SIMD: 单指令多数据流
MISD: 多指令单数据流
MIMD: 多指令多数据流
pipeline processing: 流水线处理
combinational circuit: 组合电路
multiplier: 乘法器
adder: 加法器
clock pulse: 时钟脉冲
vector processing: 向量处理
one-dimensional array: 一维数组
scalar processor: 标量处理器
vector instructions: 向量指令
CISC: 复杂指令集计算机
decoder: 译码器
RISC: 精简指令集计算机
backward compatibility: 向下兼容

Unit 4 (Algorithms and Data Structures)
algorithm: 算法
parallel algorithm: 并行算法
primitive: 原语
syntax: 语法
semantics: 语义
pseudocode: 伪码
exhaustive search: 穷举搜索
divide-and-conquer algorithm: 分治算法
dynamic programming: 动态规划
bottom-up: 自底向上
top-down: 自顶向下
array: 数组
one-dimensional array: 一维数组
pointer: 指针
program counter: 程序计数器
instruction pointer: 指令指针
list: 列表
linked list: 链表
singly-linked list: 单向链表
doubly-linked list: 双向链表
circularly-linked list: 循环链表
FIFO: 先进先出
LIFO: 后进先出
stack: 栈
push: 压栈
pop: 出栈
stack pointer: 栈指针
queue: 队列
tree: 树
root: 根
level: 层次
degree of a node: 结点的度
depth of a tree: 树的深度
binary tree: 二叉树
traversal: 遍历
M-way search tree: M向搜索树

Chapter 5 (Programming Languages)
Program: 程序
Programming language: 程序设计语言
Software engineering: 软件工程
Pseudocode: 伪码
Flowchart: 流程图
Coding: 编码
Program testing: 程序测试
Desk-checking: 手工检查
Documentation: 文档
User documentation: 用户文档
Operator documentation: 操作员文档
Programmer documentation: 程序员文档
Machine language: 机器语言
Assembly languages: 汇编语言
High-level languages: 高级语言
RAD (rapid application development): 快速应用开发
Natural language: 自然语言
Artificial intelligence (AI): 人工智能
Compile: 编译
Assemble: 汇编
Source code: 源代码
Object code: 目标代码
Linker: 连接器
Executable file: 可执行文件
Object-oriented programming: 面向对象的程序设计
Object: 对象
Class: 类
ADT (abstract data type): 抽象数据类型
Member variable: 成员变量
Class variable: 类变量
Member function: 成员函数
Inheritance: 继承
Derived class: 派生类
Overload: 重载
Message: 消息
Static binding: 静态绑定
Dynamic binding: 动态绑定
Polymorphism: 多态性
Visual programming: 可视化编程
Markup language: 标记语言
HTML (hypertext markup language): 超文本标记语言
Hyperlink: 超链接
XML (extensible markup language): 可扩展标记语言
Java virtual machine: Java虚拟机

Chapter 6 (Operating Systems)
Application software: 应用软件
System software: 系统软件
Utility software: 实用软件
Operating system (OS): 操作系统
Shell: 操作系统的外壳程序
Graphical user interface (GUI): 图形用户界面
Kernel: 内核
Serial processing: 串行处理
Job: 作业
Batch processing: 批处理
Simple batch systems: 简单批处理系统
Multiprogrammed batch systems: 多道程序批处理系统
Monitor: 监控程序
Scheduler: 调度程序
Multiprogramming: 多道程序
Multitasking: 多任务
Time-sharing systems: 分时系统
Uniprogramming: 单道程序
Process: 进程
Process management: 进程管理
Process control block: 进程控制块
Mutual exclusion: 互斥
Multiprocessing: 多处理，多进程
Distributed processing: 分布式处理
Concurrent processes: 并发进程
Deadlock: 死锁
Synchronize process: 同步处理
Semaphore: 信号量
Reusable resource: 可复用性资源
I/O buffers: 输入/输出缓冲区
I/O channel: 输入/输出通道
Deadlock prevention: 死锁预防
Deadlock detection: 死锁检测
Deadlock avoidance: 死锁避免
Virtual memory: 虚拟内存
Logical reference: 逻辑引用
Real address: 实地址
Paging: 分页
Segmentation: 分段
Virtual address: 虚拟地址
Physical addresses: 物理地址
Real-time process: 实时处理
File management: 文件管理
Plug and play (PnP): 即插即用

Unit 7 (Application Software)
application software: 应用软件
word processing: 字处理软件
spreadsheet: 电子表格
personal finance: 个人理财
presentation graphic: 演示图形
database manager: 数据库管理软件
groupware: 群件
desktop accessory: 桌面辅助工具
browsers: 浏览器
desktop publishing: 桌面印刷
project management: 项目管理
CAD: 计算机辅助设计
CAM: 计算机辅助制造
multimedia authoring: 多媒体发布
animation: 动画
MIDI: 乐器数字化接口
speech synthesis: 语音合成
insertion point: 插入点
scroll bar: 滚动条
window: 窗口
menu bar: 菜单栏
pull-down menu: 下拉式菜单
Button: 按钮
toolbar: 工具条
dialog box: 对话框
default value: 缺省值（默认值）
macro: 宏
OLE: 对象链接和嵌入
clipboard: 剪切板
column: 列
row: 行
cell: 单元格
cell address: 单元格地址
cell pointer: 单元格指针
formula: 公式
function: 函数
bar chart: 柱形图
line chart: 线图
pie chart: 圆饼图
workflow software: 工作流软件
PIM: 个人信息管理软件
Web browser: 浏览器
World Wide Web: 万维网
home page: 主页

Unit 8 (Databases)
DBMS: 数据库管理系统
instance: 实例
schema: 模式
physical schema: 物理模式（存储模式、内模式）
logical schema: 逻辑模式（概念模式、模式）
subschema: 子模式（外模式）
data independence: 数据独立性
physical data independence: 物理数据独立性
logical data independence: 逻辑数据独立性
data model: 数据模型
entity-relationship model: 实体联系模型
object-oriented model: 面向对象模型
semantic data model: 语义数据模型
functional data model: 功能数据模型
entity: 实体
entity set: 实体集
mapping cardinality: 映射基数
abstract data type: 抽象数据类型
attribute: 属性
relation: 关系
tuple: 元组
primary key: 主键
super key: 超键
candidate key: 候选键
foreign key: 外键
DDL: 数据定义语言
data dictionary: 数据字典
DML: 数据操纵语言
procedural DML: 过程化DML
nonprocedural DML: 非过程化DML
SQL: 结构化查询语言
view: 视图
the relational algebra: 关系代数
the tuple relational calculus: 元组关系演算
atomicity: 原子性
consistency: 一致性
durability: 持久性
transaction: 事务
DBA: 数据库管理员

Graduation Project (Thesis): Kinematic Analysis and Simulation of the Spatial 3-RPS Parallel Mechanism

Graduation project (thesis)
Title: Kinematic Analysis and Simulation of the Spatial 3-RPS Parallel Mechanism
Type: thesis
School: School of Mechanical and Electrical Engineering
Major: Mechanical Engineering and Automation
Grade:
Student ID:
Student name:
Advisor:
Date: 2010-6-11

Abstract

The 3-RPS parallel mechanism is a spatial mechanism with three degrees of freedom. It has few limbs, a symmetric structure, actuators that are easy to arrange, a large load capacity, and a moving platform that readily achieves large orientation angles, and it has been applied successfully in engineering.

Based on the theory of spatial mechanisms, this thesis carries out a kinematic analysis of the 3-RPS parallel mechanism.

Building on a structural analysis of the mechanism, the output pose parameters are decoupled and the decoupling relations among them are obtained. The inverse position equations are derived analytically, and the forward position solution is computed numerically. Constraints are established from the limits on actuated-joint stroke, hinge rotation angle, and link-size interference; the workspace of the mechanism is then searched with a limit-boundary search algorithm, its characteristics are analyzed, and its volume is computed.

Finally, a simplified three-dimensional solid model of the 3-RPS parallel mechanism is built on the ADAMS software platform, and the motion of the mechanism is simulated.

The results provide a reference for the structural design and application of the 3-RPS parallel mechanism.

Keywords: 3-RPS parallel mechanism; forward position solution; inverse position solution; workspace; motion simulation

Contents

Abstract (Chinese)
Abstract (English)
Preface
Chapter 1 Introduction
- Significance of the research
- Overview of parallel mechanisms
- Development of parallel mechanisms in China and abroad
- Introduction to lower-mobility mechanisms
- Significance of research on lower-mobility mechanisms
- State of research on lower-mobility parallel mechanisms
- Main content of this thesis
Chapter 2 Composition Principles and Kinematic Analysis of Parallel Mechanisms
- Introduction
- Degree-of-freedom analysis of parallel mechanisms
- Composition principles of parallel mechanisms
- Research topics in parallel mechanisms
- Kinematic analysis
- Workspace analysis
- Chapter summary
Chapter 3 Position Analysis of the 3-RPS Parallel Mechanism
- Introduction
- The spatial 3-RPS parallel mechanism
- Composition of the mechanism
- Pose description of the 3-RPS parallel platform mechanism
- Pose decoupling of the 3-RPS parallel platform mechanism
- Inverse pose solution of the 3-RPS parallel platform mechanism
- Forward position solution of the 3-RPS parallel platform mechanism
- Chapter summary
Chapter 4 Workspace Analysis of the 3-RPS Parallel Mechanism
- Introduction
- Workspace analysis of the 3-RPS parallel platform mechanism
- Kinematic constraints of the mechanism
- Determining the workspace boundary of the 3-RPS parallel mechanism
- Worked example of workspace analysis
- Method for computing the workspace volume
- Chapter summary
Chapter 5 Simulation and Application of the 3-RPS Parallel Mechanism
- Three-dimensional modeling of the 3-RPS parallel mechanism
- Introduction to the ADAMS software
- Modeling of the 3-RPS parallel mechanism
- Motion simulation of the 3-RPS parallel mechanism
- Applications of the 3-RPS parallel mechanism
- Chapter summary
Conclusions and Reflections
Acknowledgments
References

Preface

The invention and development of mechanisms are closely bound up with human production and daily life; they drive the growth of productive forces, the improvement of production tools, and the steady rise of living standards.


Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Communication Checklist
- Communication operations balanced among tasks
- Each task communicates with only a small group of neighbors
- Tasks can perform communications concurrently
- Tasks can perform computations concurrently

Agglomeration Can Improve Performance
- Eliminate communication between primitive tasks agglomerated into a consolidated task
- Combine groups of sending and receiving tasks

The first two steps look for parallelism in the problem. However, the design obtained at this point probably doesn’t map well onto a real machine. If the number of tasks greatly exceeds the number of processors, the overhead will be strongly affected by how the tasks are assigned to the processors. Now we have to decide what type of computer we are targeting.
Task/Channel Model
- Parallel computation = set of tasks
- Task
  - Program
  - Local memory
  - Collection of I/O ports
- Tasks interact by sending messages through channels
Partitioning
- Dividing computation and data into pieces
- Domain decomposition
  - Divide data into pieces
  - Determine how to associate computations with the data
- Functional decomposition
  - Divide computation into pieces
  - Determine how to associate data with the computations
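Domain decomposition, as described on the slide above, can be sketched by giving each task a contiguous block of the data and attaching the computation to that block. The Python below is an illustrative sketch under stated assumptions: the helper names `block_range` and `sum_of_squares_by_blocks` are hypothetical, the tasks are simulated one after another rather than run in parallel, and the sum of squares is an arbitrary stand-in computation.

```python
def block_range(task_id, num_tasks, n):
    """Half-open index range [lo, hi) of the block owned by task `task_id`."""
    lo = task_id * n // num_tasks
    hi = (task_id + 1) * n // num_tasks
    return lo, hi


def sum_of_squares_by_blocks(data, num_tasks):
    """Each task computes on its own block; the partial results are then combined."""
    partials = []
    for t in range(num_tasks):  # stands in for num_tasks tasks running concurrently
        lo, hi = block_range(t, num_tasks, len(data))
        partials.append(sum(x * x for x in data[lo:hi]))
    return sum(partials)  # the only step that needs communication between tasks
```

With this formula the block sizes differ by at most one element, which addresses the checklist concern that primitive tasks be roughly the same size.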
Foster’s Design Methodology
- Partitioning
- Communication
- Agglomeration
- Mapping

Scalability issue

Assume we are manipulating a 3D matrix of size 8 x 128 x 256 and
Agglomeration Checklist
- Locality of parallel algorithm has increased
- Replicated computations take less time than communications they replace
- Data replication doesn’t affect scalability
- Agglomerated tasks have similar computational and communications costs
- Number of tasks increases with problem size
- Number of tasks suitable for likely target systems
- Tradeoff between agglomeration and code modifications costs is reasonable
Example Functional Decomposition
Partitioning Checklist
- At least 10x more primitive tasks than processors in target computer
- Minimize redundant computations and redundant data storage
- Primitive tasks roughly the same size
- Number of tasks an increasing function of problem size

Our target machine is a centralized multiprocessor with 4 CPUs.

Suppose we agglomerate the 2nd and 3rd dimensions. Can we run on our target machine? Suppose we change to a target machine that is a centralized multiprocessor with 8 CPUs. What if we go to more than 8 CPUs?
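One way to reason about these questions, taking the 8 x 128 x 256 matrix from the "Scalability issue" slide, is to count how many tasks remain after agglomeration and compare that count with the number of CPUs. The Python sketch below is an illustrative assumption rather than part of the slides; the names `tasks_after_agglomeration` and `can_use_all_cpus` are hypothetical, and dimensions are numbered from 0, so the 2nd and 3rd dimensions are indices 1 and 2.

```python
from math import prod


def tasks_after_agglomeration(shape, agglomerated_dims):
    """Number of tasks left when the given dimensions are collapsed into each task."""
    return prod(
        extent for dim, extent in enumerate(shape) if dim not in agglomerated_dims
    )


def can_use_all_cpus(shape, agglomerated_dims, cpus):
    """True if there is at least one agglomerated task available per CPU."""
    return tasks_after_agglomeration(shape, agglomerated_dims) >= cpus
```

Under these assumptions, collapsing the 2nd and 3rd dimensions leaves 8 tasks: 4 CPUs get two tasks each, 8 CPUs get one each, and any CPUs beyond 8 would sit idle, so that agglomeration stops scaling at 8 processors.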
Parallel Programming in C with MPI and OpenMP
Michael J. Quinn
Agglomeration
- Grouping tasks into larger tasks
- Goals
  - Improve performance
  - Maintain scalability of program
  - Simplify programming
- In MPI programming, goal often to create one agglomerated task per processor
Foster’s Methodology
[Figure: Problem -> Partitioning -> Communication -> Agglomeration -> Mapping]

What We Have Hopefully at This Point – and What We Don’t Have
Chapter 3
Parallel Algorithm Design

- Is it a centralized multiprocessor or a multicomputer?
- What communication paths are supported?
- How must we combine tasks in order to map them effectively onto processors?
Outline
- Task/channel model
- Algorithm design methodology
- Case studies
