Optimization Techniques for Parallel Codes of Irregular Scientific Computations


CUDA Toolkit Documentation v7


CUDA Toolkit Documentation v7.5

Release Notes: The Release Notes for the CUDA Toolkit.

EULA: The End User License Agreements for the NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA Display Driver, and NVIDIA NSight (Visual Studio Edition).

Installation Guides

Quick Start Guide: This guide provides the minimal first-steps instructions for installing and verifying CUDA on a standard system.

Installation Guide Windows: This guide discusses how to install and check for correct operation of the CUDA Development Tools on Microsoft Windows systems.

Installation Guide Mac OS X: This guide discusses how to install and check for correct operation of the CUDA Development Tools on Mac OS X systems.

Installation Guide Linux: This guide discusses how to install and check for correct operation of the CUDA Development Tools on GNU/Linux systems.

Programming Guides

Programming Guide: This guide provides a detailed discussion of the CUDA programming model and programming interface. It then describes the hardware implementation, and provides guidance on how to achieve maximum performance. The appendices include a list of all CUDA-enabled devices, a detailed description of all extensions to the C language, listings of supported mathematical functions, C++ features supported in host and device code, details on texture fetching, technical specifications of various devices, and conclude by introducing the low-level driver API.

GRID Systems


Optimization Techniques for Implementing Parallel Skeletons in Distributed Environments
M. Aldinucci, M. Danelutto ({aldinuc,marcod}@di.unipi.it)
UNIPI Dept. of Computer Science, University of Pisa, Largo B. Pontecorvo 3, Pisa, Italy
J. Dünnweber, S. Gorlatch ({duennweb,gorlatch}@math.uni-muenster.de)
WWU Münster Dept. of Computer Science, University of Münster, Einsteinstr. 62, Münster, Germany
CoreGRID Technical Report Number TR-0001, January 21, 2005
Institute on Programming Model
CoreGRID - Network of Excellence

Legacy Code Support for Production Grids
T. Kiss*, G. Terstyanszky*, G. Kecskemeti*, Sz. Illes*, T. Delaittre*, S. Winter*, P. Kacsuk**, G. Sipos**
* Centre of Parallel Computing, University of Westminster, 115 New Cavendish Street, London W1W 6UW, United Kingdom, e-mail: gemlca-discuss@
** MTA SZTAKI, 1111 Kende u. 13, Budapest, Hungary
CoreGRID Technical Report Number TR-0011, 6th June 2005
Institute on Problem Solving Environment, Tools and GRID Systems
CoreGRID - Network of Excellence
CoreGRID is a Network of Excellence funded by the European Commission under the Sixth Framework Programme, Project no. FP6-004265

Abstract

In order to improve reliability and to deal with the high complexity of existing middleware solutions, today's production Grid systems restrict the services to be deployed on their resources. On the other hand, end-users require a wide range of value-added services to fully utilize these resources. This paper describes a solution in which legacy code support is offered as a third-party service for production Grids. The introduced solution, based on the Grid Execution Management for Legacy Code Architecture (GEMLCA), does not require the deployment of additional applications on the
Grid resources, or any extra effort from Grid system administrators. The implemented solution was successfully connected to and demonstrated on the UK National Grid Service.

1 Introduction

The vision of Grid computing is to enable anyone to offer resources to be utilised by others via the network. This original aim, however, has not been fulfilled so far. Today's production Grid systems, like the EGEE Grid, the NorduGrid or the UK National Grid Service (NGS), apply very strict rules towards service providers, hence restricting the number of sites and resources in the Grid. The reason for this is the very high complexity of installing and maintaining existing Grid middleware solutions. In a production Grid environment strong guarantees are needed that system administrators keep the resources up and running. In order to offer a reliable service, only a limited range of software is allowed to be deployed on the resources. On the other hand, these Grid systems aim to serve a large and diverse user community with different needs and goals. These users require a wide range of tools in order to make it easier to create and run Grid-enabled applications.
As system administrators are reluctant to install any software on the production Grid that could compromise reliability, the only way to make these utilities available for users is to offer them as third-party services. These services run on external resources, maintained by external organisations, and they are not an integral part of the production Grid system. However, users can freely select and utilise these additional services based on their requirements and experience with the service.

The previously described scenario was utilised to connect GEMLCA (Grid Execution Management for Legacy Code Architecture) [11] to the UK National Grid Service. GEMLCA enables legacy code programs written in any source language (Fortran, C, Java, etc.) to be easily deployed as a Grid Service without significant user effort. (This research work is carried out under the FP6 Network of Excellence CoreGRID funded by the European Commission, Contract IST-2002-004265.) A user-level understanding, describing the necessary input and output parameters and environmental values such as the number of processors or the job manager used, is all that is required to port the legacy application binary onto the Grid. GEMLCA does not require any modification of, or even access to, the original source code. The architecture is also integrated with the P-GRADE portal and workflow [13] solutions to offer a user-friendly interface, and to create complex applications including legacy and non-legacy components.

In order to connect GEMLCA to the NGS two main tasks have been completed:

• First, a portal server has been set up at the University of Westminster running the P-GRADE Grid portal and offering access to the NGS resources for authenticated and authorised users. With the help of their Grid certificates and NGS accounts, portal users can utilise NGS resources in a much more convenient and user-friendly way than previously.

• Second, the GEMLCA architecture has been redesigned in order to support the third-party service provider
scenario. There is no need to install GEMLCA on any NGS resource. The architecture is deployed centrally on the portal server but still offers the same legacy code functionality as the original solution: users can easily deploy legacy applications as Grid services, can access these services from the portal interface, and can create, execute and visualise complex Grid workflows.

This paper describes two different scenarios for how GEMLCA was redesigned in order to support a production Grid system. The first scenario supports traditional job-submission-like task execution, and the second offers the legacy codes as pre-deployed services on the appropriate resources. In both cases GEMLCA runs on an external server, and neither compromises the reliability of the production Grid system nor requires extra effort from the Grid system administrators. The service is transparent from the Grid operator's point of view but offers essential functionality for the end-users.

2 The UK National Grid Service

The National Grid Service (NGS) is the UK production Grid operated by the Grid Operation Support Centre (GOSC).
It offers a stable, highly-available, production-quality Grid service to the UK research community, providing compute and storage resources for users. The core NGS infrastructure consists of four cluster nodes at Cambridge, CCLRC-RAL, Leeds and Manchester, and two national High Performance Computing (HPC) services: HPCx and CSAR. NGS provides compute resources for the compute Grid through compute clusters at Leeds and Oxford, and storage resources for the data Grid through data clusters at CCLRC-RAL and Manchester. This core NGS infrastructure has recently been extended with two further Grid nodes at Bristol and Cardiff, and will be further extended by incorporating UK e-Science Centres through separate Service Level Agreements (SLA).

NGS is based on GT2 middleware. Its security is built on the Globus Grid Security Infrastructure (GSI) [14], which supports authentication, authorization and single sign-on. NGS uses GridFTP to transfer input and output files to and from nodes, and the Storage Resource Broker (SRB) [6] with OGSA-DAI [3] to provide access to data on NGS nodes.
It uses the Globus Monitoring and Discovery Service (MDS) [7] to handle information on NGS nodes. Ganglia [12], the Grid Integration Test Script (GITS) [4] and Nagios [2] are used to monitor both the NGS and its nodes. Nagios checks nodes and services, while GITS monitors communication among NGS nodes. Ganglia collects and processes information provided by Nagios and GITS in order to generate an NGS-level view.

NGS uses a centralised user registration system. Users have to obtain certificates and open accounts to be able to use any NGS service. The certificates are issued by the UK Core Programme Certification Authority (e-Science certificate) or by other CAs. NGS accounts are allocated from a central pool of generic user accounts to enable users to register with all NGS nodes at the same time. User management is based on the Virtual Organisation Membership Service (VOMS) [1]. VOMS supports central management of user registration and authorisation, taking into consideration local policies on resource access and usage.

3 Grid Execution Management for Legacy Code Architecture

The Grid computing environment requires special Grid-enabled applications capable of utilising the underlying Grid middleware and infrastructure. Most Grid projects so far have either developed new applications from scratch, or significantly re-engineered existing ones in order to be run on their platforms. However, as the Grid becomes commonplace in both scientific and industrial settings, the demand for porting a vast legacy of applications onto the new platform will grow. Companies and institutions can ill afford to throw such applications away for the sake of a new technology, and there is a clear business imperative for them to be migrated onto the Grid with the least possible effort and cost. The Grid Execution Management for Legacy Code Architecture (GEMLCA) enables legacy code programs written in any source language (Fortran, C, Java, etc.) to be easily deployed as a Grid Service without significant user effort. In this chapter the original GEMLCA architecture is
outlined. This architecture has been modified, as described in chapters 4 and 5, in order to create a centralised version for production Grids.

GEMLCA represents a general architecture for deploying legacy applications as Grid services without re-engineering the code or even requiring access to the source files. The high-level GEMLCA conceptual architecture is represented in Figure 1. As shown in the figure, there are four basic components in the architecture:

1. The Compute Server is a single- or multiple-processor computing system on which several legacy codes are already implemented and available. The goal of GEMLCA is to turn these legacy codes into Grid services that can be accessed by Grid users.

2. The Grid Host Environment implements a service-oriented OGSA-based Grid layer, such as GT3 or GT4. This layer is a pre-requisite for connecting the Compute Server into an OGSA-built Grid.

3. The GEMLCA Resource layer provides a set of Grid services which expose legacy codes as Grid services.

4. The fourth component is the GEMLCA Client that can be installed on any client machine through which a user would like to access the GEMLCA resources.

Figure 1: GEMLCA Conceptual Architecture

The novelty of the GEMLCA concept compared to other similar solutions like [10] or [5] is that it requires minimal effort from both Compute Server administrators and end-users of the Grid. The Compute Server administrator should install the GEMLCA Resource layer on top of an available OGSA layer (GT3/GT4). It is also their task to deploy existing legacy applications on the Compute Servers as Grid services, and to make them accessible for the whole Grid community. End-users do not have to do any installation or deployment work if a GEMLCA portal is available for the Grid and they only need those legacy code services that were previously deployed by the Compute Server administrators. In such a case end-users can immediately use all these legacy code services, provided they have access to the GEMLCA Grid resources. If they would like
to deploy legacy code services on GEMLCA Grid resources they can do so, but these services cannot be accessed by other Grid users. As a last resort, if no GEMLCA portal is available for the Grid, a user must install the GEMLCA Client on their client machine. However, since this requires some IT skills, it is recommended that a GEMLCA portal be installed on every Grid where GEMLCA Grid resources are deployed.

The deployment of a GEMLCA legacy code service assumes that the legacy application runs in its native environment on a Compute Server. It is the task of the GEMLCA Resource layer to present the legacy application as a Grid service to the user, to communicate with the Grid client and to hide the legacy nature of the application. The deployment process of a GEMLCA legacy code service requires only a user-level understanding of the legacy application, i.e., to know what the parameters of the legacy code are and what kind of environment is needed to run the code (e.g. a multiprocessor environment with n processors). The deployment defines the execution environment and the parameter set for the legacy application in an XML-based Legacy Code Interface Description (LCID) file that should be stored in a pre-defined location. This file is used by the GEMLCA Resource layer to handle the legacy application as a Grid service.

GEMLCA provides the capability to convert legacy codes into Grid services just by describing the legacy parameters and environment values in the XML-based LCID file. However, an end-user without specialist computing skills still requires a user-friendly Web interface (portal) to access the GEMLCA functionalities: to deploy, execute and retrieve results from legacy applications. Instead of developing a new custom Grid portal, GEMLCA was integrated with the workflow-oriented P-GRADE Grid portal, extending its functionalities with new portlets. Following this integration, end-users can easily construct workflow applications built from legacy code services running on
different GEMLCA Grid resources. The workflow manager of the portal contacts the selected GEMLCA Resources and passes them the actual parameter values of the legacy code; it is then the task of the GEMLCA Resource to execute the legacy code with these parameter values. The other important task of the GEMLCA Resource is to deliver the results of the legacy code service back to the portal. The overall structure of the GEMLCA Grid with the Grid portal is shown in Figure 2.

Figure 2: GEMLCA with Grid Portal

4 Connecting GEMLCA to the NGS

Two different scenarios were identified in order to execute legacy code applications on NGS sites. In each scenario both GEMLCA and the P-GRADE portal are installed on the Parsifal cluster of the University of Westminster. As a result, there is no need to deploy any GEMLCA or P-GRADE portal code on the NGS resources.

Scenario 1: legacy codes are stored in a central repository and GEMLCA submits these codes as jobs to NGS sites.

Scenario 2: legacy codes are installed on NGS sites and executed through GEMLCA.

The two scenarios support different user needs, and each of them increases the usability of the NGS in different ways for end-users. The GEMLCA research team implemented the first scenario in May 2005, and is currently working on the implementation of the second scenario. This chapter briefly describes these two different scenarios, and the next chapter explains in detail the design and implementation aspects of the first, already implemented, solution. As the design and implementation of the second scenario is currently work in progress, its detailed description will be the subject of a future publication.

4.1 Scenario 1: Legacy Code Repository for NGS

There are several legacy applications that would be useful for users within the NGS community. These applications were developed by different institutions and are currently not available for other members of the community. According to this scenario, legacy codes can be uploaded into a central repository and made
available for authorised users through a Grid portal. The solution extends the usability of the NGS as users can submit not only their own applications but can also utilise other legacy codes stored in the repository. Users can access the central repository, managed by GEMLCA, through the P-GRADE portal and upload their applications into this repository. After uploading legacy applications, users with valid certificates and existing NGS accounts can select and execute legacy codes through the P-GRADE portal on different NGS sites. In this scenario the binary codes of legacy applications are transferred from the GEMLCA server to the NGS sites, and executed as jobs.

Figure 3: Scenario 1 - Legacy Code Repository for NGS

4.2 Scenario 2: Pre-deployed Legacy Code Services

This solution extends the NGS Grid towards the service-oriented Grid architecture. Users can not only submit and execute jobs on the resources but can also access legacy applications deployed on the NGS and include these in their workflows. This scenario is the logical extension of the original GEMLCA concept in order to use it with the NGS. In this scenario the legacy codes are already deployed on the NGS sites and only the parameters (input or output) are submitted. Users contact the central GEMLCA resource through the P-GRADE portal, and can access the legacy codes that are deployed on the NGS sites. In this scenario the NGS system administrators have full control of the legacy codes that they deploy on their own resources.

Figure 4: Scenario 2 - Pre-Deployed Legacy Code on NGS Sites

5 Legacy Code Repository for the NGS

5.1 Design objectives

The currently implemented solution that enables users to deploy, browse and execute legacy code applications on the NGS sites is based on Scenario 1, as described in the previous chapter. This solution utilises the original GEMLCA architecture with the necessary modifications in order to execute the tasks on the NGS resources. The primary aims of the solution are the following:

• The owners of legacy applications can publish their codes in
the central repository, making them available for other authorised users within the UK e-Science community. The publication is no different from the original method used in GEMLCA, and it is supported by the administration Grid portlet of the P-GRADE portal, as described in [9]. After publication the code is available for other, non-computer-specialist end-users.

• Authorised users can browse the repository, select the necessary legacy codes, set their input parameters, and can even create workflows from compatible components. These workflows can then be mapped onto the NGS resources, submitted, and the execution visualised.

• The deployment of a new legacy application requires some high-level understanding of the code (like the name and types of input and output parameters) and its execution environment (e.g. supported job managers, maximum number of processors). However, once the code is deployed, end-users with no Grid-specific knowledge can easily execute it and analyse the results using the portal interface.

As GEMLCA is integrated with the P-GRADE Grid portal, NGS users have two different options in order to execute their applications. They can submit their own code directly, without the described publication process, using the original facilities offered by the portal. This solution is suggested if the execution is only on an ad-hoc basis, when the publication would put too much overhead on the process. However, if they would like to make their code available for a larger community, and would like to make the execution simple enough for any end-user, they can publish the code with GEMLCA in the repository.

In order to execute a legacy code on an NGS site, users should have a valid user certificate, for example an e-Science certificate, an NGS account and also an account for the P-GRADE portal running at Westminster. After logging into the portal they download their user certificate from an appropriate myProxy server. The legacy code, submitted to the NGS site, utilises this certificate to authenticate users.
Figure 5: Comparison of the Original and the NGS GEMLCA Concept

5.2 Implementation of the Solution

To fulfil these objectives some modifications and extensions of the original GEMLCA architecture were necessary. Figure 5 compares the original and the extended GEMLCA architectures. As shown in the figure, an additional layer, representing the remote NGS resource where the code is executed, appears. The deployment of a legacy code is no different from the original GEMLCA concept; however, the execution has changed significantly in the NGS version. Transferring the executable and the input parameters to the NGS site, and instructing the remote GT2 GRAM to execute the jobs, required the modification of the GEMLCA architecture, including the development of a special script that interfaces with Condor-G.

The major challenge when connecting GEMLCA to the NGS was that NGS sites use Globus Toolkit version 2 (GT2), whereas the current GEMLCA implementations are based on service-oriented Grid middleware, namely GT3 and GT4. The interfacing between the different middleware platforms is supported by a script, called the NGS script, that provides the additional functionality required for executing legacy codes on NGS sites. Legacy codes and input files are stored in the central repository but executed on the remote NGS sites. To execute the code on a remote site, first the NGS script, executed as a GEMLCA legacy code, instructs the portal to copy the binary and input files from the central repository to the NGS site. Next, the NGS script, using Condor-G, submits the legacy code as a job to the remote site.

The other major part of the architecture where modifications were required is the config.xml file and its related Java classes. GEMLCA uses an XML-based description file, called config.xml, in order to describe the environmental parameters of the legacy code. This file had to be
extended and modified in order to take into consideration a second-level job manager, namely the job manager used on the remote NGS site. The config.xml file should also notify the GEMLCA resource that it has to submit the NGS script instead of a legacy code to the GT4 MMJFS (Master Managed Job Factory Service) when the user wants to execute the code on an NGS site. The implementation of these changes also required the modification of the GEMLCA core layer.

In order to utilise the new GEMLCA NGS solution:

1. The owner of the legacy application deploys the code as a GEMLCA legacy code in the central repository.

2. The end-user selects and executes the appropriate legacy applications on the NGS sites.

As the deployment process is virtually identical to the one used by the original GEMLCA solution, here we concentrate on the second step, the code execution. The following steps are performed by GEMLCA when executing a legacy code on the NGS sites (Fig. 6):

1. The user selects the appropriate legacy codes from the portal, defines input files and parameters, and submits an "execute a legacy code on an NGS site" request.

2. The GEMLCA portal transfers the input files to the NGS site.

3. The GEMLCA portal forwards the user's request to a GEMLCA Resource.

4. The GEMLCA resource creates and submits the NGS script as a GEMLCA job to the MMJFS.

5. The MMJFS starts the NGS script.

6. Condor-G contacts the remote GT2 GRAM, sends the binary of the legacy code and its parameters to the NGS site, and submits the legacy code as a job to the NGS site job manager.

Figure 6: Execution of Legacy Codes on an NGS Site

When the job has completed on the NGS site, the results are transferred from the NGS site to the user in the same way.

6 Results: Traffic Simulation on the NGS

A working prototype of the described solution has been implemented and tested by creating and executing a traffic simulation workflow on the different NGS resources. The workflow consists of three types of components:

1. The Manhattan legacy code is an application to
generate inputs for the MadCity simulator: a road network file and a turn file. The MadCity road network file is a sequence of numbers representing the topology of a road network. The MadCity turn file describes the junction manoeuvres available in a given road network. Traffic light details are also included in this file.

2. MadCity [8] is a discrete-time microscopic traffic simulator that simulates traffic on a road network at the level of individual vehicles' behaviour on roads and at junctions. After completing the simulation, a macroscopic trace file, representing the total dynamic behaviour of vehicles throughout the simulation run, is created.

3. Finally, a traffic density analyser compares the traffic congestion of several runs of the simulator on a given network, with different initial road traffic conditions specified as input parameters. The component presents the results of the analysis graphically.

Each of these applications was published in the central repository at Westminster as a GEMLCA legacy code. The publication was done using the administration portlet of the GEMLCA P-GRADE portal. During this process the types of input and output parameters, and environmental values, like job managers and the maximum number of processors used for parallel execution, were set. Once published, the codes are ready to be used by end-users, even those with very limited computing knowledge.

Figure 7 shows the workflow graph and the execution of the different components on NGS resources:

• Job 0 is a road network generator mapped at Leeds,

• Jobs 1 and 2 are traffic simulators running in parallel at Leeds and Oxford, respectively,

• Finally, Job 3 is a traffic density analyser executed at Leeds.

Figure 7: Workflow Graph and Visualisation of its Execution on NGS Resources

When creating the workflow, the end-user selected the appropriate applications from the repository, set input parameters and mapped the execution to the available NGS resources. During execution the NGS script ran, contacted the remote GT2 GRAM, and instructed the portal to
pass executables and input parameters to the remote site. When the execution finished, the output files were transferred back to Westminster and made available for the user.

7 Conclusion and Future Work

The implemented solution successfully demonstrated that additional services, like legacy code support, run and maintained by third-party service providers, can be added to production Grid systems. The major advantage of this solution is that the reliability of the core Grid infrastructure is not compromised, and no additional effort is required from Grid system administrators. On the other hand, by utilizing these services the usability of these Grids can be significantly improved.

By utilising and re-engineering the GEMLCA legacy code solution, two different scenarios were identified to provide legacy code support for the UK NGS. The first, providing legacy code repository functionality and allowing the submission of legacy applications as jobs to NGS resources, was successfully implemented and demonstrated. The final production version of this architecture and its official release for NGS users is scheduled for June 2005. The second scenario, which extends the NGS with pre-deployed legacy code services, is currently in the design phase. Challenges have been identified concerning its implementation, especially the creation and management of virtual organizations that could utilize these pre-deployed services.

References

[1] R. Alfieri, R. Cecchini, V. Ciaschini, L. dell'Agnello, A. Frohner, A. Gianoli, K. Lorentey, and F. Spata. VOMS, an authorization system for virtual organizations. af.infn.it/voms/VOMS-Santiago.pdf.
[2] S. Andreozzi, S. Fantinel, D. Rebatto, L. Vaccarossa, and G. Tortone. A monitoring tool for a grid operation center. In CHEP 2003, La Jolla, California, March 2003.

[3] Mario Antonioletti, Malcolm Atkinson, Rob Baxter, Andrew Borley, Neil P. Chue Hong, Brian Collins, Neil Hardman, Ally Hume, Alan Knox, Mike Jackson, Amy Krause, Simon Laws, James Magowan, Norman W. Paton, Dave Pearson, Tom Sugden, Paul Watson, and Martin Westhead. The design and implementation of grid database services in OGSA-DAI. Concurrency and Computation: Practice and Experience, 17:357-376, 2005.

[4] David Baker, Mark Baker, Hong Ong, and Helen Xiang. Integration and operational monitoring tools for the emerging UK e-Science grid infrastructure. In Proceedings of the UK e-Science All Hands Meeting (AHM 2004), East Midlands Conference Centre, Nottingham, 2004.

[5] B. Balis, M. Bubak, and M. Wegiel. A solution for adapting legacy code as web services. In V. Getov and T. Kiellmann, editors, Component Models and Systems for Grid Applications, pages 57-75. Springer, 2005. ISBN 0-387-23351-2.

[6] C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC storage resource broker. In Proc. CASCON'98 Conference, November 1998.

[7] Karl Czajkowski, Steven Fitzgerald, Ian Foster, and Carl Kesselman. Grid information services for distributed resource sharing. In Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), /research/papers/MDS-HPDC.pdf.

[8] A. Gourgoulis, G. Terstyansky, P. Kacsuk, and S. C. Winter. Creating scalable traffic simulation on clusters. In PDP 2004: Conference Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-based Processing, La Coruna, Spain, February 2004.

[9] A. Goyeneche, T. Kiss, G. Terstyanszky, G. Kecskemeti, T. Delaitre, P. Kacsuk, and S. C. Winter. Experiences with deploying legacy code applications as grid services using GEMLCA. In P. M. A. Sloot, A. G. Hoekstra, T. Priol,

Optimization Algorithms


Optimization algorithms are a crucial tool in various fields, from engineering and finance to healthcare and logistics. These algorithms aim to find the best solution to a given problem by iteratively improving a candidate solution.

One of the most well-known optimization algorithms is the genetic algorithm, inspired by the process of natural selection. Genetic algorithms work by creating a population of candidate solutions, evaluating their fitness, selecting the best individuals, and applying genetic operators such as mutation and crossover to generate new solutions. This process is repeated over multiple generations until a satisfactory solution is found.

Another popular optimization algorithm is particle swarm optimization (PSO), which is inspired by the social behavior of bird flocks or fish schools. In PSO, a population of particles moves through the search space, each adjusting its position based on its own best solution and the best solution found by the group. This collaborative approach allows PSO to efficiently explore the search space and converge to a good solution.

Optimization algorithms can be applied to a wide range of problems, such as optimizing the design of a mechanical structure, finding the best route for a delivery truck, or tuning the parameters of a machine learning model. These algorithms can handle complex, high-dimensional problems that are difficult to solve using traditional methods. By exploring a large search space and iteratively improving candidate solutions, optimization algorithms can find solutions that are close to the global optimum, even in the presence of noise or uncertainty.

However, optimization algorithms are not without their limitations. One common challenge is the risk of getting stuck in local optima, where the algorithm converges to a suboptimal solution instead of the global optimum.
To mitigate this risk, researchers have developed techniques such as multi-start optimization, which involves running the algorithm multiple times with different starting points, or incorporating randomization to encourage exploration of the search space. Another limitation of optimization algorithms is their computational cost, especially for problems with a large number of variables or constraints. As the search space grows, the algorithm may require more iterations to find a good solution, leading to longer runtimes and higher computational resources. Researchers are constantly working on developingmore efficient algorithms, such as parallel optimization techniques or hybrid algorithms that combine different optimization methods to improve performance. Despite these challenges, optimization algorithms have proven to be valuable tools for solving complex problems in various domains. By harnessing the power of evolution, swarm intelligence, or mathematical optimization techniques, these algorithms can find solutions that are not easily achievable through manual trial and error. As technology advances and computational resources become more powerful, optimization algorithms will continue to play a crucial role in shaping the future of science, engineering, and innovation.。
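The PSO update described above can be sketched in a few dozen lines. This is a minimal, illustrative implementation minimizing a toy 2-D sphere function; the coefficient values (inertia 0.7, attraction 1.4) and the search range are arbitrary textbook-style choices, not taken from this text.

```python
import random

def sphere(x, y):
    """Toy objective to minimize; global optimum is 0 at (0, 0)."""
    return x * x + y * y

def pso(n_particles=30, iters=100, seed=0):
    rng = random.Random(seed)
    # Initialize particle positions and velocities in [-5, 5]^2.
    pos = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n_particles)]
    vel = [(0.0, 0.0) for _ in range(n_particles)]
    pbest = list(pos)                      # each particle's best-known position
    pbest_val = [sphere(*p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]   # swarm's best-known position
    w, c1, c2 = 0.7, 1.4, 1.4              # inertia and attraction coefficients
    for _ in range(iters):
        for i in range(n_particles):
            vx, vy = vel[i]
            px, py = pos[i]
            bx, by = pbest[i]
            # Pull each particle toward its own best and the swarm's best.
            vx = w * vx + c1 * rng.random() * (bx - px) + c2 * rng.random() * (gbest[0] - px)
            vy = w * vy + c1 * rng.random() * (by - py) + c2 * rng.random() * (gbest[1] - py)
            pos[i], vel[i] = (px + vx, py + vy), (vx, vy)
            val = sphere(*pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val

best_pos, best_val = pso()
```

Note that the global best only ever improves, which is the monotonic convergence behavior the text describes; escaping local optima would still require restarts or added randomization.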

FPGA: A Core Course in the Integrated Circuit Curriculum

The field-programmable gate array (FPGA) is a crucial course in the curriculum of integrated circuit engineering. As technology continues to advance, the demand for skilled professionals in this field is increasing. The course provides students with a comprehensive understanding of FPGA design, implementation, and application. This essay explores the importance of FPGA courses from multiple perspectives: their relevance to industry, their impact on students' skill development, and their contribution to technological advancement.

Firstly, FPGA courses are highly relevant to industry because of the widespread use of FPGAs. FPGAs offer flexible hardware design capabilities and are widely used in fields such as telecommunications, automotive, aerospace, and consumer electronics. By studying FPGA courses, students gain the knowledge and skills required to design and implement complex digital systems using FPGAs, enabling them to meet the industry's demand for professionals who can develop innovative solutions with FPGA technology.

Secondly, FPGA courses play a vital role in developing students' skills and competencies. These courses provide a practical understanding of FPGA architectures, digital design techniques, and hardware description languages such as VHDL and Verilog. Through hands-on lab sessions and projects, students learn how to design, simulate, and implement digital circuits on FPGAs. This practical experience enhances their problem-solving abilities, critical thinking, and ability to work within real-world constraints. FPGA courses often involve teamwork as well, helping students develop the collaboration and communication skills essential in the professional world.

Furthermore, FPGA courses contribute significantly to technological advancement. As FPGAs become more powerful and versatile, they are increasingly used in cutting-edge technologies such as artificial intelligence, machine learning, and high-performance computing. Students learn about advanced FPGA architectures, optimization techniques, and parallel processing, enabling them to design and implement complex systems that push the boundaries of what is possible.

Beyond the technical aspects, FPGA courses also foster creativity and innovation. Students are encouraged to explore novel design approaches and develop unique solutions to real-world problems. By encouraging students to think outside the box and experiment with different design methodologies, FPGA courses cultivate a culture of innovation and prepare students to become future innovators in integrated circuit engineering.

Moreover, FPGA courses provide a solid foundation for further academic research or advanced degrees. The knowledge gained in them forms the basis for more specialized work: students who excel may delve deeper into specific FPGA architectures or optimization techniques, or explore emerging trends in the field. Such research can lead to breakthroughs in FPGA technology and contribute to the academic community's understanding of digital design and implementation.

In conclusion, FPGA courses are of utmost importance in the integrated circuit engineering curriculum. They are highly relevant to industry, contribute to students' skill development, drive technological advancement, foster creativity and innovation, and provide a foundation for further research. By studying them, students acquire the knowledge and skills necessary to design and implement complex digital systems using FPGAs, making them valuable assets in this ever-evolving field.

Optimization Design Experiment Report: Summary

1. Introduction

The goal of this experiment was to improve the performance and efficiency of a software system through optimization design. This report summarizes and analyzes the optimizations carried out during the experiment and their effects.

2. Experiment

2.1 Background

The experiment used a test platform: a highly concurrent web crawler system. The system's task is to download data from the Internet and process it. Because of the complexity of this task, the system hits performance bottlenecks when processing large volumes of data.

2.2 Methods

To improve the system's performance and efficiency, we applied the following optimization techniques:

1. Parallelization: decompose the system's task into subtasks and process them with multiple threads or in a distributed fashion, improving the system's concurrency and throughput.

2. Cache optimization: cache data that the system reads and writes frequently, reducing accesses to the database and disk and speeding up reads and writes.

3. Algorithm optimization: improve the system's key algorithms by refining their implementations and reducing their time and space complexity.

4. Resource management: manage system resources such as memory and network connections carefully, avoiding waste and bottlenecks and improving overall performance.
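As a toy sketch of how the first two techniques (parallelization with a thread pool, plus caching of repeated fetches) might look in code: the `fetch` function and the URLs are hypothetical stand-ins, since the report does not show the crawler's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical stand-in for a network download; a real crawler
# would issue an HTTP request here.
@lru_cache(maxsize=1024)          # cache optimization: repeated URLs hit the cache
def fetch(url: str) -> str:
    return f"<html>payload of {url}</html>"

def crawl(urls):
    # Parallelization: independent downloads run on a thread pool.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(fetch, urls))

pages = crawl(["http://example.com/a", "http://example.com/b", "http://example.com/a"])
```

The duplicate URL in the list is served from the cache rather than re-downloaded, which is exactly the reduction in redundant I/O the report attributes to its cache optimization.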

2.3 Procedure

We first ran performance tests on the system to locate its bottlenecks. Then, drawing on existing optimization techniques and adapting them to our situation, we designed optimizations targeting those bottlenecks. Finally, we re-ran the performance tests on the optimized system to evaluate the effect of the optimizations.

3. Results and Analysis

After the optimizations, the system's performance improved markedly. With multithreading and distributed processing, the system's concurrency increased substantially and its processing capacity was used effectively. Careful use of caching reduced the number of database and disk accesses and sped up reads and writes. Improved algorithm implementations raised execution efficiency noticeably, and sensible resource management avoided waste and bottlenecks.

Comparative tests confirmed that the optimized system clearly outperforms the original: processing capacity is used effectively, concurrency is much higher, and overall performance and efficiency are significantly improved.

Optimization Plan

Introduction

In today's fast-paced world, optimization is key to success. Whether for a business process, a software application, or personal productivity, implementing optimization strategies can greatly enhance efficiency, productivity, and overall performance. This document provides an overview of various optimization techniques and suggests an optimization plan that can be applied to different scenarios.

1. Understanding Optimization

Optimization is the process of making something as effective or efficient as possible. It involves analyzing the current state, identifying areas for improvement, and implementing strategies to achieve the desired outcome. Optimization can be applied to various domains, including business operations, software development, marketing, and personal productivity.

2. Benefits of Optimization

Implementing optimization strategies can bring several advantages to organizations or individuals:

2.1 Improved Efficiency

Optimization streamlines processes and eliminates unnecessary steps or redundancies. Resources and time can thus be utilized more effectively, leading to improved efficiency.

2.2 Increased Productivity

Optimized workflows and systems result in increased productivity. By removing bottlenecks, automating tasks, and utilizing available resources efficiently, more work can be accomplished in less time.

2.3 Cost Reduction

Efficiency gains and increased productivity often lead to cost reduction. By optimizing processes, organizations can save on resources, manpower, and operational expenses.

2.4 Enhanced Performance

Optimization can significantly enhance the performance of systems, applications, or individuals. By identifying and addressing weaknesses or obstacles, overall performance can be improved.

3. Optimization Techniques

There are various optimization techniques that can be applied depending on the specific scenario.
Some popular techniques include:

3.1 Process Optimization

Process optimization involves analyzing and improving business processes to maximize efficiency and productivity. Techniques such as Lean Six Sigma, value stream mapping, and continuous improvement methodologies can be used to identify areas for improvement and implement changes.

3.2 Algorithm Optimization

Algorithm optimization focuses on improving the efficiency, speed, or accuracy of algorithms. This is especially relevant in computational tasks and software development. Techniques such as dynamic programming, memoization, and parallel computing can be applied to optimize algorithms.

3.3 Search Engine Optimization (SEO)

SEO is the practice of optimizing web content to improve its visibility and ranking in search engine results pages. Techniques such as keyword research, quality content creation, website optimization, and backlink building can be used to enhance a website's SEO.

3.4 Personal Productivity Optimization

Personal productivity optimization involves implementing strategies and techniques to enhance individual efficiency and performance. This can include time management, goal setting, prioritization, and adopting productivity tools or methodologies such as the Pomodoro Technique or Getting Things Done (GTD).

4. Optimization Plan

Implementing an optimization plan requires a systematic approach. Here is a general framework to consider:

4.1 Define Goals and Objectives

Clearly define what needs to be optimized and set specific goals and objectives. This provides a clear direction and guides the optimization process.

4.2 Analyze the Current State

Identify the current state of the process, system, or application. This can involve gathering data, conducting surveys, or performing a thorough analysis to understand the existing strengths, weaknesses, and areas for improvement.

4.3 Identify Optimization Opportunities

Based on the analysis, identify potential optimization opportunities.
These can be areas where efficiency can be improved, processes can be streamlined, or bottlenecks can be removed. Prioritize these opportunities based on their potential impact.

4.4 Develop Optimization Strategies

For each identified opportunity, develop optimization strategies. Research and evaluate the different techniques that could be applied and choose the most suitable ones, considering the resources, time, and constraints involved in implementing them.

4.5 Implement and Monitor

Implement the chosen optimization strategies and monitor their effectiveness. Track key performance indicators (KPIs) to measure the impact of the optimizations and make adjustments if needed.

4.6 Continuous Improvement

Optimization is an ongoing process. Encourage a culture of continuous improvement by regularly reassessing and refining the optimization plan. Collect feedback from stakeholders and incorporate their suggestions for further enhancements.

Conclusion

Optimization is a powerful approach to improving efficiency, productivity, and overall performance in many domains. By understanding different optimization techniques and following a systematic approach, organizations and individuals can unlock their full potential. Optimization is an ongoing process, and continuous improvement is key to staying ahead in today's competitive world.
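Section 3.2's mention of memoization is easy to make concrete. The sketch below contrasts a naive recursive Fibonacci with a memoized one; it is a standard illustration, not drawn from this plan.

```python
from functools import lru_cache

def fib_naive(n: int) -> int:
    # Plain recursion: exponential time, practical only for small n.
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Memoized: each subproblem is computed once, so this runs in linear time.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

big = fib_memo(60)   # near-instant; the naive version would take hours
```

The speedup comes purely from reusing previously computed subresults, which is the essence of both memoization and dynamic programming.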

Feature Computation

The concept of feature computation has become increasingly important in computer science and data analysis. Features, the measurable properties or characteristics of an object or phenomenon, play a crucial role in applications such as pattern recognition, image processing, and machine learning. The process of extracting and quantifying these features is known as feature computation, and it is a fundamental step in many data-driven decision-making processes.

One of the primary objectives of feature computation is to transform raw data into a more meaningful and informative representation that can be used for further analysis or decision-making. This transformation involves identifying and extracting the most relevant and discriminative features from the data, which can then be used as input to various algorithms or models. The choice of features and the way they are computed can have a significant impact on the performance and accuracy of these algorithms or models.

In image processing, for example, features can be used to describe the shape, texture, color, or other visual characteristics of an object or scene. These features can then be used for tasks such as object detection, image classification, or image retrieval. Similarly, in natural language processing, features can be extracted from text data to represent its semantic or syntactic properties, and used for tasks such as sentiment analysis, text categorization, or language modeling.

One of the key challenges in feature computation is the selection of the most relevant and informative features. This process, known as feature selection, can be complex and iterative, as the optimal set of features may depend on the specific problem or application at hand.
Various techniques, such as correlation analysis, principal component analysis, or mutual-information-based methods, can be used to identify and select the most relevant features from a larger set of candidates.

Another important aspect of feature computation is the way in which features are represented and quantified. Different types of features may require different computational approaches, and the choice of representation can have a significant impact on the performance of the algorithms or models that use them. In image processing, for example, features can be represented as numerical vectors, histograms, or more complex data structures, depending on the requirements of the application.

In addition to selection and representation, computing the features themselves can be computationally intensive, especially for large or high-dimensional datasets. Optimization techniques such as parallel processing, distributed computing, or hardware acceleration can be used to improve the efficiency and scalability of feature computation algorithms.

Overall, feature computation is a crucial part of many data-driven applications, and it requires a deep understanding of the underlying data, the specific problem at hand, and the available computational resources. By carefully selecting and computing the most relevant features, researchers and practitioners can develop more accurate and robust algorithms and models, leading to better decision-making and problem-solving in a wide range of domains.
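The correlation-analysis approach to feature selection mentioned above can be shown on a toy dataset. Everything here, including the data and the near-linear relationship between feature 0 and the target, is invented for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy dataset: rows are samples, columns are candidate features.
samples = [
    (1.0, 9.0, 0.2), (2.0, 7.5, 0.1), (3.0, 8.1, 0.4),
    (4.0, 6.9, 0.3), (5.0, 7.2, 0.2),
]
target = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * feature 0

# Rank features by absolute correlation with the target.
scores = [abs(pearson([row[j] for row in samples], target)) for j in range(3)]
ranking = sorted(range(3), key=lambda j: -scores[j])
```

Feature 0 ranks first because it tracks the target almost linearly; in practice one would combine such univariate scores with methods that catch feature interactions.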

Machine Learning Optimization Techniques

Optimization techniques are essential for improving the performance and efficiency of machine learning models. This article surveys some of the most commonly used ones.

One of the most popular is gradient descent, an iterative optimization algorithm that seeks the minimum of a function by updating the parameters in the direction of the negative gradient. Over multiple iterations, this allows the model to converge toward the optimal solution.

Another widely used technique is stochastic gradient descent (SGD). Instead of calculating the gradient over the entire dataset, SGD uses a random subset of the data to update the model's parameters. This can speed up optimization, especially for large datasets, by reducing the cost of computing the gradients.

Adam combines the advantages of gradient descent and stochastic gradient descent. It uses adaptive learning rates for each parameter, which lets the model converge faster and more efficiently than traditional optimization algorithms.

In addition to these, regularization methods such as L1 and L2 regularization are commonly used to prevent overfitting. Regularization adds a penalty term to the loss function, which helps control the complexity of the model and reduces the likelihood of overfitting.

Batch normalization is another optimization technique, used to improve the training of deep neural networks. It normalizes the inputs of each layer, which helps stabilize and accelerate training.

Dropout is another important technique for preventing overfitting in neural networks.
Dropout randomly selects a subset of neurons to be ignored during training, which improves the generalization ability of the model.

In conclusion, optimization techniques play a crucial role in improving the performance and efficiency of machine learning models. By incorporating gradient descent, stochastic gradient descent, Adam, regularization, batch normalization, and dropout, models can achieve better accuracy and generalization on a variety of tasks. Machine learning practitioners should understand these techniques and apply them effectively to build robust and accurate models.
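The core gradient-descent update described above is just a few lines. This is a minimal sketch on a one-dimensional quadratic; the learning rate and step count are arbitrary illustrative choices.

```python
def grad_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly step against the gradient to minimize the function.
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3); minimum at x = 3.
x_min = grad_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

SGD and Adam modify this same loop: SGD evaluates the gradient on a random mini-batch instead of the full objective, and Adam rescales each step with running estimates of the gradient's first and second moments.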

Introduction to Model Design Philosophy

The design philosophy of a model refers to the underlying principles and concepts that guide its creation and development. It establishes the framework within which the model operates and determines its overall structure and functionality. This article introduces some key concepts and principles commonly used in model design.

1. Simplicity: One of the most important principles in model design is simplicity. A model should be simple and easy to understand, avoiding complexity and unnecessary elements. By keeping things simple, the model becomes more accessible to users and easier to interpret and analyze.

2. Modularity: Modularity involves breaking the model down into smaller, self-contained modules that can be easily understood and modified. Each module focuses on a specific aspect of the model, making it easier to manage and maintain. Modularity also allows for reusability: modules can be reused in different contexts or combined to create new models.

3. Flexibility: A model should be flexible enough to adapt to changing requirements and new data inputs. Flexibility allows the model to be modified and updated without disrupting its overall structure. This can be achieved through parameterization, where key variables and parameters are separated from the model logic, enabling quick adjustments and fine-tuning.

4. Robustness: Robustness refers to the ability of a model to perform consistently and accurately across different scenarios and conditions. A robust model can handle uncertainties and variations in data inputs without compromising its outputs. Robustness can be achieved through rigorous testing and validation, as well as error-handling mechanisms.

5. Transparency: Transparency is an important aspect of model design, especially in fields where decisions based on models have significant repercussions. A transparent model is one whose logic and underlying assumptions are clearly documented and understandable. Transparency builds trust and credibility in the model's outputs and allows for better collaboration and accountability.

6. Scalability: Scalability is the ability of a model to handle larger and more complex data sets without sacrificing performance or accuracy. A scalable model can process increasing amounts of data and computational load efficiently, through proper data organization, optimization techniques, and parallel computing.

7. User-friendliness: A model should be designed with its end users in mind, making it easy to use and interact with. The interface should be intuitive and provide clear instructions and feedback. User-friendliness also includes appropriate documentation and support to help users understand and use the model effectively.

In conclusion, the design philosophy of a model encompasses simplicity, modularity, flexibility, robustness, transparency, scalability, and user-friendliness. By adhering to these principles, model designers can create effective and efficient models that meet users' needs and produce accurate, insightful outputs.
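Principle 3, flexibility through parameterization, can be made concrete with a short sketch. The growth model, its parameter names, and the default values below are all invented for illustration; the point is only that tunable values live outside the computation.

```python
from dataclasses import dataclass

# Parameterization: tunable values are separated from the model logic,
# so they can be adjusted without touching the computation itself.
@dataclass(frozen=True)
class GrowthParams:
    rate: float = 0.03      # hypothetical per-period growth rate
    periods: int = 10

def project(initial: float, p: GrowthParams) -> float:
    """A self-contained 'module': compound growth under the given parameters."""
    value = initial
    for _ in range(p.periods):
        value *= 1.0 + p.rate
    return value

baseline = project(100.0, GrowthParams())
tuned = project(100.0, GrowthParams(rate=0.05))   # quick adjustment, same logic
```

Because `project` depends only on its inputs, it is also easy to test in isolation, which supports the robustness and modularity principles above.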

OpenCL Programming Guide

Introduction

In an age of ever-increasing computational demands, efficient and scalable parallel computing solutions have become paramount. OpenCL (Open Computing Language) is an open standard for parallel programming of heterogeneous systems, enabling efficient use of various processing units such as CPUs, GPUs, DSPs, and FPGAs. This guide aims to provide a comprehensive understanding of OpenCL programming, covering its fundamentals, applications, and best practices.

Chapter 1: Introduction to OpenCL

OpenCL is a framework that allows software developers to write programs that run across multiple platforms and devices. It enables the full potential of modern hardware, especially GPUs, to be used for general-purpose computing. OpenCL abstracts the underlying hardware details, providing a uniform programming interface for developers.

Chapter 2: OpenCL Architecture and Components

The OpenCL architecture consists of two main components: the host and the device. The host is the central processing unit (CPU) that manages the execution of the program, and the device is the hardware accelerator, such as a GPU, that performs the parallel computations. OpenCL also provides a runtime library and a set of APIs for developers to program and control the devices.

Chapter 3: OpenCL Programming Basics

OpenCL programming involves writing kernels, which are small functions executed in parallel on the device. Kernels are written in a subset of the C programming language and are compiled into executable code for the target device. This chapter covers the syntax and semantics of OpenCL kernels, including memory management, data parallelism, and synchronization.

Chapter 4: OpenCL Memory Management

Memory management in OpenCL is crucial for achieving optimal performance. This chapter discusses the different memory objects in OpenCL, such as buffers, images, and sub-buffers, and their usage. It also covers memory allocation, data transfer between the host and the device, and memory access patterns for efficient data locality.

Chapter 5: OpenCL Applications and Use Cases

OpenCL finds applications in various domains, including graphics, physics simulations, machine learning, and bioinformatics. This chapter explores some real-world use cases of OpenCL, demonstrating its power and flexibility in parallel computing.

Chapter 6: OpenCL Performance Optimization

Achieving optimal performance in OpenCL requires careful consideration of factors such as workload distribution, memory access patterns, and kernel design. This chapter provides guidelines and best practices for optimizing OpenCL programs, including profiling tools and techniques for identifying and addressing performance bottlenecks.

Conclusion

OpenCL is a powerful framework for parallel computing, enabling efficient utilization of modern hardware accelerators. This guide has provided a comprehensive overview of OpenCL programming, covering its architecture, programming basics, memory management, applications, and performance optimization. With this knowledge, developers can harness the full potential of OpenCL to create efficient and scalable parallel computing solutions.
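The execution model in Chapters 2 and 3, where a kernel body runs once per work-item indexed by its global id, can be mimicked in plain Python without any OpenCL runtime. This is only an analogy of the data-parallel model, not actual OpenCL host code; a real runtime would dispatch the work-items in parallel on a device.

```python
# Plain-Python analogy of an OpenCL data-parallel kernel: the "kernel" body
# runs once per work-item, indexed by its global id, over an NDRange.
def vec_add_kernel(gid, a, b, out):
    # Corresponds to `out[i] = a[i] + b[i]` with i = get_global_id(0) in OpenCL C.
    out[gid] = a[gid] + b[gid]

def enqueue_nd_range(kernel, global_size, *args):
    # A real OpenCL runtime schedules work-items in parallel on the device;
    # here we simply loop over the global ids on the host.
    for gid in range(global_size):
        kernel(gid, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * 4
enqueue_nd_range(vec_add_kernel, 4, a, b, out)
```

Because every work-item touches a distinct output element, the iterations are independent, which is what lets the real runtime execute them concurrently.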

Optimization Techniques for Parallel Codes of Irregular Scientific Computations∗

Minyi Guo
Dept. of Computer Software
The University of Aizu
Aizu-Wakamatsu City, Fukushima 965-8580, Japan
minyi@u-aizu.ac.jp

Weng-Long Chang
Dept. of Info. Management
Southern Taiwan University of Technology
Tainan, Taiwan 710, R.O.C.
changwl@.tw

Yi Pan
Dept. of Computer Science
Georgia State University
University Plaza, Atlanta, GA 30303, USA
pan@

∗This research was supported in part by the Grant-in-Aid for Scientific Research (C)(2) 14580386 and The Japanese Okawa Foundation for Information and Telecommunications under Grant Program 01-12.

Abstract

In this paper, we propose a communication-reducing computes rule for irregular loop partitioning, called the least communication computes rule. For an irregular loop with nonlinear array subscripts, the loop is first transformed into a normalized single loop; we then assign the loop iterations to processors such that executing those iterations incurs the minimal communication cost. We also give some interprocedural optimization techniques for communication preprocessing when the irregular code contains procedure calls. The experimental results show that, in most cases, our approaches achieve better performance than other loop partitioning rules.

Keywords: Parallelizing compilers, irregular scientific applications, communication optimization, loop transformation, loop partitioning, interprocedural optimization.

1 Introduction

As scientists attempt to model and compute more complicated problems, they must develop efficient parallel code for sparse and unstructured problems in which array accesses are made through a level of indirection or through nonlinear array subscript expressions. This means that the data arrays are indexed either through the values in other arrays, which are called indirection arrays (or index arrays), or through non-affine subscripts. The use of indirect or nonlinear indexing causes the data access patterns, i.e. the indices of the data arrays being accessed, to be highly irregular. Such a problem is called an irregular problem, in which the dependence structure is determined by values known only at runtime. Irregular applications are found in unstructured computational fluid dynamics (CFD) solvers, molecular dynamics codes, diagonally or polynomially preconditioned iterative linear solvers, and n-body solvers.

Communication overhead influences the performance of parallel programs significantly. According to Hockney's representation, communication overhead can be measured by a linear function of the message length m — Tcomm = Ts + mTd — where Ts is the start-up time and Td is the per-byte messaging time. To achieve good performance, we must optimize communication in the following three respects:

• exploit local computation as much as possible;

• vectorize and aggregate communication in order to reduce the number of communications; and

• reduce the message length in each communication step.

Researchers have demonstrated that the performance of irregular parallel code can be improved by applying a combination of computation and data layout transformations. Some research focuses on providing primitives and libraries for runtime support [2, 10, 3], some provides language support, such as adding irregular facilities to HPF or Fortran 90 [13, 15], and some work attempts to utilize caches and locality efficiently [4].

Hwang et al. [10] presented a library called CHAOS, which helps users implement irregular programs on distributed-memory machines. The CHAOS library provides efficient runtime primitives for distributing data and computation over processors; it supports index translation mechanisms and provides users with high-level mechanisms for optimizing communication. In particular, it supports the parallelization of adaptive irregular programs, where indirection arrays are modified during the course of the computation. The CHAOS library is divided into six phases: Data Partitioning, Data Remapping, Iteration Partitioning, Iteration Remapping, Inspector, and Executor. The first four phases concern mapping data and computations onto processors; the last two concern analyzing data access patterns in loops and generating optimized communication calls. In the same working group, Ponnusamy et al. extended the CHAOS runtime procedures, which are used by a prototype Fortran 90D compiler, to make it possible to emulate irregular distribution in HPF by reordering elements of data arrays and renumbering indirection arrays [13]. Also, Das et al. [3] discussed primitives to support communication optimization of irregular computations on distributed-memory architectures. These primitives coordinate inter-processor data movement, manage the storage of, and access to, copies of off-processor data, minimize inter-processor communication requirements, and support a shared name space.
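The CHAOS-style inspector/executor split described above can be illustrated with a toy, single-process Python simulation of one processor's view of an irregular access x[ia[i]]. The two-"processor" block distribution, the indirection array, and all names here are invented for illustration, not taken from the paper; the simulated dictionary `ghost` stands in for the gathered off-processor data.

```python
# Inspector/executor sketch for the irregular access x[ia[i]] under a
# block distribution of x over two hypothetical processors.
def owner(i, block):
    # Block distribution: global index i lives on processor i // block.
    return i // block

def inspector(my_rank, iters, ia, block):
    # Analyze the indirection array once, at runtime: which off-processor
    # elements of x will this processor's iterations touch? The resulting
    # gather schedule can be reused across many time steps.
    return sorted({ia[i] for i in iters if owner(ia[i], block) != my_rank})

def executor(my_rank, iters, ia, x_local, ghost, block):
    # Execute the loop body using local data plus the gathered ghost copies.
    total = 0.0
    for i in iters:
        g = ia[i]
        if owner(g, block) == my_rank:
            total += x_local[g - my_rank * block]
        else:
            total += ghost[g]
    return total

block = 4                              # x[0..3] on P0, x[4..7] on P1
x = [float(v) for v in range(8)]       # global array, used to simulate the gather
ia = [0, 5, 2, 7, 1, 6]                # indirection array, known only at runtime
iters_p0 = [0, 1, 2, 3]                # iterations assigned to processor 0

sched = inspector(0, iters_p0, ia, block)   # off-processor indices to fetch
ghost = {g: x[g] for g in sched}            # simulated communication step
result = executor(0, iters_p0, ia, x[0:block], ghost, block)
```

The inspector's schedule is also where partitioning rules bite: a least-communication iteration assignment would aim to shrink `sched`, and hence the m in Tcomm = Ts + mTd, while aggregating the remaining fetches into one message amortizes the start-up cost Ts.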