Performance evaluation of a multistring photovoltaic module with distributed DC-DC converters


Code for Acceptance of Energy Efficient Public Building (Green Building) Construction, DBJ50-234-2016

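This code comprises 22 chapters and 13 appendices. Its main contents are: general provisions; terms; basic requirements; wall energy-efficiency works; curtain wall energy-efficiency works; door and window energy-efficiency works; roof energy-efficiency works; floor energy-efficiency works; heating energy-efficiency works; ventilation and air-conditioning equipment energy-efficiency works; energy-efficiency works for cold and heat sources of air-conditioning and heating systems; energy-efficiency works for pipe networks of air-conditioning and heating systems; power distribution energy-efficiency works; lighting energy-efficiency works; ground-source heat pump system energy-efficiency works; solar thermal system energy-efficiency works; solar photovoltaic energy-efficiency works; monitoring and control energy-efficiency works; building environment works; comprehensive resource utilization works; on-site physical inspection of building energy-efficiency (green building) works; and quality acceptance of building energy-efficiency (green building) works.
(7) Clause 16.2.10 of this code is based on Clause 5.3.5 of the national standard Technical Code for Solar Heating System, GB50495-2009.
(8) Clause 3.4.4 of this code specifies the acceptance method for the building environment and comprehensive resource utilization sub-divisional works involved in green building projects.
This code is administered by the Chongqing Urban-Rural Development Commission. The Chongqing Construction Technology Development Center (Chongqing Building Energy Efficiency Center) and the Chongqing Green Building Technology Promotion Center are responsible for interpreting its technical content. During implementation, all organizations are asked to collect data, summarize their experience, and send proposed amendments, additions and related material to the Chongqing Construction Technology Development Center (7th Floor, No. 69 Shangqingsi Road, Niujiaotuo, Yuzhong District, Chongqing; postal code 400015; tel. 023-63601374; fax 023-63861277) for reference in future revisions.
Ministry of Construction filing number: J13144-2015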
Chongqing Engineering Construction Standard DBJ50-234-2016
Code for acceptance of energy efficient public building (green building) construction
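(3) Clauses 1.0.4, 3.1.2, 11.2.4, 22.0.6 and 22.0.7 of this code are based, respectively, on the mandatory provisions in Clauses 1.0.5, 3.1.2, 11.2.3, 15.0.5 and 15.0.5 of the national standard Code for Acceptance of Energy Efficient Building Construction, GB50411-2007.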

HPC Benchmarking and Performance Evaluation With Realistic Applications


1 INTRO: GOALS OF HPC BENCHMARKING AND PERFORMANCE EVALUATION

The goal of HPC benchmarking and performance evaluation, as viewed in this paper, is to assess the performance and understand the characteristics of HPC platforms and their important applications. An obvious use of the gained results is the search for machines that are best for a given purpose. Equally important uses are the creation of yardsticks for research and the anticipation of the needs of future HPC technology.

The main thesis of this paper is that there is a dire need for basing such assessments on realistic computer applications. While the use of metrics that rely on measuring the behavior of simple program kernels has served us for many purposes in the past, it fails to answer questions that are of decisive importance today and in the future. These questions deal with the directions future HPC development should take, which have a direct impact on the competitiveness of our industry and nation.

Benchmarking and performance evaluation allow us to answer two important areas of questions, to which we will refer as the Relevant Questions:

1) How do HPC platforms perform in solving today's important problems? We may learn that a biology application takes a full year to fold a certain protein on a present-day HPC platform. Beyond such absolute time metrics, we may ask how computer platforms perform relative to others for a certain application or application area. Furthermore, we may be interested in finding out how certain system components perform for these applications. For example, we may ask for absolute communication times, the fraction of the overall execution taken by I/O, or the percentage of peak CPU power exploited.

2) What are the characteristics of today's computational applications and how do these characteristics change in future problems? We may want to know the memory required by a realistic weather forecast problem or how much communication will happen per quantity of computation. We may also ask how these properties change as a chemist increases the number of particles in a molecular dynamics simulation 100-fold.

In this paper we focus on the first area of questions. Key to both tasks is the use of today's realistic applications and real-

Reward and Performance session two


Break down the brief
A description and evaluation of at least three components of a total reward system in relation to performance management, one of which should be non-financial.
What are the key words?
Skills needed to pass
Research skills
• Go to the library
• Look at relevant chapters of HR textbooks on the reading list
• Find journal articles
• Set your own parameters
Skills needed to pass
Time management and planning
• Read gradually; don't do it all at the last minute
• Do a little every week and build up your knowledge
• Hand it in on time
Reward and Performance
Skill Development
The assessment
Imagine that you are working in an SME and have been asked by your manager to explain the fundamentals of a performance management system linked to reward. Write a report of no more than 2500 words (+10%) that summarises what you have found out. Your report could include the following:
• The purpose of performance management systems and the relationship a performance management system must have to business objectives.
• An evaluation of at least three components of performance management systems.
• An evaluation of the relationship between motivation and performance management, referring to at least two motivational theories.
• A description and evaluation of at least three components of a total reward system in relation to performance management, one of which should be non-financial.

Interpreting Performancetest tool results (reply)


Performance testing is a crucial aspect of software development because it provides insight into the performance and scalability of an application. It involves using various tools and techniques to evaluate system response time, throughput, and stability under different workloads. One tool widely used for performance testing is Performancetest. This article covers the interpretation and analysis of Performancetest results.

1. Introduction to Performancetest:
Performancetest is a comprehensive performance testing tool developed by PassMark Software. It allows testers to assess the performance of their applications by simulating real-world scenarios and generating detailed performance reports. The tool supports a wide array of performance tests, including CPU, disk, memory, 2D and 3D graphics, networking, and more.

2. Understanding Performancetest metrics:
Performancetest provides a range of metrics that help gauge the performance of an application. Some of the important metrics include:

2.1 CPU performance:
This metric measures the performance of the CPU by performing complex calculations and generating a score. A higher score indicates better CPU performance.

2.2 Disk performance:
Disk performance evaluates the read and write speeds of a storage device. It determines how quickly data can be read from or written to the disk. The metric reports the transfer rate in MB/s, with higher values indicating faster disk performance.

2.3 Memory performance:
Memory performance tests the speed at which the system can read from and write to RAM. It measures the memory's latency and throughput. A higher score indicates better memory performance.

2.4 2D and 3D graphics performance:
This metric assesses the graphical rendering capability of the system. It performs rendering operations and provides a score. A higher score suggests better graphics performance.

2.5 Networking performance:
Networking performance tests the speed at which data can be transferred over a network connection. It measures the throughput and latency of the network. Higher throughput and lower latency indicate better networking performance.

3. Interpreting Performancetest results:
Performancetest generates detailed reports in tabular and graphical formats, which makes the results easier to interpret and analyze. Here is a step-by-step guide to interpreting them:

3.1 Identify the performance metrics being measured:
The first step is to identify the metrics being measured in the performance test. Look for the specific metrics mentioned in the results report, such as CPU performance, disk performance, memory performance, and so on.

3.2 Analyze the scores or values:
Next, analyze the scores or values associated with each metric. Compare the values with industry benchmarks or previous test results to gauge the performance of the application. Higher scores or values generally indicate better performance.

3.3 Look for any anomalies:
Check for any significant deviations or anomalies in the results. Anomalies may indicate performance bottlenecks or issues that need further investigation. For example, a sudden drop in network performance could suggest a network configuration problem.

3.4 Consider the workload:
Consider the workload used during the test and how it relates to the application's real-world usage. If the workload simulated in the test does not align with the expected usage pattern, the results may not accurately reflect the application's performance.

3.5 Identify performance limitations:
Identify any performance limitations based on the results. These could include CPU bottlenecks, slow disk read/write speeds, memory constraints, graphics rendering issues, or network latency.

4. Understanding the implications of results:
Interpreting Performancetest results also involves understanding the implications of the findings. Here are a few key considerations:

4.1 Scalability:
Evaluate how the application performs under different workloads. Determine whether performance remains consistent or degrades as the workload increases. Scalability issues could indicate the need for optimization or infrastructure upgrades.

4.2 Performance bottlenecks:
Identify the factors causing performance bottlenecks and prioritize their resolution based on impact. This could involve optimizing code, improving database queries, or upgrading hardware resources.

4.3 Dependency analysis:
Analyze dependencies between different components of the system and identify any bottlenecks or performance issues. For example, slow disk performance may impact overall application performance.

4.4 Root cause analysis:
If performance issues are identified, conduct a root cause analysis to determine the underlying reasons. This could involve analyzing logs, profiling the application, or using additional diagnostic tools.

In conclusion, Performancetest is a powerful tool for evaluating the performance of software applications. Understanding and interpreting the results it generates is crucial for identifying performance bottlenecks and optimizing the application. By following a systematic approach and considering the factors above, testers can gain valuable insights from Performancetest results and improve the overall performance of their applications.
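Steps 3.2 and 3.3 above (compare scores with earlier results, then look for anomalies) can be sketched in Python. This is a minimal illustration, not part of the Performancetest tool: the function name, the metric names and the 10% threshold are all assumptions, and the scores are expected to have been copied by hand from a results report into plain dictionaries.

```python
from dataclasses import dataclass


@dataclass
class MetricChange:
    name: str
    baseline: float
    current: float
    change_pct: float   # signed change relative to the baseline, in percent
    regressed: bool     # True when the change is worse than the threshold


def compare_to_baseline(baseline, current, threshold_pct=10.0, lower_is_better=()):
    """Compare two sets of scores and flag metrics that got noticeably worse.

    baseline and current map a metric name to a number. Metrics listed in
    lower_is_better (for example a latency) regress when they rise;
    all other metrics regress when they fall.
    """
    changes = []
    # Only metrics present in both runs can be compared.
    for name in sorted(baseline.keys() & current.keys()):
        base, cur = baseline[name], current[name]
        if base == 0:
            raise ValueError(f"baseline for {name!r} is zero")
        change_pct = (cur - base) / base * 100.0
        if name in lower_is_better:
            regressed = change_pct > threshold_pct
        else:
            regressed = change_pct < -threshold_pct
        changes.append(MetricChange(name, base, cur, change_pct, regressed))
    return changes


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the function.
    previous_run = {"cpu_mark": 10000.0, "disk_mb_s": 500.0, "net_latency_ms": 20.0}
    latest_run = {"cpu_mark": 9800.0, "disk_mb_s": 400.0, "net_latency_ms": 30.0}
    for change in compare_to_baseline(previous_run, latest_run,
                                      lower_is_better={"net_latency_ms"}):
        flag = "REGRESSED" if change.regressed else "ok"
        print(f"{change.name}: {change.change_pct:+.1f}% {flag}")
```

A flagged metric is only a starting point; as sections 3.4 and 4.4 note, the workload and the root cause still need to be checked before drawing conclusions.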

2024 Junior High School English: Zhongkao (Senior High School Entrance Exam) Reading Comprehension Courseware

Evaluate the evidence
Weigh the evidence presented in the text to determine the truth or falsehood of a statement
Main ideas
2024/2/2
Identify the topic
03
If there are multiple choices, compare and contrast them with the information in the text to find the best match
Inference and True/False Questions
02
Common question types and problem solving techniques
Chapter
Fact detail questions
Identify key information
01
Look for specific details in the text that directly answers
Reading comprehension passages typically cover a range of topics, including but not limited to school life, daily activities, social issues, and science and technology
reasonable guess based on the information you have and move

A preliminary evaluation of the performance of multiple


ORIGINAL ARTICLE

A preliminary evaluation of the performance of multiple ionospheric models in low- and mid-latitude regions of China in 2010–2011

Weihua Luo · Zhizhao Liu · Min Li

Received: 23 July 2012 / Accepted: 3 June 2013
© Springer-Verlag Berlin Heidelberg 2013
GPS Solut, DOI 10.1007/s10291-013-0330-z

Abstract
Ionospheric delay is a dominant error source in Global Navigation Satellite System (GNSS). Single-frequency GNSS applications require ionospheric correction of the signal delay caused by the charged particles in the earth's ionosphere. The Chinese Beidou system is developing its own ionospheric model for single-frequency users. The number of single-frequency GNSS users and applications is expected to grow fast in the next years in China. Thus, developing an appropriate ionospheric model is crucially important for the Chinese Beidou system and worldwide single-frequency Beidou users. We study the performance of five globally accessible ionospheric models, Global Ionospheric Map (GIM), International Reference Ionosphere (IRI), Parameterized Ionospheric Model (PIM), Klobuchar and NeQuick, in low- and mid-latitude regions of China under mid-solar activity condition. Generally, all ionospheric models can reproduce the trend of diurnal ionosphere variations. It is found that all the models have better performances in mid-latitude than in low-latitude regions. When all the models are compared to the observed total electron content (TEC) data derived from the GIM model, the IRI model (2012 version) has the best agreement with the GIM model and the NeQuick has the poorest agreement. The RMS errors of the IRI model using the GIM TEC as reference truth are about 3.0–10.0 TECU in low-latitude regions and 3.0–8.0 TECU in mid-latitude regions, as observed during a period of 1 year with medium level of solar activity. When all the ionospheric models are ingested into single-frequency precise point positioning (PPP) to correct the ionospheric delays in GPS observations, the PIM model performs the best in both low and mid-latitudes in China. In mid-latitude, the daily single-frequency PPP accuracy using the PIM model is ~10 cm in horizontal and ~20 cm in up direction. At low-latitude regions, the PPP error using the PIM model is 10–20 cm in north, 30–40 cm in east and ~60 cm in up component. The single-frequency PPP solutions indicate that the NeQuick model has the lowest accuracy among all the models in both low- and mid-latitude regions of China. This study suggests that the PIM model may be considered for single-frequency GNSS users in China to achieve a good positioning accuracy in both low- and mid-latitude regions.

Keywords Ionospheric model · Precise point positioning (PPP) · Ionospheric model evaluation · Low- and mid-latitude regions of China

W. Luo · Z. Liu (✉) · M. Li (✉)
Department of Land Surveying and Geo-Informatics (LSGI), The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China
e-mail: lszzliu@.hk
M. Li
e-mail: lim@
W. Luo
College of Electronics and Information Engineering, South-Central University for Nationalities, Wuhan, China
M. Li
GNSS Research Center, Wuhan University, Wuhan, China

Introduction

The range error resulting from the ionospheric delay can be up to 100 m in magnitude and can significantly affect the accuracy of positioning, navigation, velocity and timing. The ionospheric delay of dual-frequency Global Navigation Satellite System (GNSS) users can be precisely corrected by measurements from dual-frequency signals. To the first-order approximation, the ionospheric range error can be estimated as $d = 40.3\,\mathrm{TEC}/f^2$, where $f$ is the signal frequency and TEC represents total electron content. TEC is the integral of the electron density along a ray path from GNSS receiver to satellite. There are still numerous single-frequency GNSS users who need correction information to mitigate their ionospheric range errors.

To correct ionospheric delay for Global Positioning System (GPS) users, ideally a simple and accurate ionospheric model can be used. The simplest one to use might be the well-known broadcast model or
Klobuchar model (Klobuchar 1987). With eight broadcast parameters, the ionospheric correction can be calculated in real time, but the accuracy is limited to about 50–60% (Bidaine and Warnant 2011). China is progressively developing its own Beidou system, and similarly, one such ionospheric model is needed (Shi et al. 2012). Similar to the GPS, the Beidou Ionospheric Model (BIM) uses eight parameters to broadcast ionospheric correction information (Wu et al. 2012).

Basically, two approaches are available to correct the ionospheric delay errors for single-frequency GNSS users. One approach is to use the existing ionospheric models to compute ionospheric correction. The ionospheric models are usually established on the basis of some physical principles as well as some historical ionospheric data. The other approach is to use real-time ionospheric TEC observation data derived from GNSS networks to construct a numerical model and then compute ionospheric correction for single-frequency users.

In the first approach for ionosphere correction, several ionospheric models exist and can be used for GNSS ionospheric correction. Generally, these models can be classified into three main categories (Feltens et al. 2011): (1) physical models based on ionospheric physics and chemistry, such as SAMI2 (Huba et al. 2000); (2) semi-physical models that simplify the physical models by reducing the input parameters, such as the Parameterized Ionospheric Model (PIM) (Daniell et al. 1995); (3) empirical or semi-empirical models based on observations, such as the International Reference Ionosphere (IRI) model (Bilitza 2001; Bilitza and Reinisch 2008) and the NeQuick model (Radicella and Leitinger 2001).

For the purpose of correcting ionospheric range delay for satellite navigation systems, a lot of research has evaluated different ionospheric models. For example, Farah (2008) compared the TEC derived from the Klobuchar model and the NeQuick model with TEC from the International GNSS Service (IGS) Global Ionospheric Mapping (GIM) products in different regions under different solar activities. Zhang et al. (2010) compared the latitudinal distribution of TEC derived from DORIS with the NeQuick model. The Klobuchar model and the NeQuick model were also compared by Radicella et al. (2008). Nafisi and Beranvand (2005) compared the Klobuchar TEC with the IGS TEC for a special region.

Differences exist between the TEC calculated from ionospheric models and the observed ones. The performances of different ionospheric models vary with geographic regions. For example, the NeQuick model, which is used for the Galileo satellite navigation system, is shown to be more accurate in Wuhan than in Beijing (Wang et al. 2007). In North America and Europe, the NeQuick model has shown a remarkably better performance than the IRI and Klobuchar models in reproducing the characteristics of slant TEC (Coisson et al. 2004). Goodwin and Breed (2001) compared the GPS-derived TEC observations in Australia with TEC derived from the IRI and PIM models and found that the discrepancy between the observations and the models was about 3–7 TECU. Zhang et al. (2006a) compared the results using the IRI90 and Klobuchar models in China and Europe and indicated that IRI90 may be more accurate than Klobuchar. Zhang et al. (2006b) compared ionosonde-derived TEC with the IGS-derived satellite-based TEC as well as IRI2001-modeled TEC at a low-latitude station in Hainan Province, China.
They indicated that IRI data can reproduce the semi-annual variation of TEC. The differences between IRI TEC and IGS TEC can be as large as 20 TECU. Bi et al. (2012) evaluated the accuracy of the Klobuchar model, the NeQuick model and GIM from the Center for Orbit Determination in Europe for mid-latitude regions and indicated that GIM may be used as a reference. Wu et al. (2012) compared the Beidou Ionospheric Model with the GPS Klobuchar model and found that the Beidou Ionospheric Model outperforms the Klobuchar model by 7.8–35.3% in single-frequency single-point positioning in the Northern Hemisphere, including Asia, Europe and North America.

However, in the past years, only a few ionospheric models have been evaluated simultaneously (Orus et al. 2002; Coisson et al. 2004). A careful study of the performances of different ionospheric models in China is particularly necessary because of the expected large number of single-frequency Beidou/GNSS users. For the Chinese Beidou satellite navigation system, the Beidou Ionospheric Model (BIM), which is similar to the GPS broadcast model, has been specifically designed. The performance of the BIM in the China region was evaluated in Wu et al. (2012). But more work needs to be done; for instance, only 1 day of data from two stations in China was analyzed to evaluate the BIM. It is inadequate to give a statistically meaningful conclusion without data analysis using more stations and a longer period of observations.

For the China region, some regional nowcasting and forecasting ionospheric models have been developed for the purpose of radiowave and space physics research in China. A Chinese Reference Ionosphere (CRI) was developed based on the International Reference Ionosphere by incorporating the ionospheric data collected in the China region (Liu et al. 1994). Liu et al. (2005) introduced an iteration method to forecast regional f0F2 based on long-term observations. In order to forecast f0F2, more than 10 h of data ahead of the prediction epoch should be acquired.
Furthermore, the forecast f0F2 can be used for TEC forecasting (Liu et al. 2008). Wan et al. (2007) developed an ionospheric TEC nowcasting system over the China region based on GPS data from four GPS stations in China. The real-time TEC map is published on the Internet (/TEC.asp). Some of the above model studies, such as the one by Liu et al. (1994), were not for TEC calculation. Other work, such as Wan et al. (2007), focused on TEC calculation. However, no numerical TEC data or TEC model was available for public access. Thus, we did not include these Chinese ionospheric models in the study. Unfortunately, the TEC data from another model for the China region, the Beidou Ionospheric Model, were not available at present for this study. Thus, it is not included either.

We present a comparative evaluation of TEC quality from the Klobuchar, IRI, PIM and NeQuick models with respect to the TEC derived from GIM data produced by the Jet Propulsion Laboratory (JPL). These models are chosen for comparison with the JPL GIM model because (1) these models are openly accessible by single-frequency GNSS users, and they basically belong to the first approach of ionosphere modeling as discussed above and (2) JPL uses real-time GPS TEC data to update the GIM model. The GIM model largely represents the second approach of ionospheric modeling as discussed above, though it also uses the Bent and IRI models as background models when GPS TEC data are not available (Mannucci et al. 1998). In order to evaluate the performance of all these models in the China region for single-frequency users, the absolute accuracies of these models are further evaluated in a single-frequency GPS positioning program. This is achieved using the model-derived TEC data to correct the ionospheric range delay in single-frequency GPS data.

We study the TECs at different geographic regions over China for a 1-year period from October 2010 to September 2011. Through comprehensive evaluation of multiple ionospheric models, this study will help identify a suitable
ionospheric model to correct the ionospheric error for single-frequency GNSS users in the China region under mid-solar activity condition.

Modeling overview

In this section, the five ionospheric models are introduced and briefly described, including their input and output parameters and their applications.

Klobuchar model

The Klobuchar model (Klobuchar 1987) is a relatively simple ionospheric model built on a simple cosine function, with a maximum delay at 14:00 local time and a constant offset of 5 ns during nighttime. It is the model used by GPS single-frequency users. The model period and amplitude are described by a third-degree polynomial in local time and geomagnetic latitude. The required input parameters include the location of the station, the elevation and azimuth of the GPS satellite and eight Klobuchar coefficients. The eight coefficients are time-varying and broadcast in the GPS navigation message. The Klobuchar model assumes an ideal smooth behavior of the ionosphere; thus, important daily fluctuations cannot be accounted for. The accuracy of the model is limited to 50–60% of the total effect during quiet space weather conditions (Bidaine and Warnant 2011; Filjar et al. 2009). Under some special circumstances, such as severe ionospheric activity at low elevations, the model performs poorly. However, this model can be implemented easily and the computation is fast due to its simplicity.

IRI model

The International Reference Ionosphere (IRI) is a widely used empirical model in the ionosphere community (Bilitza 2001). It is an internationally recommended standard for the specification of plasma parameters in the earth's ionosphere (Bilitza and Reinisch 2008). It is developed based on many available and reliable data sources for the ionospheric plasma, such as a worldwide ionosonde network, radar data and satellite and rocket measurements. Over the years, this model has been steadily improved and several versions have been released (Bilitza and Reinisch 2008).
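Returning briefly to the Klobuchar model section above: the half-cosine shape, with its peak at 14:00 local time and constant 5 ns night-time value, together with the first-order relation d = 40.3 TEC/f² from the Introduction, can be sketched in Python. This is a simplified sketch under stated assumptions, not the full broadcast algorithm: it evaluates the polynomials at a geomagnetic latitude passed in directly (the operational algorithm works at the ionospheric pierce point and applies an elevation-dependent slant factor), and the function names and coefficient values are illustrative only.

```python
import math

C = 299_792_458.0   # speed of light, m/s
F_L1 = 1575.42e6    # GPS L1 carrier frequency, Hz


def klobuchar_vertical_delay_s(local_time_s, geomag_lat_sc, alpha, beta):
    """Vertical L1 delay in seconds from a Klobuchar-style half-cosine.

    local_time_s  : local time in seconds of day (0 to 86400)
    geomag_lat_sc : geomagnetic latitude in semicircles (degrees / 180)
    alpha, beta   : four amplitude and four period coefficients
                    (the eight broadcast parameters)
    """
    # Amplitude and period are cubic polynomials in geomagnetic latitude.
    amplitude = sum(a * geomag_lat_sc ** n for n, a in enumerate(alpha))
    period = sum(b * geomag_lat_sc ** n for n, b in enumerate(beta))
    amplitude = max(amplitude, 0.0)
    period = max(period, 72_000.0)

    # Phase of the cosine, centred on 14:00 local time (50400 s).
    phase = 2.0 * math.pi * (local_time_s - 50_400.0) / period
    if abs(phase) < 1.57:
        # Daytime: 5 ns offset plus a truncated cosine series.
        return 5e-9 + amplitude * (1.0 - phase ** 2 / 2.0 + phase ** 4 / 24.0)
    return 5e-9  # night-time constant


def tecu_to_delay_m(tec_tecu, freq_hz=F_L1):
    """First-order range error d = 40.3 * TEC / f^2, with 1 TECU = 1e16 el/m^2."""
    return 40.3 * tec_tecu * 1e16 / freq_hz ** 2


def delay_m_to_tecu(delay_m, freq_hz=F_L1):
    """Inverse of tecu_to_delay_m."""
    return delay_m * freq_hz ** 2 / 40.3 / 1e16


if __name__ == "__main__":
    # Made-up coefficients, chosen only to exercise the functions.
    alpha = (1.0e-8, 0.0, 0.0, 0.0)
    beta = (1.0e5, 0.0, 0.0, 0.0)
    peak_m = C * klobuchar_vertical_delay_s(50_400.0, 0.1, alpha, beta)
    print(f"delay at 14:00 LT: {peak_m:.2f} m = {delay_m_to_tecu(peak_m):.1f} TECU")
```

On L1, one TECU corresponds to roughly 0.16 m of range error, which gives a sense of scale for the TECU differences between models reported in this study.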
The most recent version of the IRI model is IRI-2012, and it is used in this study (Bilitza et al. 2011).

For a given location (latitude, longitude), local time (or universal time) and date, the IRI provides the electron density, electron temperature, ion temperature and ion composition in the height interval from 50 to 2,000 km and the TEC. The IRI model can reproduce TEC variations reasonably well with respect to observational TEC (Wilkinson et al. 2001). The IRI provides several options to calculate the electron density and TEC, such as the f0F2/hmF2 model and the topside ionosphere. For this research, the f0F2/hmF2 model is from the International Union of Radio Science (URSI) model, the D-region is described by IRI-95 and the topside ionosphere is represented by the NeQuick model (refer to the next section).

NeQuick model

NeQuick is a monthly median 3-D empirical ionospheric model that uses a combination of multiple Epstein layers to produce an analytic function for the ionospheric electron density profile and TEC up to a height of 20,000 km (Hochegger et al. 2000). The NeQuick model is divided into two parts: (1) the bottom side, up to the F2-layer peak, consisting of a sum of five semi-Epstein layers with modeled thickness parameters and (2) the topside, described by means of the sixth semi-Epstein layer with a height-dependent thickness parameter that is empirically determined. The input parameters include location (latitude, longitude), time (year, month, day, UT), and monthly smoothed sunspot number (R12) or daily solar flux. The NeQuick model can depict day-to-day variations in the range delay corrections when it takes the daily average sunspot number into account. In European regions, NeQuick can provide better ionospheric delay corrections than the Klobuchar and IRI models (Farah 2008). In the Galileo system, the ionosphere correction accuracy can reach about 70% using the NeQuick model (Bidaine and Warnant 2011). NeQuick is used as the official ionosphere correction model incorporated in Galileo receivers (Bidaine and Warnant 2011), and the model coefficients are broadcast as part of the Galileo navigation message. The model has also been adopted as an ITU standard (ITU-R 2007).

PIM model

The Parameterized Ionospheric Model (PIM) (Daniell et al. 1995) is a semi-analytic global ionospheric and plasmaspheric model based on the parameterized output of an empirical plasmaspheric model and several regional theoretical ionosphere models. The ionospheric part combines the results of several theoretical ionosphere models, covering the E and F layers for all latitudes, longitudes and local times. The inputs of PIM include the solar flares and geomagnetic index in addition to the information of location and time. PIM outputs include electron density profiles in the altitudinal range from 90 km to 25,000 km, ion composition and also TEC for all levels of solar and geomagnetic activities. The computation of PIM is fast while the physical features of the ionosphere are retained.

GIM model

The global ionosphere TEC data are routinely computed by data analysis centers at IGS, including the Jet Propulsion Laboratory (JPL), which uses more than 200 worldwide ground-based GPS receivers (Mannucci et al. 1998). The TEC data are published at a 2-h time interval as Global Ionosphere Maps (GIMs) (ftp:///pub/gps/products/ionex).
The vertical total electron content (VTEC) is calculated in a solar-geomagnetic reference frame using bi-cubic splines on a spherical grid. The GIM covers ±87.5° latitude and ±180° longitude ranges and has a spatial resolution of 2.5° and 5° in latitude and longitude, respectively. The ionospheric TEC values can be interpolated using the surrounding GIM grids. The GIM TEC map can predict more than 90% of ionospheric TEC. The total accuracy of GIM data was evaluated at around 3.7–3.9 TECU (Sekido et al. 2003). Generally, GIM can provide accurate TEC values and it can be taken as a powerful tool for monitoring the global ionosphere in near real time (Ho et al. 1997). In this research, the GIM TEC data will be used as the reference to which the TEC from other models are compared.

TEC model comparison result

In this section, TEC derived from the theoretical NeQuick, Klobuchar, PIM and IRI models are compared with the observational model GIM over a 1-year period (October 2010 to September 2011) in the low- and mid-latitude China region. For comparison purposes, the monthly mean TEC is calculated:

$$\overline{\mathrm{TEC}}_{\mathrm{Model},j} = \frac{1}{m}\sum_{i=1}^{m} \mathrm{TEC}_{\mathrm{Model},ij}$$

where $\mathrm{TEC}_{\mathrm{Model},ij}$ is the TEC at the $j$th epoch of the $i$th day for a given ionospheric model (including the GIM model); $m$ denotes the number of days in a month; and $\overline{\mathrm{TEC}}_{\mathrm{Model},j}$ is the monthly mean TEC for the $j$th epoch for a given ionospheric model. Since the TECs from the models are calculated at an interval of 30 s, the number of epochs in 24 h is 2,880. The 30-s interval is selected because the TEC data are used to correct 30-s GPS observations in order to evaluate the accuracy of each ionospheric model for single-frequency GPS positioning.

Furthermore, the root mean square (RMS) for each model using the GIM as reference is calculated as below.

$$\mathrm{RMS} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left[\overline{\mathrm{TEC}}_{\mathrm{Model},j} - \overline{\mathrm{TEC}}_{\mathrm{GIM},j}\right]^{2}}$$

where $\overline{\mathrm{TEC}}_{\mathrm{GIM},j}$ is the monthly mean TEC from the GIM model for the $j$th epoch. Since there are 2,880 epochs in a 24-h period, $n$ has a value of 2,880 in the above equation.

Test site selection

The Chinese mainland spans from the equatorial region to mid-latitudes. To evaluate the performances of different ionospheric models in different regions, eight stations at various latitudes are chosen as test sites, as shown in Tables 1 and 2. In the TEC computation using different models, the upper integral height is set as 20,000 km (except for the IRI model, where it is 2,000 km).

Low-latitude geographic region

To demonstrate the performance of each ionospheric model in the low-latitude region, the monthly mean TEC calculated over a 1-month period (here July) using the five ionospheric models at two low-latitude stations (HKTU and WUHN) are shown in Fig. 1.

It can be seen that the TEC derived from multiple ionospheric models have a shape very similar to that of GIM TEC. The maximum TEC value occurs around 08 UT (LT = UT + 8 h), and the minimum one is around 20 UT. It can be seen that the IRI model has a better agreement with the GIM data than other models. At the HKTU station, IRI overestimates the TEC during 4–10 UT but underestimates TEC during 12–22 UT, by a few TECU with respect to the GIM model. The other models consistently underestimate the TEC by more than 10 TECU with respect to the GIM model. At the HKTU station, the disagreement between the Klobuchar and GIM models can be as large as 20 TECU. The underestimation of the models with respect to GIM is not purely due to modeling error. One reason for such an underestimation might be the difference in modeling heights. The GPS-based GIM model directly measures TEC from the ground up to the GPS satellite height (~20,200 km), while the modeling heights of most models are significantly lower. In Thompson et al. (2009), it is estimated that the vertical TEC of the plasmasphere contribution from 1,400 to 20,200 km is 3 TECU. Mannucci et al. (1998) also estimated that the electron content above 1,300 km is 1–3 TECU. The good agreement between IRI and GIM models
may be explained by the fact that the GIM model uses climatological models such as the IRI and Bent models where there are TEC data gaps, as in the China region (Mannucci et al. 1998). However, this agreement indicates good consistency only between the two models. The absolute accuracies of these ionospheric models need further evaluation through the PPP computation discussed in the following section.

Figure 2 shows the RMS of the monthly mean TEC calculated from four ionospheric models from October 2010 to September 2011 at four low-latitude stations. The figure shows that the RMS of the NeQuick model is apparently much larger than those of the other models. From October to April at the HKTU and KUNM stations (low latitudes in the low-latitude region), the RMS errors of the NeQuick model are much larger than those of the other models and can be as large as 60 TECU. At the LHAZ and WUHN stations (high latitudes in the low-latitude region), the NeQuick model has larger RMS errors but the error size is much smaller, approximately 20 TECU. Generally, the Klobuchar and IRI models have better performance than the other models. At stations HKTU and KUNM, the RMS for the Klobuchar model is smaller than that for the IRI model during November 2010 and January 2011, with values <5.5 TECU, while the RMS for the IRI model is smaller than that for the Klobuchar model in most of the remaining months. At the other two stations, LHAZ and WUHN (near the EIA crest), the RMS of the Klobuchar model is smaller than that of the IRI model from October 2010 to March 2011, with values <5 TECU. In other months, the IRI model generally has a better performance than the Klobuchar model.

Table 1 The location of four low-latitude stations
Station name   Latitude (°)   Longitude (°)   Height (m)   Geomag. lat. (°)
HKTU           22.3028        114.1795        34.605       10.9285
KUNM           25.0295        102.7972        1,986.266    13.7175
LHAZ           29.6573        91.1040         3,624.609    18.8285
WUHN           30.5317        114.3573        25.868       19.1589

Table 2 The location of four mid-latitude stations
Station name   Latitude (°)   Longitude (°)   Height (m)   Geomag. lat. (°)
XIAN           34.3687        109.2215        463.918      22.9702
BJFS           39.6087        115.8925        87.468       28.3223
CHAN           43.7907        125.4442        273.352      32.7320
HRBN           45.7026        126.6204        198.324      34.6790

Mid-latitude geographic region

Figure 3 displays the performance of monthly mean TEC of multiple ionospheric models in July 2011 at two mid-latitude stations, XIAN and HRBN. Generally, the modeled TEC has similar variation shapes as that of GIM TEC, but the differences between the modeled TEC and the observed TEC (from GIM) are more remarkable at mid-latitude regions than those in low-latitude regions. Consistent with low-latitude regions, the IRI model overall has the best agreement with the GIM model. It can be seen that the PIM and NeQuick have the largest disagreement with the GIM model, consistently underestimating the TEC with respect to the GIM model.

Figure 4 shows the RMS of various models at four mid-latitude stations. The IRI and Klobuchar models clearly have smaller RMS than the NeQuick and PIM models. For the Klobuchar model, the maximum RMS occurs in March. For the IRI model, the maximum RMS error occurs in May. The performance of the IRI model is consistent with that in the low-latitude region as shown in Fig. 2. Generally, the NeQuick model has the largest RMS value among all the models. This is also consistent with its performance in the low-latitude region shown in Fig. 2.

Comparing Figs. 4 and 2, it can be found that the RMS values of all the models in the mid-latitude region are much smaller than those in the low-latitude region. Taking the NeQuick model for example, the largest RMS of NeQuick in the low-latitude region can be as large as 70 TECU (Nov 2010 at the HKTU station). However, the largest NeQuick RMS error in the mid-latitude region is only about 20 TECU (September 2011 at the HRBN station). The other three models also have smaller RMS values in the mid-latitude region than in the low-latitude region. In other words, all four models have better performances in the mid-latitude region than in the low-latitude region. This might be explained by the ionosphere in the low-latitude region being more active and disturbed than in the mid-latitude region.

Validation by
single-frequency precise point positioning (PPP)In the analysis presented above,the observed TEC data from GIM model are used as a reference to which the other four models are compared.The GIM model is chosen as reference in this study because this model largely depends on the TEC observations from global GPS data.The GIMGPS Solutmodel has much frequent(normally2h)update than other models.However,the model assessment shown above is actually a relative one with respect to the performance of GIM model.In order to validate the performance of all the ionospheric models in an absolute sense,thefive models, including the GIM model,are further evaluated in the position domain using a single-frequency precise point positioning(PPP)approach.In this approach,the single-frequency data are processed in a PPP mode,although all of the8tested sites are Chinese Continuously Operating Reference Stations(CORS)and generate dual-frequency observations.The purpose of using single-frequency data only is to apply the ionospheric models to correct the ionospheric delays,instead of using dual-frequency observations that would eliminate the ionospheric effects. Since the ionospheric delay is the dominant error in single-frequency PPP,the accuracy of which can be used as an indicator of the absolute accuracy of ionospheric models.The daily GPS data from the8test stations are processed in static mode at an interval of30s for the period October 2010to September2011.The Positioning And Navigation Data Analyst(PANDA)software package(Liu and Ge 2003)developed at Wuhan University is used to compute the single-frequency PPP solutions.PANDA can deliver precise point positioning solutions of a few millimeters when dual-frequency static GPS data are used(Ge et al. 
2008).The single-frequency PPP errors corresponding to the GIM,IRI,PIM,Klobuchar and NeQuick ionospheric models are illustrated in Fig.5.The results from two sta-tions are included:one mid-latitude station BJFS and one low-latitude station KUNM.In the comparison,the true coordinates are obtained by computing a weekly dual-fre-quency PPP solution.As said earlier,the daily PPP solu-tions from the PANDA can achieve a high accuracy of a few millimeters(Liu and Ge2003).It can be seen from Figs.5and6that the single-frequency PPP solutions using ionospheric models have positioning errors of a few deci-meters in horizontal and even meters in height.The dual-frequency PPP solutions are thus sufficiently accurate being used as a reference to evaluate the single-frequency ones.It can be clearly seen that the single-frequency PPP solutions corresponding to NeQuick model have the worst accuracy.This is consistent with the NeQuick perfor-mances shown in Figs.2and4.Particularly at the low-latitude station KUNM,the PPP solution errors using NeQuick model may reach up to8meters.In addition,the accuracy of PPP solutions using NeQuick model is not stable over time.Significant variations over seasons are exhibited,particularly at low-latitude station KUNM. Generally,the NeQuick model has a better precision during May to September,which is consistent with Figs.2and4. 
Compared to other models, Fig. 5 shows that the PIM can provide the best correction for the single-frequency GPS data in China.

The RMS of the daily positioning errors between the single-frequency PPP solutions and the true coordinates for all 8 stations is shown in Fig. 6. The x axis shows the station names, arranged in descending order of latitude. The RMS values in the east, north and up directions for all the ionospheric models clearly show a strong correlation with latitude. At higher latitudes, the positioning accuracy is higher. This is consistent with the results shown in Figs. 2 and 4. This can be explained since the ionosphere in
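
The per-direction RMS statistic used for Fig. 6 can be illustrated with a minimal sketch. It assumes the daily errors are available as (east, north, up) differences between the single-frequency PPP solution and the reference coordinates; the function names and the sample values are hypothetical and are not taken from the paper or from PANDA.

```python
import math


def rms(errors):
    """Root-mean-square of a sequence of errors (same unit as the input)."""
    if not errors:
        raise ValueError("need at least one error sample")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def enu_rms(daily_errors):
    """Per-component RMS for a list of (east, north, up) daily errors."""
    east, north, up = zip(*daily_errors)
    return rms(east), rms(north), rms(up)


# Hypothetical daily errors in metres, not values from the paper.
sample = [(0.3, -0.4, 1.0), (-0.3, 0.4, -1.0)]
east_rms, north_rms, up_rms = enu_rms(sample)
```

The same `rms` helper would apply to the TEC comparison, with the differences between a model and the GIM reference at each epoch as input.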

Performance Evaluation for Decentralized Operations


This is shown on the Vice-President’s budget production report (Slide 16).
Cost Centers
Budget Performance Report Vice-President, Production For the Month Ended October 31, 2006 Over Budget Actual Budget
Responsibility Centers
Cost Centers: Managers are held accountable for controlling costs.
Profit Centers: Managers are held accountable for costs and for making decisions that impact revenues favorably.
Investment Centers: Managers are held accountable for costs and revenues and are also held accountable for the efficient use of assets.
Objectives
After studying this chapter, you should be able to:
1. List and explain the advantages and disadvantages of decentralized operations.
2. Prepare a responsibility accounting report for a cost center.
3. Prepare responsibility accounting reports for a profit center.
4. Compute and interpret the rate of return on investment, the residual income, and the balanced scorecard for an investment center.
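
The two investment-center measures named in objective 4 can be sketched as follows. The formulas are the standard textbook definitions (income from operations divided by invested assets, and income from operations minus a minimum acceptable return on those assets); the function names and the figures are hypothetical and do not come from the slides.

```python
def rate_of_return_on_investment(income_from_operations, invested_assets):
    """ROI = income from operations / invested assets."""
    return income_from_operations / invested_assets


def residual_income(income_from_operations, invested_assets, minimum_rate):
    """Income from operations in excess of the minimum acceptable return."""
    return income_from_operations - minimum_rate * invested_assets


# Hypothetical division: $70,000 income on $350,000 of assets, 12% minimum rate.
roi = rate_of_return_on_investment(70_000, 350_000)
ri = residual_income(70_000, 350_000, 0.12)
```

ROI is a ratio and favours small, highly profitable divisions, whereas residual income is a dollar amount, so the two measures can rank divisions differently.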

Performance Evaluation of a Multi-Zone Application in Different OpenMP Approaches


Haoqiang Jin
NAS Division, NASA Ames Research Center, Moffett Field, CA 94035-1000
hjin@

Barbara Chapman, Lei Huang
Department of Computer Science, University of Houston, Houston, TX 77004
{chapman,leihuang}@

NAS Technical Report NAS-07-009, October 2007

Abstract

We describe a performance study of a multi-zone application benchmark implemented in several OpenMP approaches that exploit multi-level parallelism and deal with unbalanced workload. The multi-zone application was derived from the well-known NAS Parallel Benchmarks (NPB) suite that involves flow solvers on collections of loosely coupled discretization meshes. Parallel versions of this application have been developed using the Subteam concept and Workqueuing model as extensions to the current OpenMP. We examine the performance impact of these extensions to OpenMP and compare with hybrid and nested OpenMP approaches on a large shared memory parallel system.

1 Introduction

Since its introduction in 1997, OpenMP has become the de facto standard for shared memory parallel programming. The notable advantages of the model are its global view of memory space that simplifies programming development and its incremental approach toward parallelization.
However, it is a big challenge to scale OpenMP codes to tens or hundreds of processors. One of the difficulties is a result of the limited parallelism that can be exploited on a single level of loop nest. Although the current standard [8] allows one to use nested OpenMP parallel regions, the performance is not very satisfactory. One of the known issues with nested OpenMP is its lack of support for thread team reuse at the nesting level, which affects the overall application performance and will be more pronounced on multi-core, multi-chip architectures. There is no guarantee that the same OS threads will be used at each invocation of parallel regions, although many operating systems and compilers provide support for thread affinity at a single level. To remedy this deficiency, the NANOS compiler team [1] has introduced the GROUPS clause to the outer parallel region to specify a thread group composition prior to the start of nested parallel regions, and Zhang [13] proposed extensions for thread mapping and grouping.

Chapman and collaborators [5] proposed the Subteam concept to improve work distribution by introducing subteams of threads within a single level of thread team, as an alternative to nested OpenMP. Conceptually, a subteam is similar to a process subgroup in the MPI context. The user has control over how threads are subdivided in order to suit application needs. The subteam proposal introduced an onthreads clause to a work-sharing directive so that the work, including the implicit barrier at the end of the construct, will be performed among the subset of threads within the team.

One of the prominent extensions to the current OpenMP is the Workqueuing (or Taskq) model first introduced by Shah et al. [9] and implemented in the Intel C++ compiler [10].
It was designed to work with recursive algorithms and cases where work units can only be determined dynamically. Because of its dynamic nature, Taskq can also be used effectively in an unbalanced workload environment. The Taskq model will be included in the coming OpenMP 3.0 release [4]. Although the final tasking directive in OpenMP 3.0 will not be the same as the original Intel Taskq proposal, it should still be quite intuitive to understand what potential the more dynamic approach can offer to applications.

In this study, we will compare different OpenMP approaches for the parallelization of a multi-zone application benchmark and report performance results from a large shared memory machine. In Section 2, we briefly discuss the application under consideration. The different implementations of our benchmark code are described in Section 3, and the machine description and performance results are presented in Section 4. We conclude our study in Section 5, where we also elaborate on future work.

2 Multi-Zone Application Benchmark

The multi-zone application benchmarks were developed [6, 11] as an extension to the original NAS Parallel Benchmarks (NPBs) [2]. These benchmarks involve solving the application benchmarks BT, SP, and LU on collections of loosely coupled discretization meshes (or zones).
The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among many production structured-mesh flow solver codes, provides relatively easy to exploit coarse-grain parallelism between zones. Since the individual application benchmark also allows fine-grain parallelism within each zone, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.

NPB-MZ contains three application benchmarks: BT-MZ, SP-MZ, and LU-MZ, with problem sizes defined from Class S to Class F. The difference between classes comes from how the number of zones and the size of each zone are defined in each benchmark. We focus our study on the BT-MZ benchmark because it was designed to have uneven-sized zones, which allows us to test various load balancing strategies. For example, the Class B problem has 64 zones with sizes ranging from 3K to 60K mesh points. Previously, the hybrid MPI+OpenMP [6] and nested OpenMP [1] programming models have been used to exploit parallelism in NPB-MZ beyond a single level. These approaches will be briefly described in the next section.

3 Benchmark Implementations

In this section, we describe five approaches of using OpenMP and its extensions to implement the BT-MZ benchmark. Three of the approaches exploit multi-level parallelism and the other two are concerned with balancing workload dynamically.

3.1 Hybrid MPI+OpenMP

The hybrid MPI+OpenMP implementation exploits two levels of parallelism in the multi-zone benchmark, in which OpenMP is applied for fine-grained intra-zone parallelization and MPI is used for coarse-grained inter-zone parallelization. Load balancing in BT-MZ is based on a bin-packing algorithm with an additional adjustment from OpenMP threads [6]. In this strategy, multiple zones are clustered into zone groups among which the computational workload is evenly distributed. Each zone group is then uniquely assigned to each
MPI process for parallel execution. The procedure involves sorting zones by size in descending order and bin-packing into zone groups. Exchanging boundary data within each time step requires MPI many-to-many communication. The hybrid version is fully described in Ref. [6] and is part of the standard NPB distribution. We will use the hybrid version as the baseline for comparison with other OpenMP implementations.

3.2 Nested OpenMP

The nested OpenMP implementation is based on the two-level approach of the hybrid version except that OpenMP is used for both levels of parallelism. The inner level parallelization for loop parallelism within each zone is essentially the same as that of the hybrid version. The only addition is the “num_threads” clause to each inner parallel region to specify the number of threads. The first (outer) level OpenMP exploits coarse-grained parallelism between zones. A code sketch of the iteration loop using nested OpenMP is illustrated in Fig. 1. The outer level parallelization is adopted from the MPI approach: workloads from zones are explicitly distributed among the outer-level threads. The difference is that OpenMP now works on the shared data space as opposed to private data in the MPI version. The load balancing is done statically through the same bin-packing algorithm, where zones are first sorted by size, then assigned to the least loaded thread one by one. The routine “map_zones” returns the number and list of zones (num_proc_zones and proc_zone_id) assigned to a given thread (myid) as well as the number of threads (nthreads) for the inner parallel regions. This information is then passed to the “num_threads” clause in the solver routines. The MPI communication calls inside “exch_qbc” for boundary data exchange are replaced with direct memory copy and proper barrier synchronization.

In order to reduce the fork-and-join overhead associated with the inner-level parallel regions, a variant was also created: a single parallel construct is applied to the time step loop block and all
inner-parallel regions are replaced with orphaned “omp do” constructs. This version, namely version 2, will be discussed together with the first version in the results section.

[Code listing fragments from the preceding figure(s); the listing is incomplete in the source]

    !$omp parallel private(myid,...)
      myid=omp_get_thread_num()
      call mapproc zonethreads(nthreads)
      do k=2,nz-1
        solve for u in the current zone
      end do
    !$omp parallel private(myid,...)
      myid=omp_get_thread_num()
      call mapthread

Left panel (OpenMP runtime scheduling):

    !$omp parallel private(zone,...)
    do step=1,niter
      call exch_qbc(u,...)
    !$omp do schedule(runtime)
      do iz=1,num_zones
        zone=zone_sort_id(iz)
        call adi(u(zone),...)
      end do
    end do
    !$omp end parallel

Right panel (Intel taskq directives):

    #pragma omp parallel private(zone)
    for(step=1;step<=niter;step++){
      exch_qbc(u,...);
    #pragma intel omp taskq
      for(iz=0;iz<num_zones;iz++){
        zone=zone_sort_id[iz];
    #pragma intel omp task \
            captureprivate(zone)
        adi(&u[zone],...);
      }
    }

Figure 2: Code segment using OpenMP runtime scheduling (left) and Intel taskq directives (right).

approach described in previous sections for handling load balance. The trade-off is potentially higher overhead associated with dynamic scheduling and less thread-data affinity than would be achieved in a static approach.

To examine the potential performance trade-off, we developed an OpenMP version that solely focuses on the coarse-grained parallelization of different zones of the multi-zone benchmark. As illustrated in the left panel of Fig. 2, this version is much simpler and more compact. The “omp do” directive is applied to the loop nest over multiple zones. There is no explicit coding for load balancing, which is achieved through the OpenMP dynamic runtime schedule. The use of the “schedule(runtime)” clause allows us to compare different OpenMP loop schedules. A “zone_sort_id” array is used to store zone ids in different sorting schemes.

3.5 Workqueuing Model

We developed a Taskq version of the BT-MZ benchmark based on the Intel workqueuing model.
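
As an aside on the load-balancing strategies of Sections 3.1, 3.2 and 3.4, the following minimal sketch contrasts static bin-packing (largest zone first, into the least loaded group) with a simulated “dynamic,1” schedule, in which the next zone in the chosen order goes to whichever thread becomes idle first. It is written in Python rather than the benchmark's Fortran or C, is not the NPB-MZ code, and ignores scheduling overhead and data affinity; the function names and zone sizes are hypothetical.

```python
import heapq


def bin_pack(zone_sizes, ngroups):
    """Greedy bin-packing: largest zone first, into the least-loaded group."""
    order = sorted(range(len(zone_sizes)), key=lambda z: zone_sizes[z], reverse=True)
    heap = [(0, g) for g in range(ngroups)]  # (current load, group id)
    groups = [[] for _ in range(ngroups)]
    for z in order:
        load, g = heapq.heappop(heap)
        groups[g].append(z)
        heapq.heappush(heap, (load + zone_sizes[z], g))
    return groups


def dynamic_makespan(zone_sizes, order, nthreads):
    """Finish time when each zone, in `order`, goes to the first idle thread."""
    finish = [0] * nthreads  # per-thread finish times, kept as a min-heap
    heapq.heapify(finish)
    for z in order:
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + zone_sizes[z])
    return max(finish)


# Hypothetical zone sizes (in K mesh points), not the Class B values.
sizes = [3, 60, 5, 40, 8, 30, 10, 20]
natural = list(range(len(sizes)))
descending = sorted(natural, key=lambda z: sizes[z], reverse=True)
ascending = descending[::-1]

group_loads = sorted(sum(sizes[z] for z in g) for g in bin_pack(sizes, 2))
makespans = {
    "natural": dynamic_makespan(sizes, natural, 2),
    "descending": dynamic_makespan(sizes, descending, 2),
    "ascending": dynamic_makespan(sizes, ascending, 2),
}
```

In this idealized model, dynamic scheduling over zones sorted in descending order makes the same assignments as the greedy bin-packing, which suggests why the ordering stored in “zone_sort_id” matters for the dynamic approaches.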
Because Intel implemented Taskq only in its C++ compiler for C/C++ applications and there is no other vendor compiler available at this point for testing the concept, we had to first convert the Fortran implementation of BT-MZ to the C counterpart. To minimize the performance impact from such a conversion, we did the following:

• Fortran multi-dimensional arrays are converted to linearized C arrays, such as u(m,i,j,k) -> u[m+5*(i+nxmax*(j+ny*k))],
• The restrict qualifier is added to pointer variables in subroutine arguments to enable the compiler to perform optimization without pre-assumed pointer aliasing, for example void add(double *restrict u, double *restrict rhs, ..).

Once we had the C version, the Taskq implementation of the BT-MZ benchmark (Fig. 2, right panel) is straightforward. Each work unit for a task is defined by the solver on an individual zone. The “intel omp taskq” directive is added to the loop nest over zones. Inside the zone loop nest, the “intel omp task” directive is used to generate tasks for each loop iteration, and the zone value is preserved for each task by the “captureprivate” clause. The implicit synchronization at the end of the taskq construct guarantees the completion of all tasks before going to the next iteration. Again, to test the performance impact of workload ordering, we use “zone_sort_id” to store the sorted zone ids.

4 Performance Results

In this section, we present performance results obtained on a large parallel system. We will first give a brief description of the system and programming support.

4.1 Testing Environment

Our performance studies were conducted on an SGI Altix 3700 BX2 system that is one of the 20 nodes in the Columbia supercomputer installed at NASA Ames Research Center [3]. The Altix BX2 node has 512 Intel Itanium 2 processors, each clocked at 1.6 GHz and containing 9 MB of on-chip L3 data cache. Approximately 1 TB of global shared-access memory is provided through the SGI scalable non-uniform memory access flexible (NUMAflex) architecture. The underlying NUMAlink4 interconnect
provides 6.4 GB/s bandwidth. A single Linux operating system runs on the Altix system, providing an ideal environment for shared-memory programming such as OpenMP. The system is equipped with the SGI message-passing toolkit (MPT 1.12) that supports MPI programming.

We used the Intel Fortran and C/C++ 9.1 compilers for IA64 that support OpenMP 2.5 as well as the Taskq model. All of our experiments were run under the PBSpro batch system in a shared environment. In order to reduce variation in timing and improve performance, the “dplace” placement tool was used to bind processes/threads to physical processors.

For testing the OpenMP Subteam concept as described in Section 3.3, we employed the OpenUH research compiler [7], which was extended to support the “onthreads” clause. This is essentially a source-to-source translation process, and the generated code is then compiled with a native compiler. A small runtime library was developed to support basic subteam functions, such as loop iteration scheduling and synchronization for subteam threads.

4.2 Multi-level Parallelism

In order to compare different multi-level parallel versions of the BT-MZ benchmark, we first examine the performance impact of varying the number of zone groups on a given number of CPUs. The left panel of Fig. 3 plots benchmark timing in seconds as a function of the number of groups at 32 CPUs for the Class B problem size. The notation “Ng × Nt” denotes the number of zone groups (Ng) formed for the first level parallelism and the number of threads (Nt) for the second level parallelism within each group. In the hybrid MPI+OpenMP version, Ng is the same as the number of MPI processes and Nt is the number of OpenMP threads per MPI process. Ng in the nested OpenMP versions is the number of outer-level threads, and in the subteam version it is the number of subteams.

Overall, the subteam version is very close in performance to the MPI+OpenMP hybrid version. This indicates that the data layout of the subteam version is very similar to that of the hybrid
version, even though Subteam uses shared data arrays and MPI uses private data arrays. At single level parallelization, either N×1 or 1×N, the performance of the three approaches is very close. Between the two ends, the nested-OpenMP v1 performs consistently 30-80% worse than the other two versions. By reducing the number of inner-level parallel regions in the second version (v2), the performance of nested OpenMP improved substantially, although it still lags behind. The large overhead associated with the inner parallel regions is likely due to the inability of the OpenMP runtime library to reuse threads efficiently at the second level. Even though the dplace tool binds the first-level threads properly, it has no control over the second-level threads. This result is consistent with the previous observation by Ayguade et al. [1].

[Figure 3: Timing comparison of nested OpenMP, Subteam, and MPI+OpenMP versions of BT-MZ for the Class B problem, on the left for a given number of CPUs (Time (sec) versus Ng × Nt at 32 CPUs) and on the right as a function of CPU counts. The results are from the SGI Altix.]

The best performance is achieved by maximizing the number of zone groups as long as the workload can be balanced. For Class B, the optimal number of zone groups is 16. Beyond 16 CPUs, multi-level parallelism is needed for additional performance gain. Both subteam and hybrid versions follow this analysis, but the nested OpenMP tends to prefer a larger number of threads at the outer level, especially when the total number of CPUs increases.

The scaling results of BT-MZ from the best combinations of zone groups and threads are summarized in the right panel of Fig. 3. Both the subteam and hybrid versions scale well up to the measured CPU counts. Up to 16 CPUs, when only the outer-level parallelism is employed, the nested OpenMP versions perform similarly to the other two versions. Beyond 16 CPUs, nested OpenMP suffers from large overhead associated
with the second-level parallelism and becomes much worse at larger CPU counts.

To understand better why the nested OpenMP codes suffer from performance degradation in the multi-level mode, we collected additional performance information from hardware counters available on the Altix; the results from the 8×4 runs are compared with the hybrid MPI+OpenMP runs in Fig. 4. The nested OpenMP v1 has the highest stalled cycles and L3 cache misses, which is an indication of thread-data mismatch. A stalled cycle is usually a result of waiting on resources, in particular memory. Although the nested OpenMP v2 reduced stalled cycles, it still has large L3 cache misses. The three pure OpenMP codes have somewhat higher TLB misses; but on the Altix, a TLB miss has less impact on the overall performance. Other counters, such as L1 and L2 cache misses, have similar values for all four codes and are not included in the graph.

[Figure 4: Comparison of hardware performance counter results obtained on the Altix for the 8×4 runs of the four BT-MZ versions. Values are relative to the hybrid version; counters shown: stalled cycles, L3 hit rate, L3 miss ratio, TLB miss, mispredicted branches, cycles with no instrs, wall clock time.]

[Figure 5: Performance comparison (Gflop/s) of different schedule kinds for BT-MZ Class B, 16 threads. Numbers in the graph indicate chunk sizes. The last bar is for a static schedule without chunk size.]

4.3 Unbalanced Workload

To test the effectiveness of OpenMP runtime schedule kinds and more dynamic approaches on unbalanced workload, we focus on the single-level OpenMP versions of BT-MZ as described in Sections 3.4 and 3.5, which exploit parallelism among unbalanced zones. No nested parallelism is considered here.

4.3.1 Impact of Schedule Kind

The results from 16-thread runs using different runtime schedule kinds and chunk sizes are shown in Fig. 5. The “dynamic,1” schedule produces the best result for the given problem. As the chunk
size increases, the performance decreases. The “guided” schedule is only slightly worse. The “static” schedule without chunk size (the last bar in the graph) shows its limitation in dealing with unbalanced workload and is as much as 50% worse than the “dynamic,1” schedule. The “static,1” (or cyclic) schedule improves the performance but not sufficiently.

4.3.2 Workload Ordering on Performance

As noted in the benchmark description, the zone workload in BT-MZ was designed to be uneven. Class B contains 64 zones whose sizes, shown in Fig. 6 on the left, range from 3K to 60K mesh points. The right graph in Fig. 6 shows the performance impact of three different orderings of zones by size on the “dynamic,1” schedule: natural (original) order, descending order, and ascending order. For comparison, the graph also includes results from a single-level OpenMP version that uses the static bin-packing algorithm for load balancing. This version is essentially the same as the nested OpenMP v1 described in Section 3.2 without the nested parallelism. We observe that by sorting zones into descending order, the performance can improve by as much as 45% (18 to 26 Gflop/s on 16 threads). This result supports the observation reported by Van Zee et al. [12] in their FLAME code using the workqueuing model.

[Figure 6: Performance impact of different workload orderings on the “dynamic” schedule (left: Data Size versus Zone Number; right: Gflop/s versus Number of Threads). Results from the static bin-packing approach are included for comparison.]

The impact of different workload orderings on the “guided” schedule (not shown in the graph) is very similar to that on the “dynamic” schedule. It is worth noting that the dynamic approach for unbalanced workload is only slightly worse (∼15% beyond 16 threads) than the static bin-packing approach. However, the programming effort in the former case is considerably less.

4.3.3 Workqueuing Model

Before going into the workqueuing (or taskq) model, we first examine the performance
change as a result of converting the code from Fortran to C. Due to pointer aliasing, a C code can suffer from constraints on compiler optimization for pointers. In order to reduce or even eliminate pointer aliasing, one can either use the “restrict” modifier or rely on compiler flags. The Intel compiler provides the option “-fno-alias” for this purpose. Table 1 summarizes the results of the OpenMP C version of BT-MZ using different compiler aliasing options and compares them with the Fortran version. The no-alias option produces more than twice as much improvement in performance as the default aliasing option. Combined with the “restrict” modifier, the C code performs very close to the Fortran counterpart. This combined option was used in collecting the C results below.

Table 1: Comparison of results from different aliasing options for the Class B BT-MZ on 16 threads.
Case Gflop/s 49.86 “restrict” keyword 16.175 C 23.49 -fno-alias Combination 25.718 Fortran 26.124

Figure 7 compares the Intel Taskq version of BT-MZ with the single-level OpenMP versions (both C and Fortran) using dynamic scheduling for load balancing. It is encouraging to note that the Taskq version has similar performance to the single-level OpenMP C version using the “dynamic,1” schedule up to 32 threads. Only at 64 threads does the dynamic-schedule version outperform the Taskq version, by about 20%. As illustrated by the two panels in the figure, sorting workload into descending order improves overall performance for Taskq as well. Compared to the Fortran version, the performance of Taskq gets worse at larger thread counts, primarily due to the difference between Fortran and C.

[Figure 7: Performance comparison (Gflop/s versus Number of Threads) of the Taskq version with the single-level OpenMP versions (in both C and Fortran) using the “dynamic,1” schedule.]

5 Conclusion

We have presented a performance evaluation of four different OpenMP approaches in dealing with multi-level parallelism and unbalanced workload, and
compared with a hybrid MPI+OpenMP method. The nested OpenMP approach suffered from performance degradation as a result of large overhead and lack of thread reuse when invoking the inner level parallelism on the SGI Altix. By minimizing the number of inner level parallel regions, we improved nested OpenMP performance dramatically. Another potential way to reduce the overhead associated with nested parallel regions is proper support of thread affinity with thread reuse, as proposed by others [1, 13]. The approach based on the Subteam extension to OpenMP overcame some of the limitations of nested OpenMP and showed promise in achieving performance close to that of the hybrid MPI+OpenMP method. Our study also points out the importance of extending the Subteam proposal to include an API for subteam creation and management. It is very encouraging that the more dynamic approach provided by the workqueuing model showed great potential in dealing with unbalanced workload. This model can benefit from using a weight factor in scheduling tasks.

For future work, we would like to conduct our experiments on more platforms, in particular to study the support of nested parallelism from different compilers and runtime systems. A natural extension is to investigate the performance characteristics of nested parallelism under the workqueuing model. It is also important to extend our experience from a single benchmark application to more realistic applications.

Acknowledgments

The authors would like to acknowledge fruitful discussions with Johnny Chang and Robert Hood, and support from the staff at NAS division for many experiments conducted on the Columbia supercomputer.

References

[1] E. Ayguade, M. Gonzalez, X. Martorell, and G. Jost, “Employing Nested OpenMP for the Parallelization of Multi-Zone Computational Fluid Dynamics Applications,” J. of Parallel and Distributed Computing, special issue, ed. B. Monien, Vol. 66, No. 5, 2006, p. 686.
[2] D. Bailey, J. Barton, T. Lasinski, and H. Simon, “The NAS Parallel Benchmarks,” NAS Technical Report RNR-91-002, NASA Ames Research Center, 1991.
[3] R. Biswas, M. J. Djomehri, R. Hood, H. Jin, C. Kiris, and S. Saini, “An Application-Based Performance Characterization of the Columbia Supercluster,” in Proc. of SC05, Seattle, WA, November 12-18, 2005.
[4] M. Bull, “OpenMP 3.0 Overview,” presented at the OpenMP BoF at the SC06 conference, 2006. /futures/.
[5] B. Chapman, L. Huang, H. Jin, G. Jost, and B. de Supinski, “Toward Enhancing OpenMP’s Work-Sharing Directives,” in the Euro-Par06 Conference, Dresden, Germany, 2006; LNCS 4128, 2006, pp. 645-654.
[6] H. Jin and R. F. Van der Wijngaart, “Performance Characteristics of the Multi-Zone NAS Parallel Benchmarks,” J. of Parallel and Distributed Computing, special issue, ed. B. Monien, Vol. 66, No. 5, 2006, p. 674.
[7] OpenUH Research Compiler, /openuh.
[8] The OpenMP Standard, /.
[9] S. Shah, G. Haab, P. Petersen, and J. Throop, “Flexible Control Structure for Parallelism in OpenMP,” in the European Workshop on OpenMP (EWOMP99), 1999.
[10] E. Su, X. Tian, M. Girkar, G. Haab, S. Shah, and P. Petersen, “Compiler Support of the Workqueuing Execution Model for Intel SMP Architectures,” in the European Workshop on OpenMP (EWOMP02), 2002.
[11] R. F. Van der Wijngaart and H. Jin, “The NAS Parallel Benchmarks, Multi-Zone Versions,” NAS Technical Report NAS-03-010, NASA Ames Research Center, 2003. /Software/NPB/.
[12] F. Van Zee, P. Bientinesi, T. M. Low, R. Van de Geijn, “Scalable Parallelization of FLAME Code via the Workqueuing Model,” submitted to ACM Trans. on Math. Software, 2006.
[13] G. Zhang, “Extending the OpenMP Standard for Thread Mapping and Grouping,” in the International Workshop on OpenMP (IWOMP06), Reims, France, 2006.


Performance evaluation of a multistring photovoltaic module with distributed DC–DC converters

IET Renewable Power Generation, Research Article
ISSN 1752-1416. Received on 22nd July 2014, revised on 6th March 2015, accepted on 4th May 2015. doi: 10.1049/iet-rpg.2014.0246

Alfio Dario Grasso 1, Salvatore Pennisi 1, Massimiliano Ragusa 2, Giuseppe Marco Tina 1 ✉, Cristina Ventura 1
1 Dipartimento di Ingegneria Elettrica, Elettronica e Informatica, University of Catania, Viale Andrea Doria n. 6, 95125, Catania, Italy
2 STMicroelectronics, Catania, Italy
✉ E-mail: giuseppe.tina@dieei.unict.it

Abstract: Photovoltaic (PV) systems, especially those installed in urban and suburban areas, often operate under a nonuniform distribution of solar irradiance and PV cell temperature over the PV array. In such conditions, a PV system architecture based on distributed DC–DC converters that perform the maximum power point tracking algorithm (DMPPT) can effectively counteract the reduction in power efficiency caused by electrical, thermal and irradiance mismatch phenomena. The aim of this study is to experimentally assess the effect of using DC–DC boost converters with MPPT capability, applied directly at the substring level of a single PV module. The novelty of the approach proposed here with respect to the state of the art is that the outputs of the converters are connected in parallel. Comparing the power efficiency of a conventional PV module with that of the considered prototype, in which each of the three substrings is connected to a dedicated DC–DC converter, a remarkable improvement in extracted power, ranging from 11 to 25% under nonuniform solar radiation, was found.

1 Introduction

With the continuous fall in the price of photovoltaic modules, rising concern about greenhouse gas emissions and increasing costs for some generation alternatives, solar energy is rapidly becoming a competitive solution for the production of clean energy. As an example, large drops in solar module prices have helped spur record levels of deployment, which increased 54% over the previous year to 28.7 GW in 2011. This is ten times the new build level of 2007 [1]. With global PV module overcapacity less acute than one or two years ago, prices stabilised in 2013, and the return to profitability should allow companies to invest again [2]. Technologically, PV systems are relatively easy to install, very safe, almost maintenance free and, more importantly, environment friendly. Moreover, a very important feature of PV systems is that, because of their modular structure, incremental power additions are easily accommodated. Because of all these advantages and their medium- and long-term economic prospects, large PV power systems are being installed worldwide. Meanwhile, unused space, such as the rooftops of homes, factories and large buildings, can be effectively utilised to harvest solar energy, as demonstrated by the success of the building integrated PV (BIPV) initiatives in various countries [3–5].

The PV power conversion efficiency depends on several factors, such as weather conditions, surrounding obstacles, geographic location, ageing and design strategies. Some of these factors can cause non-uniformity of irradiance and temperature on PV modules, which can play an important role in the overall efficiency. Thermal mismatch has a minimal impact when cells and modules are connected in series, whereas irradiance mismatch has its greatest impact in this case [6]. One of the most significant causes of uneven distribution of solar radiation is the partial shading of PV arrays [7]. Although a planar PV array in an isolated landscape would be exposed to uniform illumination, the complex designs and landscapes of PV systems in urban and suburban environments often create situations where PV systems on buildings receive non-uniform radiation, resulting in decreased power conversion efficiency [8]. The cause may be the shadow of clouds or the shadow of one solar array on another, trees, booms, neighbouring houses etc.

Moreover, in the framework of building integration, concentration PV systems are often used to increase the power density [9]. For these systems too, shadowing and non-uniform solar radiation distribution are a crucial problem if a double-axis tracking system is not adopted. Indeed, maximum power point tracking is a difficult operation when a PV array is partially shaded or works under non-uniform or rapidly changing irradiance conditions, because two or more local maximum power points may appear [10]. Since in many cases these causes of efficiency loss cannot be removed, it is important to minimise the effect of such mismatch phenomena on the energy production [11]. To reduce the effects of these problems, techniques based on distributed DC–DC converters that perform the maximum power point tracking algorithm (DMPPT) have recently been studied [12]. These approaches are based on the adoption of a DC–DC converter dedicated to the maximum power point tracking (MPPT) of each PV module or group of modules, possibly moving part of the electronics from the ...

There are a number of studies in the literature concerning distributed architectures [6, 14–25]. Moreover, a PV system can be used to increase the autonomy of mobile systems, for instance electric vehicles or mobile robots [26], and the DMPPT approach has also been applied in such applications [27, 28]. Using distributed PV architectures leads to higher costs compared with conventional approaches. Therefore, a trade-off between costs and advantages in terms of output power of the PV system should be investigated for each application. Under partially shaded or non-uniform solar radiation conditions, the use of distributed power electronics can recover between 10 and 30% of annual performance loss, or more, depending on the system configuration and the type of devices used [6, 29].

Of course, when the impact of the aforementioned negative phenomena is limited in duration and extent, the use of distributed power electronics does not appear economical. In the case of BIPV systems characterised by shading conditions, or of low-concentration PV systems, however, a DMPPT architecture is expected to bring great advantages.
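The effect of irradiance mismatch on series-connected substrings, and what per-substring MPPT can recover, can be illustrated with a minimal Python sketch. It is not the authors' model or prototype: it assumes an ideal single-diode characteristic for each substring, an ideal bypass diode with a fixed forward drop, and lossless converters, and every parameter value (cell count, saturation current, photocurrent and so on) is a hypothetical number chosen only for illustration. The sketch counts the local maxima of the series power curve and compares its global maximum with the sum of the individual substring maxima.

```python
import math

# All values below are illustrative assumptions, not data from the study.
VT = 0.02585        # thermal voltage near 300 K, in volts
N_CELLS = 20        # cells per substring (assumed)
IDEALITY = 1.3      # diode ideality factor (assumed)
I0 = 1e-9           # diode saturation current, in amperes (assumed)
V_BYPASS = 0.5      # forward drop of a conducting bypass diode, in volts (assumed)
I_PH_FULL = 8.0     # photocurrent at full irradiance, in amperes (assumed)


def substring_voltage(current, photocurrent):
    """Voltage of one substring carrying `current` (ideal single-diode model).

    If the imposed current is at or above the photocurrent, the bypass
    diode conducts and the substring voltage is clamped to -V_BYPASS.
    """
    if current >= photocurrent:
        return -V_BYPASS
    return N_CELLS * IDEALITY * VT * math.log((photocurrent - current) / I0 + 1.0)


def series_power_curve(photocurrents, steps=2000):
    """Sampled (current, power) pairs for substrings connected in series."""
    i_max = max(photocurrents)
    curve = []
    for k in range(steps + 1):
        i = i_max * k / steps
        v = sum(substring_voltage(i, iph) for iph in photocurrents)
        curve.append((i, i * v))
    return curve


def local_maxima(curve):
    """Interior samples whose power exceeds that of both neighbours."""
    return [curve[k] for k in range(1, len(curve) - 1)
            if curve[k - 1][1] < curve[k][1] > curve[k + 1][1]]


def substring_mpp(photocurrent, steps=2000):
    """Maximum power of one substring operated on its own."""
    return max(p for _, p in series_power_curve([photocurrent], steps))


def compare(irradiance_fractions):
    """Return (number of local maxima, best series power, distributed power)."""
    photocurrents = [I_PH_FULL * g for g in irradiance_fractions]
    curve = series_power_curve(photocurrents)
    p_series = max(p for _, p in curve)
    # Ideal per-substring MPPT: each substring delivers its own maximum.
    p_distributed = sum(substring_mpp(iph) for iph in photocurrents)
    return len(local_maxima(curve)), p_series, p_distributed


if __name__ == "__main__":
    peaks, p_series, p_dist = compare([1.0, 1.0, 0.4])  # one shaded substring
    print(f"local maxima: {peaks}")
    print(f"series: {p_series:.1f} W, distributed: {p_dist:.1f} W")
```

With one substring shaded, the series curve in this toy model shows more than one local maximum, which is the situation that makes conventional tracking difficult, while the distributed figure is an upper bound because real converters have losses.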