Performance metrics and ontology for describing performance data of grid workflows


Performance Management (English edition)
Language Barriers: When dealing with foreign employees, language can be a significant barrier to effective communication. If managers and employees do not share a common language, it can be difficult to clearly communicate expectations, goals, and feedback. This can lead to misunderstandings and a lack of clarity in performance evaluations.
Performance management involves setting clear performance standards, assessing employee performance against these standards, providing feedback, and creating development plans to improve performance.
Link rewards to performance: ensure that rewards and incentives are closely linked to individual performance and organizational goals.

Feedback and recognition: provide feedback on performance and recognize outstanding achievements, for example through recognition programs.

OpenIoT: Open Source Internet of Things in the Cloud (Paper)

OpenIoT: Open Source Internet-of-Things in the Cloud

John Soldatos (1), Nikos Kefalakis (1), Manfred Hauswirth (2), Martin Serrano (2), Jean-Paul Calbimonte (3), Mehdi Riahi (3), Karl Aberer (3), Prem Prakash Jayaraman (4), Arkady Zaslavsky (4), Ivana Podnar Žarko (5), Lea Skorin-Kapov (5), and Reinhard Herzog (6)

(1) Athens Information Technology, 0.8 Km Markopoulo Ave., P.O. Box 68, 19002 Peania, Greece. {jsol,nkef}@ait.gr
(2) INSIGHT @ National University of Ireland, Galway, IDA Business Park, Lower Dangan, Galway, Ireland. {manfred.hauswirth,serrano}@
(3) EPFL IC LSIR, École Polytechnique Fédérale de Lausanne, Station 14, 1015 Lausanne, Switzerland. {jean-paul.calbimonte,mehdi.riahi,karl.aberer}@epfl.ch
(4) CSIRO Digital Productivity Flagship, Building 108 North Road, Acton, Canberra 2617, Australia. {prem.jayaraman,arkady.zaslavsky}@csiro.au
(5) Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia. {ivana.podnar,lea.skorin-kapov}@fer.hr
(6) Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany. Reinhard.Herzog@iosb.fraunhofer.de

Abstract. Despite the proliferation of Internet-of-Things (IoT) platforms for building and deploying IoT applications in the cloud, there is still no easy way to integrate heterogeneous, geographically and administratively dispersed sensors and IoT services in a semantically interoperable fashion. In this paper we provide an overview of the OpenIoT project, which has developed and provided a first-of-kind open source IoT platform enabling the semantic interoperability of IoT services in the cloud. At the heart of OpenIoT lies the W3C Semantic Sensor Networks (SSN) ontology, which provides a common standards-based model for representing physical and virtual sensors. OpenIoT also includes sensor middleware that eases the collection of data from virtually any sensor, while at the same time ensuring their proper semantic annotation. Furthermore, it offers a wide range of visual tools that enable the development and deployment of IoT applications with almost zero programming. Another key feature of OpenIoT is its ability to handle mobile sensors, thereby enabling the emerging wave of mobile crowd sensing applications. OpenIoT is currently supported by an active community of IoT researchers, while being extensively used for the development of IoT applications in areas where semantic interoperability is a major concern.

Keywords: Internet-of-Things, Open source, Semantic interoperability

© Springer International Publishing Switzerland 2015. I. Podnar Žarko et al. (Eds.): FP7 OpenIoT Project Workshop 2014, LNCS 9001, pp. 13–25, 2015. DOI: 10.1007/978-3-319-16546-2_3

1 Introduction

We are nowadays witnessing the convergence of the Internet-of-Things (IoT) and cloud computing paradigms, which is largely motivated by the need of IoT applications to leverage the scalability, performance and pay-as-you-go capabilities of the cloud.
During recent years several efforts towards IoT/cloud convergence have been undertaken both in the research community (e.g., [1]) and in the enterprise. A common characteristic of these efforts is their ability to stream data to the cloud in a scalable and high-performance way, while at the same time providing the means for managing applications and data streams. Nevertheless, these architectures do not essentially provide semantic interoperability [2] across IoT applications which have been developed and deployed independently from each other. Therefore, there is still no easy way to combine data streams and services from diverse IoT applications that feature incompatible semantics (e.g., units of measurement, raw sensor values and points of interest).

This paper presents an overview of the FP7-287305 OpenIoT project (co-funded by the European Commission), which has provided a middleware platform enabling the semantic unification of diverse IoT applications in the cloud. OpenIoT uses the W3C Semantic Sensor Networks (SSN) ontology [3] as a common standards-based model for semantic unification of diverse IoT systems. OpenIoT offers a versatile infrastructure for collecting and semantically annotating data from virtually any sensor available. OpenIoT also exploits the Linked Data concept [4] towards linking related sensor data sets. Furthermore, OpenIoT provides functionalities for dynamically filtering and selecting data streams, as well as for dealing with mobile sensors. It comes with a wide range of visual tools, which enable the development of cloud-based IoT applications through minimal programming effort.

OpenIoT is currently available as an open source project (https:///OpenIotOrg/openiot/). As of June 2014, it consists of nearly 400,000 lines of code, while it also integrates libraries of the popular Global Sensor Networks (GSN) open source project [5]. Recently, OpenIoT received an award from Black Duck as one of the top ten open source projects that emerged in 2013 [6]. The rest of the paper is devoted to the presentation of the main technical developments of the project. The structure of the paper is as follows: Sect. 2 provides an overview of the OpenIoT platform, including an illustration of its architecture. Section 3 is devoted to the presentation of the main functionalities of the platform and how they can be used towards developing IoT applications. Section 4 provides an overview of real-life IoT applications, which have been developed based on OpenIoT. Section 5 concludes the paper.
2 OpenIoT Platform Overview

2.1 Architecture Overview

The OpenIoT architecture comprises seven main elements [7], as depicted in Fig. 1.

- The Sensor Middleware (Extended Global Sensor Networks, X-GSN) collects, filters and combines data streams from virtual sensors or physical devices. The Sensor Middleware is deployed on the basis of one or more distributed instances (nodes), which may belong to different administrative entities. The OpenIoT prototype implementation uses X-GSN (Extended GSN), an extended version of the GSN middleware [5]. Furthermore, a mobile broker (publish/subscribe middleware) is used for the integration of mobile sensors.
- The Cloud Data Storage (Linked Stream Middleware Light, LSM-Light) acts as a cloud database which enables storage of data streams stemming from the sensor middleware. The cloud infrastructure also stores metadata required for the operation of OpenIoT. The OpenIoT prototype implementation uses the Linked Stream Middleware (LSM) [8], which has been re-designed with push-pull data functionality and cloud interfaces.
- The Scheduler processes requests for on-demand deployment of services and ensures their proper access to the resources (e.g., data streams) that they require. It discovers sensors and associated data streams that can contribute to a given service. It also manages a service and activates the resources involved in its provision.

Fig. 1. Overview of OpenIoT Architecture and Main Components

- The Service Delivery & Utility Manager (SD&UM) combines data streams as indicated by service workflows within the OpenIoT system in order to deliver the requested service (typically expressed as a SPARQL query). The SD&UM also acts as a service metering facility which keeps track of utility metrics for each service.
- The Request Definition component enables on-the-fly specification of service requests to the OpenIoT platform. It comprises a set of services for specifying and formulating such requests, while also submitting them to the Scheduler. This component is supported by a GUI (Graphical User Interface).
- The Request Presentation component is in charge of the visualization of the outputs of a service. This component selects mash-ups from an appropriate library in order to facilitate service presentation.
- The Configuration and Monitoring component enables visual management and configuration of functionalities over sensors and services that are deployed within the OpenIoT platform.

2.2 OpenIoT Ontology for Semantic Interoperability and Linked Data Integration

The OpenIoT ontology represents a universally adopted terminology for the convergence of sensed data with the semantic web. It enhances existing vocabularies for sensors and Internet Connected Objects (ICOs) with additional concepts relevant to IoT/cloud integration, such as terms to annotate units of measurement, raw sensor values and points of interest at some specific levels of granularity. In particular, the OpenIoT ontology extends the W3C SSN ontology, which supports the description of the physical and processing structure of sensors. Sensors are not constrained to physical sensing devices: rather, a sensor is anything that can estimate or calculate the value of a phenomenon. Thus, either a device or a computational process or a combination of them could play the role of a sensor. The representation of a sensor in the ontology links together what it measures (the domain phenomena), the physical sensor (the device) and its functions and processing (the models). The OpenIoT ontology is
available as a single OWL file, and provides the means for a semi-automatically generated documentation. Additional annotations have been added to split the ontology into thematic modules. The implementation of the ontology and its integration in the OpenIoT architecture are realized through the LSM middleware. LSM transforms the data from virtual sensors into Linked Data stored in RDF (Resource Description Framework), which is de facto queried using SPARQL. In the context of IoT applications in general and LSM in particular, such queries typically refer to sensor metadata and historical sensor readings. The SPARQL endpoint of LSM provides the interface to issue these types of queries. The RDF triple store deployed by LSM is based on OpenLink Virtuoso and provides a Linked Data query processor that supports the SPARQL 1.1 standard. While SPARQL queries are executed once over the entire collection and discarded after the results are produced, queries over Linked Stream Data are continuous. Continuous queries are first registered in the system, and continuously executed as new data arrives, with new results being output as soon as they are produced. LSM provides a wide range of interfaces (wrappers) for accessing sensor readings, such as physical connections, middleware APIs, and database connections. Each wrapper is pluggable at runtime, so that wrappers can be developed to connect new types of sensors into a live system while the system is running. The wrappers output the data in a unified format, following the data layout described in the OpenIoT ontology.

2.3 Mobile Broker and Publish/Subscribe Middleware

OpenIoT offers support for discovering and collecting data from mobile sensors (e.g., wearable sensors, sensors built into mobile devices). This is achieved through a publish/subscribe middleware titled CloUd-based Publish/Subscribe middleware for the IoT (CUPUS), which integrates: (1) a cloud-based processing engine for sensor data streams based on the publish/subscribe principles and (2) a mobile broker running on mobile devices for flexible data acquisition from mobile ICOs. In the OpenIoT architecture, CUPUS interfaces to the Cloud Database via X-GSN, which annotates the data collected from mobile devices. Hence, data streams from mobile ICOs are annotated and stored in the OpenIoT cloud via X-GSN, similar to the way data streams from stationary sensors are announced via the X-GSN sensor middleware.

CUPUS supports content-based publish/subscribe processing, i.e., stateless Boolean subscriptions with an expressive set of operators for the most common data types (relational and set operators, prefix and suffix operators on strings, and the SQL BETWEEN operator), and continuous top-k processing over sliding windows, i.e., a novel publish/subscribe operator which identifies the k best-ranked data objects with respect to a given scoring function over a sliding window [9]. It facilitates pre-filtering of sensor data streams close to data sources, so that only data objects of interest, value and relevance to users are pushed into the cloud. The filtering process is not guided locally on mobile devices, but rather from the cloud, based on global requirements.
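To make the content-based subscription model concrete, the following is a minimal sketch in C of Boolean subscription matching over a single numeric attribute. The struct layout, field names and operator subset are illustrative assumptions for this sketch, not the actual CUPUS API.

    #include <stdbool.h>
    #include <string.h>

    /* A published sensor reading: one named numeric attribute. */
    typedef struct {
        const char *attribute;   /* e.g. "temperature" */
        double      value;
    } Publication;

    /* Relational operators of a Boolean predicate (subset of the
     * operator set listed above). */
    typedef enum { OP_LT, OP_LE, OP_GT, OP_GE, OP_EQ, OP_BETWEEN } Op;

    /* One predicate of a stateless Boolean subscription. */
    typedef struct {
        const char *attribute;
        Op          op;
        double      lo, hi;      /* hi is only used by OP_BETWEEN */
    } Predicate;

    /* A subscription matches when all of its predicates hold. */
    static bool matches(const Publication *p,
                        const Predicate *preds, int npreds)
    {
        for (int i = 0; i < npreds; i++) {
            const Predicate *q = &preds[i];
            if (strcmp(p->attribute, q->attribute) != 0)
                return false;
            double v = p->value;
            bool ok;
            switch (q->op) {
            case OP_LT:      ok = v <  q->lo; break;
            case OP_LE:      ok = v <= q->lo; break;
            case OP_GT:      ok = v >  q->lo; break;
            case OP_GE:      ok = v >= q->lo; break;
            case OP_EQ:      ok = v == q->lo; break;
            case OP_BETWEEN: ok = v >= q->lo && v <= q->hi; break; /* SQL BETWEEN */
            default:         ok = false; break;
            }
            if (!ok)
                return false;
        }
        return true;
    }

A mobile broker holding such predicates can evaluate matches() before transmitting, so readings that interest no subscriber never leave the device, which is the pre-filtering effect described above.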
Moreover, CUPUS distributes in near real-time push-based notifications from the cloud to largely distributed destinations, e.g., mobile devices, based on user information needs. As depicted in Fig. 2, a Mobile Broker (MB) running on a mobile device can connect to and disconnect from a publish/subscribe processing engine running within the cloud. On the one hand, a device with attached sensors acts as a data source: the MB announces the type of data it is able to contribute to the platform and adds the sensor to the Cloud Data Storage. On the other hand, mobile phone users can define continuous requests for data in the form of subscriptions. Based on existing requests for sensor data expressed through subscriptions by either mobile device users or the OpenIoT platform, the MB receives subscriptions from the publish/subscribe processing engine which become data filters to prevent potential data overload within the cloud. This mechanism ensures that only relevant data is transmitted from mobile devices into the platform to be annotated and stored in the RDF repository, and subsequently to be transmitted in near real-time to adequate mobile devices.

Fig. 2. High-level OpenIoT Publish/Subscribe Architecture

Since the load of the publish/subscribe processing engine is generated by a varying number of publishers and subscribers with changing joint publication rate, the engine offers elastic real-time computation. It processes many subscriptions in parallel, which minimizes the processing overhead and optimizes the usage of cloud resources under varying load.

3 OpenIoT Platform Capabilities

3.1 Sensors and Data Streams Registration, Deployment and Discovery

OpenIoT manages the registration, data acquisition and deployment of sensors and interconnected objects through X-GSN. X-GSN is an extension of GSN that supports semantic annotation of both sensor data and metadata. The core fundamental concept in X-GSN is the virtual sensor, which can represent not only physical devices, but in general any abstract or concrete entity that observes features of any kind. A virtual sensor can also be an aggregation or computation over other virtual sensors, or even represent a mathematical model of a sensing environment.

In order to propagate its data to the rest of the OpenIoT platform, each virtual sensor needs to register within the LSM, so that other applications and users can discover it and get access to its data. The sensor is registered through X-GSN by posting a semantically annotated representation of its metadata. In order to associate metadata with a virtual sensor, a simple metadata descriptor is used. X-GSN takes care of creating the semantic annotations in RDF, according to the OpenIoT ontology, and posting them to the LSM cloud store repository.

Listing 1 illustrates the descriptor of a virtual sensor, which contains the location and the fields exposed by the virtual sensor. The descriptor includes the mapping between a sensor field (e.g., airtemperature) and the corresponding high-level concept defined in the ontology (e.g., the URI http://lsm.deri.ie/OpenIoT/AirTemperature). After the sensor has been registered, the corresponding RDF triples (Listing 2) are stored in LSM, and the sensor is available for discovery and querying from the upper layers of the OpenIoT architecture. Data acquisition for each virtual sensor is achieved based on wrappers that collect data through serial port communication, UDP connections, HTTP requests, JDBC database queries, and more. X-GSN implements wrappers
for these data providers, and allows users to develop custom ones. Virtual sensors and wrapper settings are specified in configuration files, which provide internal details of the data to be exposed. Data are represented as streams of data tuples which can be consumed, queried or analyzed on-line. In OpenIoT this processing includes the annotation of sensor observations as soon as they arrive at X-GSN, as depicted in Fig. 3. Note that virtual sensors can be built on top of other virtual sensors, providing different layers of information. For example, one can imagine a set of thermometers that send their data into X-GSN. Then all those data streams can feed an aggregating virtual sensor that averages received values over predefined time windows, annotates average values semantically and stores them in the LSM cloud store. The described example is realized by editing only a few XML files. In general, the effort needed to deploy a new sensor in OpenIoT is typically in the range of a few man-hours.

Fig. 3. Semantic annotation of observations in X-GSN

3.2 Authenticated and Authorized Access to Resources

The diversity of applications interacting in an IoT ecosystem calls for non-trivial security and access-rights schemes. Conventional approaches (e.g., creating distinct user accounts for each application and granting access rights to each user) are not scalable as the number of applications and user accounts grows. OpenIoT adopts a flexible and generic approach for authentication and authorization. User management, authentication, and authorization are performed by the privacy & security module and its CAS (Central Authentication Service) server. Users are redirected to a central login page the first time they try to access a restricted resource, where they provide their username and password to the central authentication entity. If authentication is successful, the CAS redirects the user to the original web page and returns a token to the web application. Tokens represent authenticated users, have a predefined expiration time and are valid only before they expire. The token is forwarded from a service to the next one in a request chain, e.g., from the user interface to LSM. Services can check if the token is valid, or use the token to check if the user represented by this token has the necessary access rights.

In terms of implementation, an OAuth 2.0-enabled Jasig CAS has been extended for the OpenIoT needs. In particular, we added the end point permissions for retrieving authorization information from CAS. Authorization information includes user roles and permissions. Permissions are textual values that define actions or behaviors and are defined per service. A wildcard permission format (Apache Shiro) is used. Permissions can consist of multiple levels delimited by colons, and levels can be defined by each application following a predefined pattern. For example, the permission string "admin:delete_role:SERVICE_NAME" has three levels: "admin" means that the permission is for administrative tasks, "delete_role" is the action, and "SERVICE_NAME" is the name of the service for which the action is permitted.

3.3 Zero-Programming Application Development

OpenIoT provides an integrated environment, the OpenIoT IDE (Integrated Development Environment), for building, deploying and managing IoT applications. OpenIoT IDE comprises a range of visual tools (Fig. 4) enabling: (a) visual definition of IoT services in a way that obviates the need to master the details of the SPARQL language; (b) visual discovery of sensors according to their location and type; (c) configuration of
sensor metadata as needed for their integration within the X-GSN middleware; (d) monitoring of the status of the various IoT services, including the volumes of data that they produce and the status of the sensors that they comprise; (e) visualization of IoT services on the basis of Web 2.0 mashups (i.e., maps, line/bar charts, dashboards and more). These tools accelerate the process of developing IoT applications. In several cases simple applications can be developed with virtually zero programming.

Fig. 4. Overview of the OpenIoT Integrated Development Environment (OpenIoT IDE)

3.4 Handling of Mobility with Quality-Driven Sensor Management

As mobile crowd sensing applications generate large volumes of data with varying sensing coverage and density, there is a need to offer mobility management of ICOs and quality-driven mobile sensor data collection to satisfy global sensing coverage requirements, while taking into account data redundancy and varying sensor accuracy [10]. CUPUS provides the means for collecting data from mobile ICOs, whose geographical location potentially changes while providing data to the cloud. As mobile brokers running on mobile devices announce the type of data that can be provided by their currently available publishers, they are configured so as to announce their available data sources each time they enter a new geographic area. Moreover, an X-GSN virtual sensor is created on demand for each new geographic area and is used to both push and annotate the data generated by all mobile sensors currently residing within its geographical area.

CUPUS addresses quality requirements (e.g., energy efficiency, sensing data quality, network resource consumption, latency) through smart data acquisition mechanisms. Firstly, by deploying mobile brokers on mobile devices, data can be selectively collected from external data sources attached to the mobile device and transmitted to the cloud only when required. Mobile brokers running in geographical areas where there are no currently active subscriptions will suppress data collection and refrain from sending unnecessary data into the cloud. Secondly, CUPUS is integrated with a centralized quality-driven sensor management function, designed to manage and acquire sensor readings to satisfy global sensing coverage requirements, while obviating redundant sensor activity and consequently reducing overall system energy consumption. Assuming redundant data sources in a certain geographic area, a decision-making engine is invoked to determine an optimal subset of sensors to keep active in order to meet data requests, while considering parameters such as sensor accuracy, trustworthiness, and battery level.

4 Proof-of-Concept Applications

4.1 Phenonet Experiment

Phenonet uses sensor networks to gather environmental data for crop variety trials at a far higher resolution than conventional methods, and provides a high-performance real-time online data analysis platform that allows scientists and farmers to visualize, process and extract both real-time and long-term crop performance information from the acquired sensor measurements. Figure 5 provides an example of a Phenonet experiment with two types of sensors: (1) gypsum block soil moisture sensors (GBHeavy) at various depths (e.g., 20, 30 and 40 cm) and (2) a canopy temperature measurement sensor. The goal of the experiment is to monitor the growth and yield of a specific variety of crop by analyzing the impact of root activity, water use (soil moisture) and canopy temperature. Information about crop growth
obtained in real time effectively helps plant scientists to provide estimates on the potential yield of a variety. OpenIoT facilitates the processes of real-time data collection, on-the-fly annotation of sensed data, data cleaning, data discovery, storage and visualization.

4.2 Urban Crowdsensing Application

This is a mobile application for community sensing, where sensors and mobile devices jointly collect and share data of interest to observe and measure air quality in real-time. Volunteers carrying wearable air quality sensors contribute sensed data to the OpenIoT platform while moving through the city. Citizens are able to consume air quality information of interest, typically in their close vicinity. Figure 6 shows air quality sensors measuring temperature, humidity, pressure, CO, NO2 and SO2 levels, which communicate with the mobile application running on an Android phone via a Bluetooth interface. Users can declare interest to receive environmental data (e.g., temperature, CO levels) in their close vicinity and in near real-time. Moreover, they can express interest to receive the readings portraying the poorest air quality for an area over time, or average readings within specific areas as soon as they are available.

Fig. 5. Phenonet Experiment Illustration

Fig. 6. Air Quality Sensors and Mobile Application

4.3 Smart Campus Application

The smart campus application brings information about interactions among people and things within typical campus situations into one Common Information Model (CIM). This model combines observations from sensors with mobile applications and static structural information into one cyber-physical context managed by OpenIoT. In the prototype, the sensors used are QR-code or NFC based scanners to detect and confirm the presence of persons and to identify assets and topics. The mobile applications are used for booking workplaces and for discussions. The structural information describes campus assets like buildings, rooms and workplaces, as well as teaching material. OpenIoT supports the stream-oriented processing of events as well as context reasoning on the CIM.

5 Conclusions

OpenIoT has provided an innovative platform for IoT/cloud convergence which enables: (a) integration of IoT data and applications within cloud computing infrastructures; (b) deployment of and secure access to semantically interoperable applications; (c) handling of mobile sensors and associated QoS parameters. The semantic interoperability functionalities of OpenIoT are a key differentiating factor of the project when compared to the wide range of other IoT/cloud platforms. These functionalities provide a basis for the development of novel applications in the areas of smart cities and mobile crowd sensing, while also enabling large-scale IoT experimentation.
Acknowledgments. This work has been carried out in the scope of the OpenIoT project (http://openiot.eu). The authors acknowledge contributions from all partners of the project.

References

1. Hassan, M.M., Song, B., Huh, E.-N.: A framework of sensor-cloud integration opportunities and challenges. In: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication (ICUIMC 2009), pp. 618–626. ACM, New York (2009)
2. Blair, G.S., Paolucci, M., Grace, P., Georgantas, N.: Interoperability in complex distributed systems. In: Bernardo, M., Issarny, V. (eds.) SFM 2011. LNCS, vol. 6659, pp. 1–26. Springer, Heidelberg (2011)
3. Compton, M., et al.: The SSN ontology of the W3C semantic sensor network incubator group. J. Web Semant. 17, 25–32 (2012). Elsevier
4. Heath, T.: Linked data: welcome to the data network. IEEE Internet Comput. 15(6), 70–73 (2011). IEEE Press, New York
5. Aberer, K., Hauswirth, M., Salehi, A.: Infrastructure for data processing in large-scale interconnected sensor networks. In: International Conference on Mobile Data Management (MDM 2007), pp. 198–205. IEEE Press, New York (2007)
6. Black Duck Software: Black Duck Announces Open Source Rookies of the Year Winners. Press Release, January 2013
7. Serrano, M., Hauswirth, M., Soldatos, J.: Design principles for utility-driven services and cloud-based computing modelling for the internet of things. Int. J. Web Grid Serv. 10(2–3), 139–167 (2014). Inderscience Publishers, Geneva
8. Le Phuoc, D., Nguyen-Mau, H.Q., Parreira, J.X., Hauswirth, M.: A middleware framework for scalable management of linked streams. J. Web Semant. 16, 42–51 (2012). Elsevier
9. Pripužić, K., Podnar Žarko, I., Aberer, K.: Top-k/w publish/subscribe: a publish/subscribe model for continuous top-k processing over data streams. Inf. Syst. 39, 256–276 (2014). Elsevier
10. Antonić, A., Rožanković, K., Marjanović, M., Pripužić, K., Podnar Žarko, I.: A mobile crowdsensing ecosystem enabled by a cloud-based publish/subscribe middleware. In: 2014 International Conference on Future Internet of Things and Cloud (FiCloud), pp. 107–114. IEEE Press, New York (2014)

Cognos Metrics Manager User Guide

MANAGE WHAT MATTERS

With Cognos Metrics Manager, organizations can deliver goal-driven metrics to every desktop. Users can monitor, analyze, and report on time-critical information through the creation, management, presentation, and delivery of cross-functional metrics.

Cognos Metrics Manager can model and track any set of performance indicators for Global 3500 enterprises. It links an organization's goals with employee performance and accountability.

With Cognos Metrics Manager, users are self-sufficient, and can readily see how the business is progressing against its strategy. They can set priorities for their own actions. They can understand how their decisions affect the company's performance.

This scorecarding approach enables organizations to monitor their business by tracking more than just financial measurements. Key performance indicators (KPIs) such as employee satisfaction, supplier scorecards, and customer profitability can be integrated into your management picture. These indicators draw on a broad range of data sources from many different areas of your business.

Cognos Metrics Manager delivers the single version of the truth that enterprises need to make decisions with confidence. It also provides the direct link to BI that helps users get at what's driving their performance. Cognos Metrics Manager is tightly integrated with Cognos Series 7 BI to provide the rich environment and extensive range of Cognos functionality, including dashboards, reports, and analysis.

Cognos Metrics Manager works with Cognos Planning Series. Organizations can provide visibility into plan versus actual performance and communicate goal-driven planning metrics to thousands of employees across the enterprise. Cognos Metrics Manager can draw data from anywhere in your organization, in any format.

This integration creates a collaborative decision-making environment for business users. It enables the effective sharing and rapid distribution of information tied to key corporate performance management indicators.

KEY BUSINESS BENEFITS

Flexible, open methodology support
The flexibility of the software lets you model metrics and their relationship to each other based on any standard or proprietary scorecarding and management methodology you already use.

Summary views from an easy-to-use interface
Cognos Metrics Manager displays summary views based on the Balanced Scorecard and other methodologies, and users can easily drill down to detailed metrics results and history.

Scorecard view

SCORECARDING WITH COGNOS

Analyze issues to the required depth
The rich Cognos scorecarding environment, through integration with Cognos reporting and analysis capabilities, lets you analyze performance issues to understand what drives a metric's change in order to make better decisions. Choose between a reportlet or a live custom URL for further information or greater context. URLs can point toward live dashboards, BI reports, SCM or CRM systems, or Web sites.

Flexible information delivery
Users can be notified by email at their desktop or on their PDA when a metric changes status. They can show or hide information and add custom benchmarks to their own personal scorecards.

Detailed view of accountability and performance
Managers can view metrics by their owners for a better view of individuals' performance.
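At bottom, the status levels behind such views are threshold comparisons of an actual value against a target. A minimal sketch in C of how a three-state indicator could be computed; the 90%/100% cut-offs and names are illustrative assumptions, not Cognos's actual rules:

    #include <stdio.h>

    typedef enum { STATUS_POOR, STATUS_AVERAGE, STATUS_GOOD } MetricStatus;

    /* Classify a KPI by the ratio of actual to target.  The cut-offs are
     * hypothetical; in a real scorecard tool the threshold ranges are
     * configured per metric by an administrator. */
    static MetricStatus kpi_status(double actual, double target)
    {
        double ratio = actual / target;
        if (ratio >= 1.00) return STATUS_GOOD;     /* meeting or exceeding target */
        if (ratio >= 0.90) return STATUS_AVERAGE;  /* approaching target          */
        return STATUS_POOR;
    }

    int main(void)
    {
        static const char *names[] = { "poor", "average", "good" };
        printf("revenue: %s\n", names[kpi_status(9.4e6, 10.0e6)]); /* average */
        return 0;
    }

A five-state indicator simply adds two more threshold bands, for example to distinguish slightly exceeding from far exceeding the target.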
Three- or five-state status indicators provide more granularity and show when performance is approaching or slightly exceeding targets.

Work online or offline
View your scorecards from any Web browser, or generate PDF files that you can print and share in hard copy.

Combined views of disparate measures
Recognizing that many users require cross-functional metrics, Cognos Metrics Manager uniquely consolidates disparate measures from different functional areas and data sources.

KEY TECHNICAL FEATURES

Metrics definition from disparate data sources
Cognos Metrics Manager can import data from any source. Any vendor's data sources, relational databases, Excel files, flat files, user-entered values, and Cognos business intelligence sources can provide data to populate and support your scorecard. A Data Loader Utility is provided to automate the loading of data from flat files.

Easy to deploy
Zero-footprint architecture means users and administrators do not require software installed on their desktop, making global intranet or extranet use and deployment easier.

Simplified metrics creation
Administrators can define a metric once for use in any scorecard in their organization. Centrally defined KPIs ensure a consistent version of the truth and priority for all users. Administrator menus let you define all aspects of a metric: threshold ranges, benchmarks, data source definition, contact names, and URL links for contextual information in any format.

Flexible metrics building
Cognos Metrics Manager lets you control the specific metrics or KPIs you wish to track, how they are combined, what constitutes good and poor performance for each indicator, and links to supporting reports.

Central metrics store
Cognos Metrics Manager historical data, as well as scorecard, diagrams, and metric definitions, are maintained in an industry-standard relational database.

Align metrics with strategy and initiatives

Integrated Analysis

Cognos product interoperability
Cognos Metrics Manager is interoperable with the Cognos Series 7 enterprise-scale BI framework and full suite of BI tools, and is integrated with Cognos Web Services.

Application access and security
Integration with the Cognos access and security foundation means users are easily assigned to classes, which determine the scorecards and metrics they can access through a simple and single sign-on dialogue box.

Embedded business analysis tools
End user tools simplify cross-impact analysis of metrics. Integration with multidimensional analysis tools lets users uncover the root cause of performance issues.

Metrics relationship auto-diagrams
Automatically generated HTML displays of the relationship between metrics visually guide analysis to the root of performance problems.

Balanced Scorecard Collaborative-certified
Certified by the Balanced Scorecard Collaborative, Cognos Metrics Manager includes Strategy Maps and other Kaplan & Norton Balanced Scorecard best practices that can be easily and quickly modeled through menu-based administrator functions.

Dynamic diagrams
Dynamic diagrams give users the capability to view metrics in context overlaid on images, such as JPEG and GIF files (e.g., revenue maps, corporate strategy maps, floor diagrams or process maps). These diagrams can be easily created using the drag-and-drop administration interface to place metrics on top of pre-defined graphical images.

Utilization monitoring and analysis
Cognos Metrics Manager can even track its own performance.
It allows administrators to analyze how the application is being utilized throughout the organization.

Scalability
Cognos Metrics Manager was built to handle the data volumes and user requirements of Fortune 1000 companies.

TECHNICAL SPECIFICATIONS

Cognos Metrics Manager is built on industry-leading enterprise architectures, including:
• Multiple operating systems (Microsoft Windows, Sun Solaris, HP-UX, IBM AIX)
• Multiple application servers (Sun ONE 7, Apache Tomcat 4.1.18, IBM WebSphere 5, BEA WebLogic 7.0.1 and 8.1)
• Enterprise RDBMS repository (MS SQL Server, Oracle)
• J2EE application servers
• Browsers: IE 5.5 and higher (for users or administrators), or Netscape 6.2 (for users)

Impact analysis diagram

View Metrics by owner

COGNOS SERIES 7: BI FOR THE INTELLIGENT ENTERPRISE

The Cognos Series 7 suite of business intelligence (BI) components delivers the information framework that lets anyone in your organization understand your corporate performance—from any angle and to any depth. It lets you communicate the results—whether business as usual or breakthrough insight. It can even notify you if a critical event occurs, wherever you may be.

Build a solution using Cognos Series 7 components or purchase a pre-built solution. Either way, ease of use and deployment, combined with user self-service, means you're up and running quickly and seeing ROI faster than you imagined. All Cognos Series 7 components feature:

Centralized security: A single LDAP-based component delivers seamless, secure access for all users inside and outside the firewall, in conjunction with your own user authentication and security systems. IT can manage user profiles and classes for all Cognos components through centralized SSL security.

Zero-footprint web deployment: Users access information in pure HTML using their web browser, making Java applets or proprietary plug-ins unnecessary. Share strategic information over a secure extranet with partners, suppliers, customers, and a mobile sales force, with the bandwidth-saving benefit of a stateless environment.

Scalability: Cognos Series 7 has been tested and proven scalable to tens of thousands of users through load balancing and a distributed architecture supporting Unix and Windows.

Central administration: Manage all BI applications from one central console, or distribute management among departments or locations. Control request processing at the server, application, PowerCube, and report level. Monitor throughput across multiple servers at a glance and fine-tune multi-server environments. The platform-independent, remote Java administrator program lets you administer any server from any machine.

Shared metadata: Create and manage all BI metadata and business rules in a single metadata model based on shared dimensions. This model provides a consistent enterprise data view with minimum development effort. The SQL-based environment ensures the metadata is optimized for each BI application.

Extensible BI: Using our open API based on a web services architecture, IT can integrate BI directly into third-party portals or specialized web applications. The software's shared data foundation and integrated, modular approach means it can be easily implemented in steps across the entire enterprise.
It can grow as the business grows, while reusing existing technology investments to save on costs.

Local language, local currency: Localized product versions let users work in their own language and view figures in their own currency, using regional settings—all from a single application server.

WHY COGNOS?

Only Cognos delivers a complete range of integrated, scalable software for corporate performance management. Cognos products let organizations drive performance with enterprise planning and budgeting, monitor it with scorecarding, and understand it with business intelligence reporting and analysis. Founded in 1969, Cognos now serves more than 22,000 customers in over 135 countries.

Advanced Computer Architecture: Chapter 3

4) Utilization

Utilization is the ratio of the achieved speed to the peak speed of a given computer. A sequential application executing on a single MPP processor has a utilization ranging from 5%-40%, typically 8%-35%. A parallel application executing on multiple processors has a utilization ranging from 1%-35%, typically 4%-20%. Some benchmarks can reach higher utilization; for example, ASCI White Pacific (IBM SP, POWER3, 375 MHz) achieves U = 7.226/12.3 = 58.7%, and the NEC Earth Simulator can reach U = 35.8/40.96 = 87.4%.
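As a concrete check of these numbers, a minimal sketch in C that reproduces the two figures quoted above (speeds in TFLOPS):

    #include <stdio.h>

    /* Utilization U = achieved speed / peak speed. */
    static double utilization(double achieved, double peak)
    {
        return achieved / peak;
    }

    int main(void)
    {
        /* ASCI White Pacific (IBM SP, POWER3 375 MHz): 7.226 of 12.3 TFLOPS */
        printf("ASCI White:      U = %.1f%%\n", 100.0 * utilization(7.226, 12.3));
        /* NEC Earth Simulator: 35.8 of 40.96 TFLOPS */
        printf("Earth Simulator: U = %.1f%%\n", 100.0 * utilization(35.8, 40.96));
        return 0;
    }

Running this prints 58.7% and 87.4%, matching the values above.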
4
(2) According to macro or micro:

– Macro benchmark → measures the performance of the system as a whole
– Micro benchmark → measures the performance of a specific aspect, such as CPU speed, memory access time, I/O speed, OS performance, or networking
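To illustrate the micro-benchmark idea, here is a minimal sketch that times one specific aspect, memory access latency, using a dependent pointer chase (a common micro-benchmark pattern; the array size and access count are arbitrary choices for this sketch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N        (1 << 20)   /* 2^20 slots * 8 bytes = 8 MB, bigger than most caches */
    #define ACCESSES (1 << 24)   /* number of dependent loads to time */

    int main(void)
    {
        size_t *next = malloc(N * sizeof *next);
        if (!next) return 1;
        for (size_t i = 0; i < N; i++) next[i] = i;

        /* Sattolo's shuffle: yields a single cycle, so the chase below
         * visits every slot; random order defeats hardware prefetching. */
        for (size_t i = N - 1; i > 0; i--) {
            size_t r = ((size_t)rand() << 16) ^ (size_t)rand();
            size_t j = r % i;
            size_t t = next[i]; next[i] = next[j]; next[j] = t;
        }

        clock_t start = clock();
        size_t p = 0;
        for (long n = 0; n < ACCESSES; n++)
            p = next[p];                 /* each load depends on the last */
        clock_t stop = clock();

        double secs = (double)(stop - start) / CLOCKS_PER_SEC;
        printf("avg access latency: %.1f ns (sink=%zu)\n",
               secs / ACCESSES * 1e9, p);
        free(next);
        return 0;
    }

Printing p as a "sink" prevents the compiler from optimizing the chase away; a macro benchmark, by contrast, would run a whole application and report end-to-end time.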
§3.1.1 Micro Benchmarks

Metrics—Performance measures

Intermediately Executed Code is the Key to Find Refactorings that Improve Temporal Data Locality

Kristof Beyls, Electronics and Information Systems (ELIS), Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium. kristof.beyls@elis.UGent.be
Erik H. D'Hollander, Electronics and Information Systems (ELIS), Ghent University, Sint-Pietersnieuwstraat 41, B-9000 Gent, Belgium. erik.dhollander@elis.UGent.be

ABSTRACT

The growing speed gap between memory and processor makes an efficient use of the cache ever more important to reach high performance. One of the most important ways to improve cache behavior is to increase the data locality. While many cache analysis tools have been developed, most of them only indicate the locations in the code where cache misses occur. Often, optimizing the program, even after pinpointing the cache bottlenecks in the source code, remains hard with these tools.

In this paper, we present two related tools that not only pinpoint the locations of cache misses, but also suggest source code refactorings which improve temporal locality and thereby eliminate the majority of the cache misses. In both tools, the key to find the appropriate refactorings is an analysis of the code executed between a data use and the next use of the same data, which we call the Intermediately Executed Code (IEC). The first tool, the Reuse Distance VISualizer (RDVIS), performs a clustering on the IECs, which reduces the amount of work to find required refactorings. The second tool, SLO (short for "Suggestions for Locality Optimizations"), suggests a number of refactorings by analyzing the call graph and loop structure of the IECs. Using these tools, we have pinpointed the most important optimizations for a number of SPEC2000 programs, resulting in an average speedup of 2.3 on a number of different platforms.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors—Compilers, Debuggers, Optimization; D.2.8 [Software Engineering]: Metrics—Performance measures

General Terms
Performance, Measurement, Languages

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
CF'06, May 3–5, 2006, Ischia, Italy.
Copyright 2006 ACM 1-59593-302-6/06/0005 ...$5.00.

Keywords
Temporal Data Locality, Program Analysis, Refactoring, Program Optimizations, Performance Debugger, Loop Transformations

1. INTRODUCTION

The widening speed gap between processor and main memory makes low cache miss rates ever more important. The major classes of cache misses are conflict and capacity misses.
While conflict misses are caused by conflicts in the internal cache structure, capacity misses are caused by poor temporal or spatial locality. In this paper, we propose two tools that help to identify the underlying reason of poor temporal data locality in the source code.

1.1 Related Work

In recent years, compiler methods have been devised to automatically increase spatial data locality, by transforming the data layout of arrays and structures, so that data accessed close together in time also lays close together in the address space [9, 11, 17, 19, 22, 33]. On the other hand, temporal locality can only be improved by reordering the memory accesses so that the same addresses are accessed closer together. Advanced compiler methods to do this all target specific code patterns such as affine array expressions in regular loop nests [11, 18, 22], or specific sparse matrix computations [14, 15, 24, 27]. For more general program constructs, fully-automatic optimization seems to be very hard, mainly due to the difficulty of the required dependence analysis. Therefore, cache and data locality analysis tools and visualizers are needed to help programmers to refactor their programs for improved temporal locality.

    void ex(double *X, double *Y, int len, int N) {
      int i, j, k;
      for (i = 0; i < N; i++) {
        for (j = 1; j < len; j++)
          Y[j] = Y[j] * X[i];            // 39% of cache misses
        for (k = 1; k < len; k += 2)
          Y[k] = (Y[k] + Y[k-1]) / 2.0;  // 61% of cache misses
      }
    }

Figure 1: First motivating example, view on cache misses given by traditional tools. N=10, len=100001.

(c) ACM, 2006. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Computing Frontiers, May 2006. 10.1145/1128022.1128071

Figure 2: First motivating example, views produced by RDVIS: (a) view of reference pairs with long-distance reuse in RDVIS; (b) histogram of long-distance reuses, gray scales correspond to the arrows in (a); (c) graphical view of intermediately executed code in RDVIS, and associated cluster analysis; (d) view of Intermediately Executed Code of resp. the light and the dark gray cluster in (c). The colors were manually changed to gray scale, to make the results readable in this black-and-white copy of the paper.

Most existing cache and data locality analyzers measure the locality or the cache misses and indicate at which locations in the source code, or for which data structures, most cache misses occur [2, 4, 5, 8, 12, 13, 20, 21, 23, 28, 29, 32]. While this information is helpful in identifying the main bottlenecks in the program, it can still be difficult to deduce a suitable program transformation from it. In this regard, a few of these tools provide additional support for finding the underlying cause of conflict misses (e.g. CVT [28], CacheVIZ [32], YACO [25]) or the underlying cause of poor spatial locality (e.g. SIP [4]).

In contrast, we present a method to help identify the underlying causes of poor temporal data locality. Basically, poor temporal locality results when a large amount of other data is accessed between two consecutive uses of the same data. Improving the locality requires diminishing the volume of data accessed between use and reuse. The source code executed between use and reuse is responsible for accessing the large data volume, resulting in a long reuse distance. That source code is called the Intermediately Executed Code (IEC) of that reuse. Consequently, to improve the temporal data locality, a refactoring of the IEC is required. In this paper, we present two tools that analyze the IEC in different ways to pinpoint the required
refactorings: RDVIS (Reuse Distance VISualizer), which has been discussed earlier in [7], and SLO (Suggestions for Locality Optimizations). RDVIS represents the IEC as a set of basic blocks executed between long-distance reuses. In a typical program, there are a huge number of data reuses, and consequently a huge number of corresponding IECs. RDVIS applies a cluster analysis to the IECs so that the main patterns of poor locality-generating source code are revealed. Based on a visual representation of the resulting clusters and highlighting of the corresponding source code, the programmer can deduce the necessary program optimizations. In SLO, the loop structure and the call graph of the IEC is also taken into account, allowing it to go one step further than RDVIS. SLO pinpoints the exact source code refactorings that are needed to improve locality. Examples of such refactorings are loop tiling and computation fusion, which are demonstrated in the motivating examples in section 2.

In section 3, reuse distances and associated terms are defined. Section 4 describes how RDVIS analyzes the IEC. Section 5 presents the analyses performed by SLO on the IEC to find the appropriate source code refactorings. In section 6, we provide a few case studies where these tools have been used to identify the required refactorings for a number of real-world programs from the SPEC2000 benchmarks. For two of them, we applied the necessary transformations, leading to an average cross-platform speedup of about 2.3. Concluding remarks are given in section 7.

2. MOTIVATING EXAMPLES

We start by showing two small code examples where the indication of cache misses with traditional tools does not clearly reveal how to optimize the programs. Furthermore, we show how RDVIS and SLO visualize the intermediately executed code of long-distance reuses, and how that makes it easier to find source code refactorings that improve temporal locality.

2.1 Example 1: Intra-Procedural Loop Reuses

The code in figure 1 shows a small piece of code, where a traditional tool would show that the first statement is responsible for about 39% of all cache misses and the second statement produces 61% of them. While this information indicates where cache misses occur, it is not directly clear how the locality of the program can be improved to diminish the number of cache misses.

The views produced by our tools are shown in figure 2 for RDVIS, and in figure 3 for SLO. For each pair of references that generate many long-distance reuses, an arrow is drawn, starting at the reference that accesses the data first, and pointing to the reference that reuses that data after a long time. Figure 2(a) shows the four pairs of references that generate the majority of long-distance reuses: (Y[k],Y[j]), (Y[k-1],Y[j]), (Y[j],Y[k]) and (Y[j],Y[k-1]). Figure 2(b) shows that each of those 4 pairs generates about the same amount of long-distance reuses at distance 2^17, meaning that about 2^17 other elements are accessed between those reuses. (In this example, N = 10 and len = 100001.) When the cache can contain 2^10 elements (as indicated by the background), all the reuses at a larger distance lead to cache misses. So, the reuses at distance 2^17 must be made smaller than 2^10, in other words, largely diminishing the amount of data accessed between use and reuse.

To optimize each of the four arrows in figure 2(a), the first step is to pinpoint which code is responsible for generating the accesses between use and reuse. The second step is to refactor the code so that fewer data elements are accessed between use and reuse. RDVIS records the
basic blocks executed between each use and reuse, and allows to visually indicate the corresponding source code for each arrow. Besides an evaluation examining each arrow separately, RDVIS also contains a cluster analysis. The arrows with similar IEC are put in the same cluster.

As an example, figure 2(c) shows how RDVIS graphically represents the result of the cluster analysis. On the left hand side, the code executed between use and reuse is graphically represented. There are four horizontal bands, respectively representing the IEC of the four arrows in figure 2(a). In each band, the basic blocks in the program are represented, left to right. If a basic block is executed between use and reuse, it is colored in a shade of gray, otherwise it is nearly white. Originally, RDVIS produces a colored view with higher contrast. Here, the colors were converted to enhance readability in black-and-white. Figure 2(c) shows that the code executed between use and reuse of arrows 1 and 2 is identical. Also the code executed between use and reuse of arrows 3 and 4 is identical. On the right hand side, the cluster dendrogram graphically indicates how "similar" the IEC is for each arrow. In this example, the user has manually selected two subclusters. It shows that 52.6% of the long distance reuses are generated by the light gray cluster, while 47.4% are generated by the dark gray cluster. Furthermore, in figure 2(d), the IEC for the two clusters has been highlighted in the source code by RDVIS. The code that is executed between use and reuse is highlighted in bold. This shows that for the light gray cluster, the uses occur in the j-loop, while the reuses occur in the k-loop. Both the use and the reuse occur in the same iteration of the i-loop, since the loop control code i<N; i++ is not highlighted. These two arrows can be optimized by loop fusion, as is discussed in detail below. In the dark gray cluster, it shows that the control of loop i (i<N; i++) is executed between use and reuse. Hence the use and reuse occur in different iterations of the outer i-loop. The experienced RDVIS user recognizes from this pattern that loop tiling needs to be applied, as discussed in more detail below.

Figure 3: First motivating example, SLO view: (a) 5 different optimizations indicated by gray scale, with respect to the reuse distance of the reuses they optimize, as shown by SLO; (b) indication of the two optimizations for the reuses at distance 2^17, as indicated by SLO. The light gray optimization indicates fusion of the two inner loops. The dark gray optimization requires tiling the outer loop. The colors were manually altered to gray scale, to make the results readable in this black-and-white copy of the paper.
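For readers who want to see the mechanics of this cluster analysis, here is a minimal sketch in C of the scheme detailed later in section 4: basic block vectors compared by Manhattan distance and merged agglomeratively. The vector values and block count are made-up illustrations, not data from the paper.

    #include <stdio.h>

    #define NPAIRS 4   /* reference pairs (arrows), as in figure 2(a) */
    #define NBB    6   /* basic blocks in the program (hypothetical)  */

    /* Manhattan distance between two basic block vectors in [0,1]^n. */
    static double manhattan(const double a[NBB], const double b[NBB])
    {
        double d = 0.0;
        for (int i = 0; i < NBB; i++)
            d += a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        return d;
    }

    int main(void)
    {
        /* One vector per arrow; arrows 0/1 and 2/3 execute the same code. */
        double bbv[NPAIRS][NBB] = {
            {1, 1, 0, 0, 1, 0},
            {1, 1, 0, 0, 1, 0},
            {1, 0, 1, 1, 1, 1},
            {1, 0, 1, 1, 1, 1},
        };
        int active[NPAIRS] = {1, 1, 1, 1};

        /* Agglomerative clustering: repeatedly merge the two closest
         * clusters, representing the merger by the average vector. */
        for (int alive = NPAIRS; alive > 1; alive--) {
            int bi = 0, bj = 1; double best = 1e30;
            for (int i = 0; i < NPAIRS; i++)
                for (int j = i + 1; j < NPAIRS; j++)
                    if (active[i] && active[j]) {
                        double d = manhattan(bbv[i], bbv[j]);
                        if (d < best) { best = d; bi = i; bj = j; }
                    }
            printf("merge arrows %d and %d at distance %.2f\n", bi, bj, best);
            for (int k = 0; k < NBB; k++)
                bbv[bi][k] = (bbv[bi][k] + bbv[bj][k]) / 2.0;
            active[bj] = 0;
        }
        return 0;
    }

The printed merge order and distances are exactly what the dendrogram visualizes: identical IECs merge at distance 0, dissimilar ones only at larger distances.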
In contrast to RDVIS, where the programmer needs to examine the Intermediately Executed Code to pinpoint optimizations, SLO analyzes the IEC itself, and interactively indicates the optimizations that are needed. For example, in figure 3(b), the required loop fusion and loop tiling are indicated by a bar on the left hand side. Furthermore, the histogram produced by SLO indicates which reuses can be optimized by which optimization in different colors, e.g. see figure 3(a). The upper histogram shows the absolute number of reuses at a given distance. The bottom histogram shows the fraction of reuses at a given distance that can be optimized by each transformation. Below, we explain how loop fusion and loop tiling can be used to improve the locality and performance. These two transformations are the most important optimizations for improving temporal locality in loops.

2.1.1 Optimizing Pattern 1: Loop Fusion

From both the views produced by RDVIS (fig. 2(d) at the top) and SLO (fig. 3(b) at the top), it shows that about half of the long-distance reuses occur because element Y[j] is used in the first loop, and it is reused by references Y[k] and Y[k-1] in the second loop. The distance is long because between the reuses, all other elements of array Y are accessed by the same loops. For this pattern, the reuse distance can be reduced by loop fusion: instead of running over array Y twice, the computations from both loops are performed in one run over the array. In order to fuse the loops, the first loop is unrolled twice, after which they are fused, under the assumption that variable len is odd, resulting in the code in figure 4. The histogram in the figure shows that the long-distance reuses targeted have all been shortened to distances smaller than 2^5. This results in a speedup of about 1.9 on a Pentium 4 system, due to fewer cache misses, see table 1.

Figure 4: Code and left-over long reuse distance after loop fusion.

2.1.2 Optimizing Pattern 2: Loop Tiling

After fusing the inner loops, the code can be analyzed again for the causes of the remaining long reuse distance patterns. Figure 4 shows how SLO indicates that all left-over long reuse distances occur because the use is in one iteration of the i-loop, and the reuse is in a later iteration. Consequently, the tool indicates that the i-loop should be tiled, by displaying a bar to the left of the loop source code.

Figure 5: Code and left-over long reuse distance after loop tiling.

Table 1: Running times and speedups of the code before and after optimizations, on a 2.66 GHz Pentium 4, for N=10, len=1000001.

    version       exec. time   speedup
    orig          0.183 s
    fused         0.098 s      1.87
    fused+tiled   0.032 s      5.72
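The code of Figures 4 and 5 survives only as captions in this copy, so the following is a hedged reconstruction in C of the transformed kernels, based on the transformations the text describes. The function names and tile size are assumptions, and to keep the sketch semantics-preserving the tiling is shown on the scaling loop alone; the paper tiles the full fused loop, where the reuse of Y[k-1] across tile boundaries needs extra care.

    /* After loop fusion (cf. Figure 4): the scaling loop is unrolled twice
     * and fused with the averaging loop; len is assumed odd, as in the text. */
    void ex_fused(double *X, double *Y, int len, int N) {
      int i, k;
      for (i = 0; i < N; i++) {
        for (k = 1; k < len; k += 2) {
          Y[k]     = Y[k]     * X[i];          /* unrolled iteration j = k     */
          Y[k + 1] = Y[k + 1] * X[i];          /* unrolled iteration j = k + 1 */
          Y[k]     = (Y[k] + Y[k - 1]) / 2.0;  /* fused averaging statement    */
        }
      }
    }

    /* Loop tiling (cf. Figure 5), illustrated on the scaling loop only:
     * the j-loop is strip-mined by an outer jj-loop and the i-loop moved
     * inside, so each small tile of Y stays cached across all N reuses. */
    void scale_tiled(double *X, double *Y, int len, int N) {
      int i, j, jj;
      const int tilesize = 100;                /* assumed tile size */
      for (jj = 1; jj < len; jj += tilesize) {
        int end = jj + tilesize < len ? jj + tilesize : len;
        for (i = 0; i < N; i++)
          for (j = jj; j < end; j++)
            Y[j] = Y[j] * X[i];
      }
    }

In ex_fused, only even-indexed elements are read as Y[k-1], and those are scaled but never overwritten by the averaging, so the interleaving preserves the original semantics; in scale_tiled, each Y[j] still sees X[0]..X[N-1] in order, so the strip-mining is legal.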
Loop tiling is applied when the long-distance reuses occur between different iterations of a single outer loop. When this occurs, it means that in a single iteration of that loop, more data is accessed than can fit in the cache. The principal idea behind loop tiling is to process less data in one iteration of the loop, so that data can be retained in the cache between several iterations of the loop. Figure 5 shows the code after tiling. Now, the inner j-loop executes at most 50 iterations (see variable tilesize), and hence the amount of data accessed in the inner loop is limited. As a result, the reuses between different iterations of the i-loop are shortened from a distance of 2^17 to a distance between 2^7 and 2^9, see the histograms in Figures 4 and 5. Note that some reuses have increased in size: 1 in 50 reuses between iterations of the j-loop in figure 4 have increased from 2^4–2^5 to 2^9–2^10 (see dark bars in figure 5). This is because 1 in 50 reuses in the original j-loop are now between iterations of the outer jj-loop. The end result is the removal of all long-distance reuses. As a result, the overall measured program speedup is 5.7, see table 1.

2.2 Example 2: Inter-Procedural Reuses

The second example is shown in figure 6. The code in function prodsum first calculates the inproduct of two arrays by calling inproduct, after which the sum of all elements in both arrays is computed by calling function sum. Most existing tools would show, in one way or another, that half of the misses occur on line 4, and the other half are caused by the code on line 11.

     1  double inproduct(double *X, double *Y, int len) {
          int i; double result = 0.0;
          for (i = 0; i < len; i++)
            result += X[i] * Y[i];   // 50% of cache misses
     5    return result;
        }

        double sum(double *X, int len) {
          int i; double result = 0.0;
    10    for (i = 0; i < len; i++)
            result += X[i];          // 50% of cache misses
          return result;
        }

    15  double prodsum(double *X, double *Y, int len) {
          double inp  = inproduct(X, Y, len);
          double sumX = sum(X, len);
          double sumY = sum(Y, len);
          return inp + sumX + sumY;
    20  }

Figure 6: View on cache misses as provided by most traditional tools for the second example.

In contrast, RDVIS shows two reference pairs, indicated by arrows, that lead to long-distance reuses, see figure 7. By examining the highlighted code carefully, the programmer can find that uses occur in the call to inproduct, while reuses occur in one of the two calls to sum. Here, the programmer must perform an interprocedural analysis of the IEC. SLO, on the other hand, performs the interprocedural analysis for the programmer, and visualizes the result as shown in figure 8. It clearly identifies that for half of the long-distance reuses, inproduct must be fused with the first call to sum, and for the other half inproduct must be fused with the second call to sum.

3. BASIC DEFINITIONS

In this section, we review the basic terms and definitions that are used to characterize reuses in a program.

Definition 1. A memory access a_x is a single access to memory, that accesses address x. A memory reference r is the source code construct that leads to a memory instruction at compile-time, which in turn generates memory accesses at run-time. The reference that generates memory access a_x is denoted by ref(a_x). The address accessed by a memory access is denoted by addr(a_x), i.e. addr(a_x) = x.

Definition 2. A memory access trace T is a sequence of memory accesses, indexed by a logical time. The difference in time between consecutive accesses in a trace is 1. The time of an access a_x is denoted by T[a_x].

Definition 3. A reuse pair ⟨a_x, a'_x⟩ is a pair of memory accesses in a trace such that both accesses address the same data, and there are no intervening accesses to that data. The use of a reuse pair is the first access in the pair; the reuse is the second access. A reference pair (r1, r2) is a pair of memory references.
The reuse pairs associated with a reference pair(r1,r2)is the set of reuse pairs for which the use is generated by r1and the reuse is generated by r2,and is denoted by reuses(r1,r2).Definition4.The Intermediately Executed Code(IEC) of a reuse pair a x,a x is the code executed between T[a x] and T[a x].Definition5.The reuse distance of a reuse pair froma trace,is the number of unique memory addresses in that trace between use and reuse.Cache misses are identified by the reuses that have a dis-tance larger than the cache size[6].4.RDVIS:IEC ANALYSIS BY BASIC BLOCKVECTOR CLUSTERINGIn RDVIS,the Intermediately Executed Code is repre-sented by a basic block vector:Definition6.The basic block vector of a reuse paira x,a x ,denoted by BBV( a x,a x )is a vector∈{0,1}n, where n is the number of basic blocks in the program.Whena basic block is executed between use and reuse,the corre-sponding vector element is1,otherwise it is0.(a)IEC for first referencepair.(b)IEC for second reference pair.Figure 7:Indication of intermediately executed code byRDVIS.(a)Two required fusions of functions indicated byarrows.(b)The reuse distance histogram for the reuses opti-mized by the two arrows in (a),for len =1000000.Figure 8:Indication of locality optimizations by SLO.The basic block vector of a reference pair (r 1,r 2),denoted by BBV ((r 1,r 2))is a vector ∈[0,1]n .The value of a vector element is the fraction of reuse pairs in reuses(r 1,r 2)for which the basic block is executed between use and reuse.More formally:BBV ((r 1,r 2))=Pa x ,a x ∈reuses(r 1,r 2)BBV( a x ,a x )#reuses(r 1,r 2)In RDVIS,reference pairs are visually represented by ar-rows drawn on top of the source code,e.g.figure 2.The tool allows to highlight the code executed between use and reuse for each individual arrow.Additionally,RDVIS clusters ar-rows according to the similarity of their IEC.The similarity (or rather dissimilarity)of the code exe-cuted between two reference pairs is computed as the Man-hattan distance of the corresponding basic block vectors in the vector space [0,1]n .When exactly the same code is exe-cuted between the reuses,the distance is 0;when the code is completely dissimilar,the distance is n .Based on the Man-hattan distance,an agglomerative clustering is performed,which proceeds as follows.First,each reference pair forms a separate cluster.Then,iteratively,the two closest clustersare merged into a single cluster.The basic block vector cor-responding with the new cluster is the average of the two basic block vectors that represent the merged clusters.The clustering stops when all reference pairs are combined into one large cluster.The distances between different subclus-ters are shown graphically in the dendrogram,and the user selects “interesting-looking”or “tight”subclusters. 
E.g.in figure 2(c),the user selected two very tight subclusters:the light gray and the dark gray subcluster.Since similar code is executed between use and reuse in a tight subcluster,it is likely that the long-distance reference pairs can be optimized by the same refactoring,e.g.see figure 2(d).5.SLO:IEC ANALYSIS BY INTERPROCE-DURAL CONTROL FLOW INSPECTIONSLO aims to improve on RDVIS by analyzing the IEC further and automatically pinpoint the refactorings that are necessary to improve temporal locality,even in an interpro-cedural context.To make this possible,SLO tracks the loop headers (i.e.the basic blocks that control whether a loop body is executed [1])and the functions that are executed between use and reuse,using the following framework.5.1Step1:Determining the Least CommonAncestor FunctionFigure9:The Least Common Ancestor Frame (LCAF)of a reuse,indicated in the activation tree. The activation tree represents a given time during the execution of the code infigure6,assuming that the use occurs inside function inproduct,and the reuse occurs inside sum.SLO proceeds byfirst determining the function in which the refactoring must be applied.In a second step,the exact refactoring on which part of that function’s code is com-puted.The refactoring must be applied in the“smallest”function in which both the use and the reuse can be seen. This is formalized by the following definitions,and illus-trated infigure9.Definition7.The activation tree[1]of a running pro-gram is a tree with a node for every function call at run-time and edges pointing from callers to callees.The use site of a reuse pair a x,a x is the node cor-responding to the function invocation in which access a x occurs.The reuse site is the node where access a x occurs. The Least Common Ancestor Frame(LCAF)of a reuse pair a x,a x is the least common ancestor in the acti-vation tree of the use site and the reuse site of a x,a x .The Least Common Ancestor Function is the function that corresponds to the least common ancestor frame.The LCAF is the function where some refactoring is needed to bring use and reuse closer together.Once the LCAF has been determined,the loop structure of the LCAF is exam-ined,and the basic blocks in the LCAF executed between use and reuse.Definition8.The basic block in the LCAF,in which the use occurred(directly or indirectly through a function call), is called the Use Basic Block(UseBB)of a x,a x ;the basic block that contains the reuse is called the Reuse Ba-sic Block(ReuseBB)of a x,a x .5.2Step2:Analyzing the Control FlowStructure in the Least Common AncestorFunctionThe key to the analysis isfinding the loops that“carry”the reuses.These loops are found by determining the Non-nested Use and Non-nested Reuse Basic Blocks,as defined below(illustrated infigure10):Definition9.The Nested Loop Forest of a function is a graph,where each node represents a basic block in the function,and there are edges from a loop header to each basic block directly controlled by that loop header.The Outermost Executed Loop Header(OELH)ofa basic block BB with respect to a given reuse pair a x,a x is the unique ancestor of BB in the nested loop forest that has been executed between use a x and reuse a x, but does not have ancestors itself that are executed between use and reuse.The Non-nested Use Basic Block(NNUBB)of a x,a x is the OELH of the use basic block of a x,a x .The Non-nested Reuse Basic Block(NNRBB)of a x,a x is the OELH of the reuse basic block of a x,a x .5.3Step3:Determining the RequiredRefactoringRefactorings are 
determined by analyzing the NNUBB and NNRBB.We subdivide in3different patterns:Pattern1:Reuse occurs between iterations of a single loop.This occurs when NNUBB=NNRBB,and they are loop headers.Consequently,a single loop carries the reuses. This pattern arises when the loop traverses a“data struc-ture”1in every iteration of the loop.The distance of reuses across iterations can be made smaller by ensuring that onlya small part of the data structure is traversed in any given iteration.As such,reuses of data elements between consecu-tive iterations are separated by only a small amount of data, instead of the complete data structure.A number of transformations have been proposed to in-crease temporal locality in this way,e.g.loop tiling[26,30], data shackling[18],time skewing[31],loop chunking[3], data tiling[16]and sparse tiling[27].We call these transfor-mations tiling-like optimizations.An extreme case of sucha tiling-like optimization is loop permutation[22],where in-ner and outer loops are swapped,so that the long-distance accesses in different iterations of the outer loop become short-distance accesses between iterations of the inner loop. Examples of occurrences of this pattern are indicated by bars with the word“TILE L...”infigures3,4and5. Pattern2:Use is in one loop nest,the reuse in an-other.When NNUBB and NNRBB are different loop head-ers,reuses occur between different loops.The code tra-verses a data structure in the loop indicated by the NNUBB. The data structure is retraversed in the NNRBB-loop.The reuses can be brought closer together by only doing a sin-gle traversal,performing computations from both loops at the same time.This kind of optimization is known as loop fusion.We call the required transformation a fusion-like optimization.Examples of this pattern are indicated by bars with the word“FUSE L...”infigure3.Pattern3:NNUBB and NNRBB are not both loop head-ers.When one of NNUBB or NNRBB are not loop head-ers,it means that either the use or the reuse is not insidea loop in the LCAF.It indicates that data is accessed in one basic block(possibly indirectly through a function call), and the other access may or may not be in a loop.So, the reused data structure is traversed twice by two separate code pieces.In this case,bringing use and reuse closer to-gether requires that the computations done in the NNUBB and in the NNRBB are“fused”so that the data structure is1the data structure could be as small a single scalar variableor as large as all the data in the program。

Performance Evaluation and Metrics

Performance Evaluation and Metrics

Copyright by Jerry Gao
Performance Evaluation - Approaches
Performance testing: (during production) o measure and analyze the system performance based on performance test data and results
Copyright by Jerry Gao
Performance Evaluation
What is performance evaluation? Using a well-defined approach to study, analyze, and measure the performance of a given system. The basic tasks and scope:
Copyright by Jerry Gao
Performance Test - Tools
Performance test tools can be classified into: Simulators and data generators: o Message-based or table-based simulators o State-based simulators o Model-based data generators, such as o Pattern-based data generators o Random data generators Performance data collectors and tracking tools o Performance tracking tools Performance evaluation and analysis tool o Performance metric computation o Model-based performance evaluation tool Performance monitors o For example, sniffer, Microsoft performance monitor o External third-party tools Performance report generators

Data-Driven Supplier Performance Monitoring

Data-Driven Supplier Performance MonitoringConstantinos BoussiosOpen Ratings, Inc., andLaboratory for Information and Decision SystemsMassachusetts Institute of Technologyboussios@Giorgos ZachariaOpen Ratings, Inc., andCenter for Computational and Biological LearningArtificial Intelligence LaboratoryMassachusetts Institute of Technologyzacharia@Theodoros EvgeniouTechnology ManagementINSEADevgeniou.theodoros@insead.frOlga SimekOpen Ratings, Inc. andMechanical EngineeringMassachusetts Institute of Technologyosimek@* Proofs and Reprints should be sent to:Giorgos Zacharia928 Commonwealth Ave, Boston, MA 02115, USAtel: +1-617-232-9660, fax: +1-617-232-9670zacharia@AbstractWe study how supplier performance as well as transactional and financial data about companies can be used to derive a number of findings useful for supplier selection, monitoring the supply chain performance, and improving inefficiencies. We discuss findings from analyzing supplier performance data for a small sub-set of a database with information on about 15 million companies, which suggest that the combination of the increased ability to capture transactional data among companies globally and the use of data mining tools can have an important impact on supply chain management.Keywords: supplier performance monitoring, machine learningAdvances in information technology now make possible the storage of an increased amount of information capturing traces of the transactions among companies and their suppliers and customers. It is now possible not only to collect information about a company’s individual transactions with its suppliers, but also to create a centralized information warehouse with transactions among a large number of companies. This information can then be analyzed to predictively assess the performance of suppliers, effectively helping to improve the management of supply chains.We study how supplier performance as well as transactional and financial data about companies can be used to derive a number of findings useful for supplier selection, monitoring the supply chain performance, and improving inefficiencies. We discuss findings from analyzing supplier performance data for a small sub-set of a database with information on about 15 million companies, which suggest that the combination of theincreased ability to capture transactional data among companies globally and the use of data mining tools can have an important impact on supply chain management.With the increase of outsourcing activities and the ease of finding suppliers at any location, it becomes more and more important to have reliable predictive supplier performance measures that are used for sourcing decisions. For example, Just-In-Time Manufacturing (Burton, 1988; Chapman and Carter, 1990) where delivery delays or inaccurate shipments make the whole scheme fall apart and six-sigma production quality where supplier product quality must be superb make such performance data critical. Any kind of optimal supply chain management needs to assume certain supplier performance standards. However, such historical standards are likely to change any time. Large corporations try to monitor their critical suppliers - those for whom quality of their product and ability to deliver is critical - very closely. However, there are cases where they still fail, such as the well-publicized Firestone-Ford relationship. 
Moreover, there is a large piece of the supply base that remains unmeasured, untapped into, and for which performance monitoring can lead to large savings in total cost of ownership (Ellram, 1995).More and more companies are now recording a number of statistics about their transactions with their suppliers, such as information from e-purchasing orders and invoices. In addition, companies also record statistics about performance in order to keep track of their own logistics. This information can be aggregated from all these companies in an information warehouse of an “infomediary”. This paper shows that significant value can be added by analyzing the accumulated data and then redistributing the“global” findings to the companies, which can then improve their supply performance based on this additional information (for example by predicting future outcomes, critical for decisions along the supply chain). We study an example of such an electronic supply chain performance monitoring “infomediary”, and we show how data mining can be used to extract useful information from the data accumulated across companies. The data is provided by Open Ratings, Inc., a supplier performance solutions provider. Open Ratings (ORI), along with Dun and Bradstreet (D&B), has created Buyer Insight, a database with 15 million rated suppliers based on qualitative purchasing manager surveys, financial and operational data, and other external sources of data. The work presented in this paper was done on data that belong to customers or data partners of ORI, and given that data sources are covered by confidentiality agreements, the analyzed companies will be anonymous throughout the paper.The area of supplier performance evaluation has received both research attention (Browning et al. 1983; Burt 1984; Dickson 1966; Kraljic 1983; Treleven 1987) and industry attention through supplier management programs developed by purchasing managers. Neither the idea of measuring suppliers through a variety of qualitative and quantitative metrics, nor the application of machine learning techniques for the modeling of supplier performance are new (Hinkle et al, 1969; Petroni and Braglia, 2000; Siying et al, 1997).The contribution of this paper is that we combine data from several companies and other external sources, like financial, operational, and survey data, and create machine learning models that predict supplier performances with high accuracy. While individualcompanies cannot afford to monitor exhaustively all their significant suppliers, we can create these predictive models by aggregating sparse information from a variety of companies and content sources.Large datasets stored in information warehouses, like the ones we study in this paper, require more and more advanced methods for analysis. Data is typically noisy, so robust methods for data mining are needed. Moreover, there is often a large number of attributes (high dimensional data), like in some of the cases we discuss here, so the methods used need to be able to handle such high dimensional data. Data mining, with the use of recently developed advanced methods from machine learning (Vapnik, 1998), provides a promising framework for analyzing the stored data and developing models that are useful. Data mining has been used in the past as a management tool (see for example (Cooper and Giuffrida, 2000) and references therein), mostly focusing on marketing (Neelamegham and Chintagunta 1999; Berry and Linoff, 2000; Cooper et al, 1999). 
In this paper we focus on supply chain, particularly on procurement.The paper is organized as follows. We first describe the data accumulated across companies that we used for this study. Section 2 presents some statistical findings from this data that are then taken into account when developing predictive data mining systems in section 3. In Section 3 we discuss how to forecast valuable performance metrics at the level of individual transactions to be used for example for future sourcing decisions. We also describe the tools we used for this purpose. We present a field case study that illustrates the apparent implied benefits for Supply Chain Management. Finally, Section 4 is summary and conclusions.1 Data for the studyThe data used in this study comes from a number of sources. Company surveys provide ratings of the buyer companies on questions about issues such as the buyer’s satisfaction with the product/service quality, the timeliness of the delivery, the customer support (ratings on a scale from 0 to 100), etc1. The Summary Rating predicts the overall customer satisfaction. It is a statistical combination (also using machine learning models) of the detailed aspect ratings along with qualitative customer ratings for their overall satisfaction with the supplier, and other data sources. At the same time, quantitative transaction compliance data have been extracted from purchase orders, receipts, and invoice archives from very large ERP systems. This data includes information such as lead times, transaction compliance in terms of delivery, timeliness, and quantity accuracy, and quality attributes (i.e. part per million failures, number of shipments with defective product, etc). Finally financial and operational data for over 15 million corporations has been provided by Buyer Insight. This data includes information like area of business (SIC code), scope of business (i.e. annual sales, number of employees, etc), financial soundness (including a measure of bankruptcy risk, lien and suit cases) and socioeconomic information (small business, women or minority owned corporation, etc). The Financial Stress Score (FSS) is a long-term index that measures the likelihood of the firm failing over the next 18 months. A failure can be either bankruptcy, or any other interruption of operations with loss to creditors. The FSS ranges from 1, for a stable1 The actual questionnaire can be found at/solutions/spr_ppe/spr_sample_report.pdfsupplier, to 5 for the most financially stressed ones. If a company has already failed financially, then its FSS is 0 (by convention). As expected, for most companies the FSS is 1 (only a small percentage of the companies are “financially stressed”).It turns out that such data contain valuable information on several key aspects of supplier performance and supply base management. We provide evidence of such information by looking into a small sample of representative examples.2 Statistical Analysis of the DataWe discuss some statistical findings about correlations between the survey, financial, and transactional data. These findings provide insights for the predictive tools we discuss in the following section. 
Because of many missing data points, we study each hypothesis using a subset of the overall data for which the necessary information is available.2.1 Correlation between Survey Responses and Financial DataWe study the correlations between the Summary Ratings (which as we mentioned above, measures the overall customer satisfaction for the specific supplier) and a sample of financial attributes for the companies on which surveys were conducted. We divide the surveyed companies into groups that correspond to different sectors of the economy, in order to account for special characteristics that pertain to the different ways that businessis conducted in each sector. We use the top-level division of companies in terms of their area of business using SIC codes, and we study here the following groups:1) Group D Manufacturing (4492 companies with surveys)2) Group F Wholesale Trade (1804 companies with surveys)3) Group I Services (5495 companies with surveys)In each of the following correlations, we cluster the suppliers in four groups based on their Summary Ratings. Each cluster includes a quartile (25%) of the corresponding group of suppliers, in terms of increasing ratings: the first cluster includes the 25% suppliers with the highest rating, and the second cluster consists of the 25% suppliers of highest rating excluding those that belong to the first cluster, and so on. In the rest of this section, the term quartile refers to the sub-division of companies within each line-of-business group based on their Summary Ratings.a) Percentage of companies in each quartile and High Financial Stress ScoreWe first examine the (natural) hypothesis that the lower the rating of a company, the higher the probability of high FSS (high chance of financial failure) for that company. We perform this test also as a measure of the quality of the survey ratings.Figure 1, shows the percentage of companies in each quartile with FSS other than 1 (if FS={2,3,4,5} then the company is financially stressed, and if FSS=0, then the company is bankrupt or has ceased operations without paying creditors), for companies that belong to Group I (Services):As we would expect, Figure 1 shows that the better the performance of a company as a supplier in its corresponding area of business, the lower the probability of it being in a state of financial risk (r-square=0.9328).b) Percentage of companies in each quartile based on annual salesWe now examine the relation between the Summary Rating and the annual sales of a company. In a similar way to the definition of rating quartiles, we define as the High Annual Sales Subgroup the 25% companies of highest annual sales. We define the Low Annual Sales Subgroup similarly. We compute the percentage of companies in group F in each rating quartile that belong to the High Annual Sales Subgroup, and show the results in Figure 2. There is clearly a very strong correlation (r-square=0.9908) between the survey ratings of the suppliers and the annual sales of the companies within the same SIC code and the highest annual sales.We also compute the percentage of companies in each rating quartile that belong to the Low Annual Sales Subgroup for group D, and show the results in Figure 3.The graphs show that the worse the performance of a company as a supplier in its corresponding area of business, the higher the probability (r-square=0.9928) of it being a high annual sales company (and vice versa). 
This result indicates that buyers in the wholesale trade and in manufacturing marketplaces tend to be kept less satisfied by the performance of suppliers with a higher overall sales volumes.c) Percentage of small businesses in each quartileWe consider the companies of Group I and plot the relation between survey ratings and company size. In particular, we compute the percentage of companies in each rating quartile that are characterized as Small Businesses based on their number of employees (less than 500 employees).We also calculated information gain ratios (Quinlan 1986, Witten & Frank 2000) for our financial data attributes. In case of the decision trees, which is the algorithm we used for supplier performance predictions, the information gain measure relates to the amount of information obtained by making a decision. Its calculation makes use of the entropy function, which is defined as entropy(p1,p2,…,p n)= -p1 log p1 –p2 log p2…-p n log p n , where p1, …, p n are positive real numbers usually representing conditional probabilities. The value of the information gain function is artificially high when the attribute used to make the decision has a large number of possible values. To compensate for this, a modification of the measure called the gain ratio is widely used. To obtain the gain ratio, the information gain function is divided by the intrinsic information of the attribute. Table 1 contains the information gain ratios for attributes employees, small business indicator, sales and financial stress indicator. These attributes were some of the high ranking among our financial data attributes used.In conclusion,the above plots as well as information gain ratio calculations suggest strongly that a certain amount of information about the performance of a supplier in its corresponding industry grouping is contained in his financial data. The fact that suchcorrelations exist in multiple dimensions of financial data suggest that we should be able to train machine learning models that offer highly accurate predictions about the performance of a supplier (in particular, predict the survey rating the supplier would achieve in a survey of its performance or other quantitative performance metrics).2.2 Correlations between Transactional and Financial DataIn this section, we study the relation between financial data and actual recorded transactional compliance data (in particular, delay of delivery) of the suppliers of a certain corporate buyer. The following plots are based on delivery data from a total number of 820 suppliers. The suppliers are mainly in manufacturing.2.2.1 Sales per employee and delay of deliveryWe use the sales per employee ratio as a financial performance indicator. Figure 5 shows how the sales-per-employee ratio is related to the average delay of delivery for these suppliers (r-square=0.7724). The graph implies a pattern whereby suppliers with high sales/employee ratio have better delivery timeliness performance.To examine the significance of this relation, we create an additional graph (Figure 6), where the suppliers are divided in groups according to their annual sales per employee, and the percentage of group suppliers with average delay of delivery less than 20 days is computed for each group and plotted.The graph indicates the existence of a sales/employee value where suppliers (in the industry sector where they are drawn from) perform optimally. Moreover, this optimal point is neither at the low nor at the high end of the sales/employee value range. 
Indeed, low sales/employee ratios are expected of poor performers, as a consequence of poor performance. At the same time, high values of sales/employees indicate over-capacity operating conditions, which are expected to influence performance negatively. If we fit our points with a 2nd degree polynomial the r-square is 0.8531. Given the small degree of the polynomial and the relatively high value of r-square, we expect this relationship to generalize well.2.2.2 Financial Stress and delay of deliveryThe financial stress indicator was available for only 397 of the suppliers of the companies we study here. As we mentioned, its value for the majority of the suppliers (370) is 1. We consider the few suppliers with financial stress indicator higher than 1 (indicates long-term financial performance shortcomings) and examine where they stand in terms of delivery performance.Table 2 and Figure 7 show the distribution of suppliers with financial stress over 1 in terms of their delivery timeliness performance.According to Figure 7 and Table 2, there is higher density of "poor" suppliers among those who have worse delivery performance as compared to among those with better performance. Although the r-square of the data points of Figure 7 is small (0.1570), thereis still a very strong pattern, indicating that suppliers who have average delays larger 26 days, are twice as likely to be financially stressed.Figures 5-7 show the relation between financial performance and delivery timeliness performance. According to this data, such financial attributes are of value to the procurement department of the buyer. Moreover, based on this data, these attributes can be used (in combination with other attributes) for supplier performance prediction. However, the usefulness of the financial stress indicator is limited, because its value is 1 for most companies. This restricts the ability to differentiate between companies based on it.As in the previous section, we calculate information gain ratios for some of our attributes; the results are summarized inTable 3. The attribute values are per supplier and the gain ratios are calculated with respect to the average delay of a supplier. The attributes listed are some of the high ranking ones used for average delay predictions.2.3 Information in Transactional DataWe now present two examples of what type of information is “hidden” within the transactional data themselves. We evaluate the hypothesis that delivery delays are higher in the case of expensive purchases. The first example reveals the correlation of per-supplier average sale price and corresponding delivery delay, whereas the second example shows transaction specific correlation of price and delay.2.3.1 Dependence of supplier average delivery delay on average sale value ofsupplier's salesWe divide all suppliers into 10 groups. The first group consists of the 10% (82) suppliers with highest average sale value; the second group consists of the next 10% of suppliers in terms of average sale value; and so on. For each group, we compare the average delay of each supplier of the group to 10.3 days (the median delay among all 820 suppliers).Figure 8 shows the percentage of group suppliers with average delay greater than 10.3 days.By labeling (for convenience of presentation) a delay as "late" if it is over 10.3 days and "early" otherwise, Figure 8 shows that the probability that a supplier delivers "late" on the average increases as the supplier's average value of sale increases. 
Moreover, Figure 8 indicates that the average value of sale can be a predictor of average delay. An example of a predictive rule derived fromTable 3: if a supplier belongs to the top 10% in terms of average sale value, the average delay is over 10.3 days with an 82.2% probability. Similar rules (most with lower success rates) can be derived from the other entries of the figure. The average sale value can, therefore, be used for a delivery performance predictor.2.3.2 Dependence of purchase specific delivery delay of a purchase on value of thepurchase priceIt is observed that a supplier tends to deliver orders of higher value slower than orders of lower value. To confirm this, we perform the following computation:For each supplier with 10 deliveries or higher, we compare the delay of each one of its 10% most expensive deliveries to the median delay of delivery for that supplier. We then sum over all suppliers with 10 or more deliveries. The process is repeated for the 30% and 50% most expensive deliveries. Figure 9 shows that over 60% of the most expensive sales are delivered with delay larger than the typical delivery delay of the supplier who makes the particular sale. In fact for 2 out of 3 of the suppliers with more than 20 sales each, over 60% of the top 10% (in value) sales of each were delivered with a delay higher than the supplier's median delivery delay.Therefore, each supplier tends to deliver higher value sales with longer delays than lower value sales.Based on this observation, it follows that a proper normalization for supplier ratings should also be product (or product range) specific. Since buyers use ratings for comparisons between suppliers, performance ratings probably need to vary depending on the specific item of the supplier's catalogue that is sought for by the buyer in a specific sale.Finally, motivated by these relations, we suggest that the delivery delay of a specific sale can be predicted from the value of the particular sale, in combination with the averagesale value and average delivery delay of the selling supplier. Such a prediction would provide the buyer with a purchase decision support as well as resource planning support.2.3.3 Correlation of transaction compliance metrics with performance ratingsWe use the recorded quality compliance data of the supply base of a company in the Wholesale Trade area of business. The study covers the recorded parts-per-million failures of 94.2% of the company’s suppliers over a period of 1 year. The supply base includes a total 3032 suppliers. Over the period under consideration, quality failures were detected in shipments from a total 626 suppliers.To keep the presentation simple, we divide the suppliers in 2 groups by means of their Summary Rating value:Group L: the 30% suppliers of Lowest Summary RatingGroup H: the remaining 70% suppliersIn other words, each supplier in Group H has higher Summary Rating than each supplier in Group L, and the groups split the supply base to a 30/70 (L/H) division.We compute the percentage of suppliers with at least one recorded failure that belongs to Group L (has Low Summary Rating). We repeat the same for suppliers with no recorded failures. The results are summarized in Table 4.This table shows that is almost twice as likely to encounter a supplier with Low Summary Rating among suppliers with quality failures than among those without any recorded failures.Another interesting view about the implications of this correlation results from reversing the table and presenting:a. 
The percentage of Group L Suppliers with recorded failures, againstb. The percentage of Group H Suppliers with recorded failuresThe results are shown in Table 5.So suppliers with high Summary Ratings, create less than half quality failures compared to suppliers with low performance ratings.3 Predictive Models for Sourcing Decision MakingCompanies increasingly focus on optimizing their bottom lines. Therefore purchasing managers put more effort in rationalizing their supply chains and reducing direct and indirect material costs through strategic sourcing. Material costs can raise from a variety of supplier performance related issues, like cost to expedite late orders, replace defective material, cost of management of suppliers that need close monitoring to perform well, and cost of quality related issues. Predictive supplier analytics of the monitored supplier performance metrics can allow purchasing managers to account for these problems ahead of time, manage the underperforming suppliers through supplier development programs,renegotiate the financial aspects of their relationship, or even totally drop them if they cannot meet their performance and cost criteria.In addition, some manufacturing companies have observed that their own purchasing departments are often responsible for the decline of their suppliers’ performance. When purchasing professionals observe that some suppliers perform well, they tend to push to those suppliers more orders in terms of quantities and variety, causing increasing performance problems for those suppliers. Therefore, predictive transaction specific analytics can warn purchasing managers of the potential problems before they start demanding the suppliers to perform beyond their capacities.Finally, many large purchasing organizations lose money by issuing purchase orders to suppliers that end up ceasing operations before delivering their outstanding orders – especially often during economic slowdowns. Therefore, predictive supplier solvency analytics can warn buyers ahead of time, so that they recognize the financially risky suppliers. Although, we do not describe it in this paper, we were able to build machine learning models that predict with an accuracy higher than 95% which companies are likely to go out of business in the next 30 days. Again, for these models we used a variety of aggregated data, including transaction compliance data, and buyer surveys (supplier performance problems are a leading indicator of financial problems) and historic financial and operational data.We discuss here an example of the type of predictions for supporting sourcing decisions (i.e. choosing suppliers) that is motivated from the findings discussed so far. First, we briefly summarize the tools we used to develop these predictors.3.1 Data Mining ToolsTo develop predictive models about the performance of suppliers in the future, we used the following machine learning tools: Support Vector Machines, and AdaBoost or Bagging Decision trees. We briefly discuss these methods and refer the reader to the references for further reading.3.1.1 Support Vector MachinesSupport vector machines (SVM) are a technique to train classifiers (also developed for regression and density estimation problems) that is well founded in statistical learning theory (Vapnik, 1998). One of the main attractions of using SVM is that they are capable of learning in sparse, high-dimensional spaces with very few training examples. 
SVM accomplish this by minimizing a bound on the empirical error and the complexity of the classifier, at the same time. This controlling of both the training error and the classifier's complexity has allowed SVM to be successfully applied to very high dimensional learning tasks. For example, (Joachims, 1997) presents results on SVM applied to a 10,000 dimensional text categorization problem and (Osuna et al, 1997) show a 283 dimensional face detection system. We will make use of this property of being able to。

2024届山东中学联盟高三下学期5月预测热身卷英语试题+详细解析

2024届山东中学联盟高三下学期5月预测热身卷英语试题+详细解析注意事项:1. 答卷前,考生务必将自己的姓名、考生号等填写在答题卡和试卷指定位置。

2. 选择题的作答:选出每小题答案后,用2B铅笔把答题卡上对应题目的答案标号涂黑。

如需改动,用橡皮擦干净后,再选涂其他答案标号。

回答非选择题时,将答案写在答题卡上。

3. 考试结束后,将本试卷和答题卡一并交回。

第一部分:阅读理解(共两节,满分50分)第一节(共15小题;每小题2.5分,满分37.5分)阅读下列短文,从每题所给的四个选项(A、B、C和D)中,选出最佳选项。

ASmall Ways You Can Donate Money To CharityThere are plenty of innovative ways that you can help people in need, even when money is tight. Here are just a few unique ways to give.Food Angel, Hong KongFood insecurity has become a global problem for families. In Hong Kong, the people behind the Food Angel program collect 45 tonnes of edible surplus food each week that grocery stores, restaurants and individuals would otherwise dispose of. That includes fresh fruits and vegetables and other perishables (易腐烂的食物) that aren’t normally accepted in food-donation boxes.The impact is significant: Volunteers make and serve around 20,000 meals and distribute more than 11,000 other meals and food packs every day.Frigos Solidaires, FranceImagine if those in need could help themselves to food with anonymity (匿名) and dignity. Frigos Solidaires, or Solidarity Fridges, was started with that aim by Dounia Mebtoul, a young restaurateur in Paris. Now, 130 fridges installed in front of places such as shops and schools offer free food to the hungry across France.Stuff A Bus, CanadaIn Edmonton, the transit service parks vehicles in front of supermarkets for its annual “Stuff a Bus” campaign each November. Volunteers collect food and cash donations from shoppers to fill buses bound for food banks. Since its start in 1995, the campaign has collected 553,000 kilograms of food and roughly half a million dollars.Rice Bucket Challenge, IndiaHeard of the Ice Bucket Challenge? You take a video of yourself dumping a bucket of ice water over your head, then nominate (指定) three more people to do the same. In some versions, the participant donates $100 if they don’t complete the challenge.“I thought it was an amazing way to raise awareness of ALS and raise funds,” recall s Manju Kalanidhi, a journalist in Hyderabad, India. But it didn’t make sense in her country, where water is too precious to waste, even for a good cause. Then in 2014, it hit her: Why not make it a Rice Bucket Challenge to fight hunger? “I gave a bucket o f rice to someone in need and clicked a photo. I shared it on Facebook and said, ‘This is a Rice Bucket Challenge.Why don’t you do it, too?’” Participants donate a bucket of rice to an individual or family —no, it’s not dumped — take a photo and post it on social media with a message encouraging others to do the same.1. Which one can help people in need get food without hurting their pride?A. Food Angel, Hong KongB. Frigos Solidaires, FranceC. Stuff A Bus, CanadaD. Rice Bucket Challenge, India2. What do you know about Rice Bucket Challenge in India?A. It is an amazing way to raise awareness of ALS.B. It was inspired by the Ice Bucket Challenge.C. A bucket of rice is given and dumped.D. A bucket of water is donated for a good cause.3. What’s the p urpose of the text?A. To explain how important to help people in need.B. To inspire readers to start a non-profit organization.C. To introduce some creative ways to give away.D. To appeal to readers to donate money to charity.【答案】1. B 2. B 3. C【解析】【导语】本文是一篇说明文。

Protégé基本教程【Protégé5

Protégé基本教程【Protégé5.5.0版本】⽬录Q&A1. 为什么protege⾥⾯owlviz tab中所有的东西都缩在左上⾓?因为没有安装graphviz,⾸先在官⽹()下载grahviz,下载好以后安装。

然后在protege⾥⾯点击file-preferences-owlviz⾥⾯,修改graphviz的地址就好了。

2. 怎么增加and关系(e.g. pizza and has topping)直接在这个框⾥⾯输⼊就⾏啦!⼀、前⾔参考⽂档:Protégé4OWL官⽅⼊门教程因为在⽹上看到的教程使⽤的Protégé版本⽐较⽼了,⽽且是英⽂的,这⾥做⼀个整理。

Protégé软件是斯坦福⼤学医学院⽣物信息研究中⼼基于Java语⾔开发的本体编辑和知识获取软件,或者说是本体开发⼯具,也是基于知识的编辑器,属于开放源代码软件。

这个软件主要⽤于语义⽹中本体的构建,是语义⽹中本体构建的核⼼开发⼯具,现在的最新版本为5.5.0版本。

Protégé提供了本体概念类,关系,属性和实例的构建,并且屏蔽了具体的本体描述语⾔,⽤户只需在概念层次上进⾏领域本体模型的构建。

(如果官⽹下载⽐较慢的话,我放⼀个百度⽹盘的链接在这⾥:)现在下载到的Protégé⼀般是⼀个压缩包,压缩包解压之后有Protege.exe和run.bat这两个⽂件,点击任何⼀个都可以打开Protégé。

Protégé⼀打开的界⾯主要是Active Ontology这个Tab的界⾯。

本体的名字可以在Ontology IRI⾥⾯修改。

Annotations是注释栏,可以对本体添加⼀些信息注释或者描述。

右边Ontology metrics会显⽰⼀些本体中相关元素的统计信息。

网络信息系统

网络信息系统
Web Information System
李春旺 李 宇 licw@ 电话:62539105 liy@ 电话:82629426
课程安排
课程内容 考核形式 第一章 WIS概论(1) 平时成绩 : 20% 第二章 XML(2) 最后大开卷: 80% 第三章 Web Services(2) 第四章 Semantics Web(1) 参考教材 第五章 Web Mining(1) 《Web 信 息 系 统 导 论 》 第六章 Web Search (1) 李广建编著,高等教育出 第七章 Web Integration(1) 版社,2008 第八章 Web Mashup(1) 专题讨论(2) 考试
Work typically with well defined and closed data repository
WIS
Work typically with heterogeneous, dynamic and distributed data
一般与限定好的并且是封闭的数据 库一起工作
Serve to well known and specific audience
Web2.0时代WIS 特点
开放性
开放标准:Open standard 开放数据:Linked open data (链接1 链接2) 开放源码: Open sources 开放服务: Open services 信息交互从一对多转向多对多,边际同核心一样重要。 控制和权力结构从中央集中式转向分散式、去中心化。 需求驱动,用户参与。 因为系统互联以及服务集成与嵌入,造成Web系统内容、 功能之间的边界正在加速溶解。
(free) No cookies, no scripts, no frames, no web bugs 目前出500多卷
  1. 1、下载文档前请自行甄别文档内容的完整性,平台不提供额外的编辑、内容补充、找答案等附加服务。
  2. 2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
  3. 3、如文档侵犯您的权益,请联系客服反馈,我们会尽快为您处理(人工客服工作时间:9:00-18:30)。

Performance Metrics and Ontology for Describing Performance Data of GridWorkflows∗Hong-Linh Truong,Thomas Fahringer,Francesco NerieriInstitute for Computer Science,University of InnsbruckTechnikerstr.21A,A-6020Innsbruck,Austria{truong,tf,nero}@dps.uibk.ac.atSchahram DustdarInformation Systems Institute,Vienna University of TechnologyArgentinierstrasse8/184-1,A-1040Wien,Austriadustdar@infosys.tuwien.ac.atAbstractTo understand the performance of Grid workflows,perfor-mance analysis tools have to select,measure and analyzevarious performance metrics of the workflows.However,there is a lack of a comprehensive study of performancemetrics which can be used to evaluate the performance of aworkflow executed in the Grid.This paper presents perfor-mance metrics that performance monitoring and analysistools should provide during the evaluation of the perfor-mance of Grid workflows.Performance metrics are associ-ated with many levels of abstraction.We introduce an on-tology for describing performance data of Grid workflows.We describe how the ontology can be utilized for monitor-ing and analyzing the performance of Grid workflows.1.IntroductionRecently,increased interest can be witnessed in exploit-ing the potential of the Grid for workflows,especially forscientific workflows,e.g.[20,4,12].As the Grid is di-verse,dynamic and inter-organizational,the execution ofGrid workflows is veryflexible.This requires performancemonitoring and analysis tools to collect,measure and ana-lyze metrics that characterize the performance and depend-ability of workflows at many levels of detail in order to de-tect components that contribute to performance problems,and correlations between them.Figure1.Hierarchical structureview of a workflow.Figure2.Execution model of a workflow.Related work is outlined in Section6.We summarize the paper and give an outlook to the future work in Section7.2.Workflow Model2.1.Hierarchical Structure View of a WorkflowFigure1presents the hierarchical view of a workflow (WF).A WF consists of WF constructs.Each WF construct consists of a set of activities.Two activities can depend on each other.The dependency between two activities can be data dependency or control dependency.Each activity is associated with a set of invoked applications.Each invoked application contains a set of code regions.WF constructs can be fork-join,sequence,do loop,etc. 
More details of existing WF constructs can be found in [2].Each activity is associated with one or multiple in-voked application(s).An invoked application can be an ex-ecutable program(e.g.,an MPI program)or a service op-eration(e.g.,of Web Service).Invoked applications can be executed in sequential or parallel manner.An invoked application is considered as a set of code regions;a code region ranges from a single statement to an entire program unit.A code region can be a function call,a remote service call,a do loop,an if-then-else statement,etc.2.2.Workflow ExecutionA Grid environment is viewed as a set of Grid sites.A Grid site is comprised of a set of grid services within a single organization that is utilized as a single,unified computing service.A Grid site consists of a number of computational nodes(or hosts)that share a common se-curity domain,exchange data through a local network,is controlled by a single resource management service.A computational node can be any computing platform,from a single-processor workstation to an SMP(Symmetric Multi-Processor)to an MPP(Massively Parallel Processing)sys-tem.Each computational node may have single or multiple processor(s).On each computational node,there would be multiple application processes executed,each process may have multiple threads of execution.Figure2presents the execution sequence of a WF.The user submits a WF to the workflow management system (WfMS).The WfMS instantiates activities.When execut-ing an activity instance,the WfMS locates a Grid site and submits the invoked application of the activity instance to the scheduler of the Grid site.The Grid site scheduler lo-cates computational nodes and executes processes of the invoked application on corresponding nodes.2.3.Activities Execution ModelThe execution of an activity a is represented by the dis-crete process model[19].Let P(a)be a discrete process modeling the execution of activity a(hence,we call P(a) the execution status graph of an activity).A P(a)is a di-rected,acyclic,bipartite graph(S,E,A),in which S is a set of nodes representing activity states,E is a set of nodes representing activity events,and A is a set of edges repre-senting ordered pairs of activity state and event.Simply put,an agent(e.g.workflow invocation and control)causes an event(e.g.execute an activity)that changes the activity state(e.g.from queuing to processing),which in turn influ-ences the occurrence and outcome of the future events(e.g. active,failed).Figure3presents an example of a discrete process modeling the execution of an activity.Each state s of an activity a is determined by two events: leading event e i,and ending event e j such that e i,e j∈E, s∈S,and(e i,s),(s,e j)∈A of P(a).To denote an event name of P(a)we use e name(a);Table1presents a few event names which can be used to describe activity events1.We use t(e)to refer to the timestamp of an event e and t now toFigure3.Discrete process model for the execution of an activity.represents a state, represents an event. 
Category DescriptionElapsedTimeUserCPUTimeSystemCPUTimeCPUTimeSerialTimeEncodingTimeCounter TCM,L2Hardware counters.The exact number of hardware counters is dependent on specific platforms.The number of executions of the code region.The number of executions of sub regions of the code region.The number of messages sent by the code region.The number of messages received by the code region.TotalCommTimeTotalTransSizeSynchronization Single-address space exclusive synchronization.Condition synchronization.MeanElapsedTimeCommPerCompMeanTransRateMeanTransSizeCacheMissRatio,MFLOPS,etc.Temporal overhead This type of metrics is defined only for code regions of parallel programs.T able2.Performance metrics at code region level.Event Nameactivecompletedsuspendedfailedsubmitted2Elapsed time,wall-clock time,and response time indicate the latencyto complete a task(including IO,waiting time,computation,...).Theseterms are used interchangeably.In this paper,the term ElapsedTime refersto elapsed time or response time or wall-clock time.nization,etc.Various ratio metrics can be defined based on execution time and counter metrics.If the invoked application is a parallel application(e.g., MPI and OpenMP applications),we can compute temporal overhead metrics for code regions.Overhead metrics are based on a classification of temporal overheads for parallel programs[22].Examples of overhead metrics are control of parallelism,loss of parallelism,etc.3.2.Metrics at Invoked Application LevelMost performance metrics at code region level can be provided at invoked application level by using aggregate operators.Table3presents extra performance metrics as-sociated with invoked applications.Category DescriptionElapsedTimeExecDelayCounter The number of executions of the invokedapplication.SpeedupFactorElapsedTime j(A)(1) 3.3.Metrics at Activity LevelTable4presents metrics measured at activity level.Per-formance metrics can be associated with activities and ac-tivity instances.Execution time metrics includes end to end response time,processing time,queuing time,suspending time, etc.The processing time of an activity instance a, ProcessingTime(a),is defined byProcessingTime(a)=t(e completed(a))−t(e active(a))(2) if e completed(a)has not occurred,it means the execution of a has not completed,processing time is defined byProcessingTime(a)=t now−t(e active(a))(3) Synchronization metrics for an activity involves with the execution of other activities it depends.Let pred(a)be the set of the immediate predecessors of a;there is a data dependency or control dependency between a and any a i∈pred(a).∀a i∈pred(a);i=1,···,n;synchronization delay and execution delay from a i to a,SynDelay(a i,a)and ExecDelay(a i,a),respectively,are defined by:SynDelay(a i,a)=t(e submitted(a))−t(e completed(a i))(4) ExecDelay(a i,a)=t(e active(a))−t(e completed(a i))(5)If e submitted(a)or e active(a)has not occurred,synchroniza-tion or execution delay will be computed based on t now.Metrics associated with an activity are determined from metrics of activity instances of the activity by using aggre-gate operators.Aggregated metrics of an activity give the summarized information about the performance of the ac-tivity that can be used to examine the overall performance of the activity.3.4.Metrics at Workflow Construct LevelFigure4.A fork-join workflow construct.Table5presents performance metrics at WF construct level.The load imbalance is associated with fork-join WF constructs.A fork-join WF construct is shown in Figure4. 
Load imbalance,LoadIm,is defined byLoadIm(a i)=ProcessingTime(a i)−∑nk=1(ProcessingTime(a i))max n i=1(ProcessingTime n(a i))(7)where ProcessingTime n(a i)is the processing time of ac-tivity a i in the fork-join version with n activities and ProcessingTime1(a i)is the processing time of activity a i in the version with a single activity.Load imbalance and speedup factor metrics can also be computed for fork-join structures of structured block of activities.A structured block is a single-entry-single-exit block of activities.In this case,ProcessingTime n(a i)will be the processing time of a structured block in a version with n blocks.Let SG be a graph of WF construct C.Let P i=< a i1,a i2,···,a in>be a critical path from starting node to the ending node of of SG.The elapsed time of C,ElapsedTime(C),and the processing time of C, ProcessingTime(C),are defined asElapsedTime(C)=n∑k=1ElapsedTime(a ik)(8) ProcessingTime(C)=n∑k=1ProcessingTime(a ik)(9)Category DescriptionElapsedTimeProcessingTimeQueuingTimeSuspendingTimeSharedResTimeCounter The number of invocations of an activity.Size of total data transfered to the activity.Size of total data transfered from the activity to another.ThroughputMeanTimePerStateTransRateSynchronization Synchronization delay.Execution delay.SlowdownFactorMetric NameExecution time The latency from the time the workflow construct starts until the time the workflow constructfinishes.The actually portion of elapsed time that the workflow construct spends on processing.RedundantActivityNIterationRatio The average elapsed time of an activity of the workflow construct.Percent of the selection of a path at a choice construct.LoadImPerformance improvement Speedup factor.RedundantProcessingProcessingTime(C h)(10) 3.5.Metrics at Workflow LevelTable6presents performance metrics of interest at WF level.Let P i=<a i1,a i2,···,a in>be a critical path from starting node to the ending node of a WF G.The elapsed time of G,ElapsedTime(G),and the processing time of G,ProcessingTime(G),are defined based on Equation8 and9,respectively.Speedup factor of WF G over WF H, SpeedupFactor(G,H),is defined bySpeedupFactor(G,H)=ProcessingTime(G)n(12) 3.6.Metric OntologyPerformance metrics introduced above are described in an ontology named WfMetricOnto.A metric is described by class WfMetric.Figure5presents the concept WfMetric.Figure5.Description of a WF performance metric. 
WfMetric hasfive properties:hasMetricName specifies the metric name.Property hasSynonym specifies other names of the performance metric.Property hasUnit specifies the measurement unit of the metric.Property inLevel specifies the level with which the metric is associated.Property has-Description explains the performance metric.4.Ontology for Describing Performance Data of Grid WorkflowsWe develop an ontology named WfPerfOnto for de-scribing performance data of workflows;WfPerfOnto is based on OWL[16].This section just outlines main classes and properties of WfPerfOnto shown in Figure6.Workflow describes the workflow(WF).A WF has WF constructs(represented by hasWorkflowConstruct prop-erty),WF graph,etc.A WF construct is described by Work-flowConstruct.Each WF construct has activities(hasActiv-ity),activity instances(hasActivityInstance),WF construct graph,sub WF constructs,etc.Category DescriptionElapsedTimeProcessingTimeParTimeSeqTimeRatio Mean queuing time per elapsed time.Mean processing time of an activity.Mean queuing time of an activity.Time that a resource spends on processing work per elapsed time of the workflow.NAPerResProcInResLoadImResof a workflow namedFigure8.Part of WfPerfOnto for workflow Montage. Montage.5.Utilizing WfPerfOnto for PerformanceAnalysis of Grid Workflows5.1.Describing Performance DataA performance analysis tool can use WfPerfOnto to de-scribe performance data of a workflow.For example,whena client of the performance analysis service requests perfor-mance results of a workflow,the client can specify the re-quests based on WfPerfOnto(e.g.,by using RDQL[13]).The service can use WfPerfOnto to express performancemetrics of the workflow.As performance results are de-scribed in a well-defined ontology,the client will easilyunderstand and utilize the performance results.Figure7presents an example of a workflow namedMontage3.Dependencies between activities are con-trol dependencies.Figure8represents part of the per-formance data of Montage described in WfPerfOnto.The performance experiment is executed on two resources.At the top-level,the workflow consists of two work-flow constructs,a fork-join construct named ForkJoin2and a sequence construct named Seq.The fork-joinconstruct can be considered as two sequence constructsnamed Seq1ForkJoin2and Seq2ForkJoin2.Activ-ity mImgtbl2has two dependencies.Figure8presentssome interesting performance metrics associated withmImgtbl2such as ElapsedTime and SynDelay.Although WfPerfOnto does not describe(dynamic)monitoring data of resources on which invoked applicationsof a workflow are executed,from information described inWfPerfOnto,e.g.,activity events and resource identifiers,we can obtain(dynamic)monitoring data of resources fromFigure9.Agents process an analysis request.Figure10.RDQL query used to request synchroniza-tion delay of activity Project1.6.Related WorkMany techniques have been introduced to study quality of service and performance models of workflows,e.g.[11, 6,10].Performance metrics in[11,6]are associated with activities.We consider performance metrics in many levels of detail,e.g.code regions and workflow constructs. 
Recently,[14]discusses QoS metrics associated with Grid architecture layers.Our work studies performance metrics of Grid workflows.Existing tools supporting per-formance analysis of workflows,e.g.[17,3]have some common performance metrics with our metrics.However, our study covers a large set of performance metrics rang-ing from the workflow level to the code region level.[21] discusses the role of an ontology of QoS metrics for man-agement Web Services.An ontology for the specification of QoS metrics for tasks and Web services is developed in [5].However,there is a lack of an ontology for describing performance metrics and performance data of Grid work-flows.Recently,there is a growing effort on mining the work-flow[24,9,8,7].Workflow activities are traced and log in-formation is used to discover the workflow model.Events logged,however,are only at activity level.Workflow min-ing focuses on discovery workflow model from tracing data where our study is to discuss important performance met-rics of workflows and methods to describe performance data of workflows.Workflow event logs can be used to analyze performance metrics proposed by our study.7.Conclusion and Future WorkThe performance and dependability of Grid workflows must be characterized by well-defined performance met-rics.This paper presents a novel study of performance met-rics of Grid workflows.Performance metrics are associated with multiple levels of abstraction,ranging from a code re-gion to the whole workflow.We have presented an ontol-ogy for describing performance data of Grid workflows. We are currently reevaluating and enhancing the ontol-ogy for describing performance data of Grid workflows. Also we are extending the set of performance metrics.We are working on a prototype of a distributed analysis frame-work in which distributed analysis agents use WfPerfOnto based requests to exchange analysis tasks when conducting the performance analysis of Grid workflows.References[1]Worldflow Management Coalition:Terminology and glos-sary.technical report wfmc-tc-1011,feb1999.[2]W.M.P.Van Der Aalst,A.H.M.Ter Hofstede,B.Kie-puszewski,and A.P.Barros.Workflow patterns.Distrib.Parallel Databases,14(1):5–51,2003.[3]Andrea F.Abate,Antonio Esposito,Nicola Grieco,and Gi-ancarlo Nota.Workflow performance evaluation through wpql.In Proceedings of the14th international conference on Software engineering and knowledge engineering,pages 489–495.ACM Press,2002.[4]Junwei Cao,Stephen A.Jarvis,Subhash Saini,and Gra-ham R.Nudd.Gridflow:Workflow management for grid computing.In Proceedings of the3st International Sympo-sium on Cluster Computing and the Grid,page198.IEEE Computer Society,2003.[5]Jorge Cardoso and Amit Sheth.Semantic e-workflow com-position.J.Intell.Inf.Syst.,21(3):191–225,2003.[6]Jorge Cardoso,Amit P.Sheth,and John Miller.Workflowquality of service.In Proceedings of the IFIP TC5/WG5.12 International Conference on Enterprise Integration and Modeling Technique,pages303–311.Kluwer,B.V.,2003.[7]S.Dustdar,T.Hoffmann,and W.M.P.van der Aalst.Min-ing of ad-hoc Business Processes with TeamLog.Data and Knowledge Engineering,2005.[8]Walid Gaaloul,Sami Bhiri,and Claude Godart.Discoveringworkflow transactional behavior from event-based log.In CoopIS/DOA/ODBASE(1),pages3–18,2004.[9]Joachim Herbst and Dimitris Karagiannis.Workflow miningwith put.Ind.,53(3):245–264,2004. 
[10]Michael C.Jaeger,Gregor Rojec-Goldmann,and GeroM¨u hl.Qos aggregation for service composition using work-flow patterns.In Proceedings of the8th International En-terprise Distributed Object Computing Conference(EDOC 2004),pages149–159,Monterey,California,USA,Septem-ber2004.IEEE CS Press.[11]Kwang-Hoon Kim and Clarence A.Ellis.Performance ana-lytic models and analyses for workflow r-mation Systems Frontiers,3(3):339–355,2001.[12]Sriram Krishnan,Patrick Wagstrom,and Gregor vonLaszewski.GSFL:A Workflow Framework for Grid Ser-vices.Technical report,Argonne National Laboratory,9700 S.Cass Avenue,Argonne,IL60439,U.S.A.,July2002. [13]RDQL:RDF Data Query Language./semweb/rdql.htm.[14]Daniel A.Menasce and Emiliano Casalicchio.Quality ofService Aspects and Metrics in Grid Computing.In Proc.2004Computer Measurement Group Conference,2004.[15]Montage..[16]OWL Web Ontology Language Reference./tr/owl-ref/.[17]Bastin Tony Roy Savarimuthu,Maryam Purvis,and MartinFleurke.Monitoring and controlling of a multi-agent based workflow system.In Proceedings of the second workshop on Australasian information security,Data Mining and Web Intelligence,and Software Internationalisation,pages127–132.Australian Computer Society,Inc.,2004.[18]Clovis Seragiotto,Hong-Linh Truong,Thomas Fahringer,Bernd Mohr,Michael Gerndt,and Tianchao Li.Standard-ized Intermediate Representation for Fortran,Java,C and C++Programs.Technical report,Institute for Software Sci-ence,University of Vienna,October2004.[19]John F.Sowa.Knowledge Representation:logical,philo-sophical,and compuational foundations.Brooks/Cole,Pa-cific Grove,CA,2000.[20]The Condor Team.Dagman(directed acyclic graph man-ager)./condor/dagman/.[21]Vladimir Tosic,Babak Esfandiari,Bernard Pagurek,andKruti Patel.On requirements for ontologies in management of web services.In Revised Papers from the International Workshop on Web Services,E-Business,and the Semantic Web,pages237–247.Springer-Verlag,2002.[22]Hong-Linh Truong and Thomas Fahringer.SCALEA:APerformance Analysis Tool for Parallel Programs.Concur-rency and Computation:Practice and Experience,15(11-12):1001–1025,2003.[23]Hong-Linh Truong and Thomas Fahringer.PerformanceAnalysis,Data Sharing and Tools Integration in Grids:New Approach based on Ontology.In Proceedings of Interna-tional Conference on Computational Science(ICCS2004), LNCS3038,pages424–431,Krakow,Poland,Jun7-9 2004.Springer-Verlag.[24]Wil van der Aalst,Ton Weijters,and Laura Maruster.Work-flow mining:Discovering process models from event logs.IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142,2004.。

相关文档
最新文档