

PRIDE In-Memory Database User Guide

PRIDE: A Data Abstraction Layer for Large-Scale 2-tier Sensor Networks

Woochul Kang, University of Virginia, Email: wk5f@
Sang H. Son, University of Virginia, Email: son@
John A. Stankovic, University of Virginia, Email: stankovic@

Abstract—It is a challenging task to provide timely access to global data from sensors in large-scale sensor network applications. Current data storage architectures for sensor networks have to make trade-offs between timeliness and scalability. PRIDE is a data abstraction layer for 2-tier sensor networks, which enables timely access to global data from the sensor tier to all participating nodes in the upper storage tier. The design of PRIDE is heavily influenced by collaborative real-time applications such as search-and-rescue tasks for high-rise building fires, in which multiple devices have to collect and manage data streams from massive sensors in cooperation. PRIDE achieves scalability, timeliness, and flexibility simultaneously for such applications by combining a model-driven full replication scheme and an adaptive data quality control mechanism in the storage tier. We show the viability of the proposed solution by implementing and evaluating it on a large-scale 2-tier sensor network testbed.
The experiment results show that the model-driven replication provides the benefit of full replication in a scalable and controlled manner.

I. INTRODUCTION

Recent advances in sensor technology and wireless connectivity have paved the way for next-generation real-time applications that are highly data-driven, where data represent real-world status. For many of these applications, data streams from sensors are managed and processed by application-specific devices such as PDAs, base stations, and micro servers. Further, as sensors are deployed in increasing numbers, a single device cannot handle all sensor streams due to their scale and geographic distribution. Often, a group of such devices needs to collaborate to achieve a common goal. For instance, during a search-and-rescue task for a building fire, while PDAs carried by firefighters collect data from nearby sensors to check the dynamic status of the building, a team of such firefighters has to collaborate by sharing their locally collected real-time data with peer firefighters, since each individual firefighter has only limited information from nearby sensors [1]. Building-wide situation assessment requires fusing data from all (or most of) the firefighters. As this scenario shows, many future real-time applications will interact with the physical world via large numbers of underlying sensors. The data from the sensors will be managed by distributed devices in cooperation. These devices can be either stationary (e.g., base stations) or mobile (e.g., PDAs and smartphones). Sharing data, and allowing timely access to global data for each participating entity, is mandatory for successful collaboration in such distributed real-time applications. Data replication [2] has been a key technique that enables each participating entity to share data and obtain an understanding of the global status without the need for a central server.
In particular, for distributed real-time applications, data replication is essential to avoid unpredictable communication delays [3][4]. PRIDE (Predictive Replication In Distributed Embedded systems) is a data abstraction layer for devices performing collaborative real-time tasks. It is linked to an application(s) at each device, and provides transparent and timely access to global data from underlying sensors via a scalable and robust replication mechanism. Each participating device can transparently access the global data from all underlying sensors without noticing whether it is from local sensors or from remote sensors that are covered by peer devices. Since global data from all underlying sensors are available at each device, queries on global spatio-temporal data can be efficiently answered using local data access methods, e.g., B+ tree indexing, without further communication. Further, since all participating devices share the same set of data, any of them can be a primary device that manages a sensor. For example, when entities (either sensor nodes or devices) are mobile, any device that is close to a sensor node can be a primary storage node of that sensor node. This flexibility, achieved by decoupling the data source tier (sensors) from the storage tier, is very important if we consider the highly dynamic nature of wireless sensor network applications. Even with these advantages, the high overhead of replication limits its applicability [2]. Since potentially a vast number of sensor streams are involved, it is not generally possible to propagate every sensor measurement to all devices in the system. Moreover, the data arrival rate can be high and unpredictable. During critical situations, the data rates can significantly increase and exceed system capacity. If no corrective action is taken, queues will form and the latencies of queries will increase without bound. In the context of centralized systems, several intelligent resource allocation schemes have been proposed to dynamically control the
high and unpredictable rate of sensor streams [5][6][7]. However, no work has been done in the context of distributed and replicated systems. In this paper, we focus on providing a scalable and robust replication mechanism. The contributions of this paper are:

1) a model-driven scalable replication mechanism, which significantly reduces the overall communication and computation overheads,
2) a global snapshot management scheme for efficient support of spatial queries on global data,
3) a control-theoretic quality-of-data management algorithm for robustness against unpredictable workload changes, and
4) the implementation and evaluation of the proposed approach on a real device with realistic workloads.

To make the replication scalable, PRIDE provides a model-driven replication scheme, in which the models of sensor streams are replicated to peer storage nodes instead of the data themselves. Once a model for a sensor stream is replicated from a primary storage node of the sensor to peer nodes, the updates from the sensor are propagated to peer nodes only if the prediction from the current model is not accurate enough.
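The filtering logic just described can be sketched as follows. This is a hypothetical illustration, not PRIDE's actual code: class and method names are ours, and the Kalman filter model used by PRIDE is replaced by a trivial last-broadcast-value predictor so the scheme's effect is easy to see.

```python
# Model-driven filtering at a primary storage node: a hypothetical
# sketch of the scheme described above, not the paper's actual code.
# An update is propagated to peers only when the local model's
# prediction misses it by at least the precision bound (delta).

class PrimaryNode:
    def __init__(self, delta):
        self.delta = delta          # precision bound
        self.last = {}              # sensor id -> last broadcast value
        self.broadcasts = []        # stand-in for network sends

    def predict(self, sensor_id):
        # Trivial stand-in model: predict the last broadcast value.
        # PRIDE uses a Kalman filter here instead.
        return self.last.get(sensor_id, 0.0)

    def on_update(self, sensor_id, value):
        if abs(self.predict(sensor_id) - value) >= self.delta:
            self.last[sensor_id] = value                 # update model
            self.broadcasts.append((sensor_id, value))   # notify peers
            return True                                  # propagated
        return False                                     # filtered out

node = PrimaryNode(delta=2.0)
sent = [node.on_update("s1", v) for v in [20.0, 20.5, 21.4, 23.1, 23.8]]
print(sent)   # only updates that drift past delta are sent
```

Even with this crude model, three of the five updates are filtered; a model that tracks the sensor's trend filters far more.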
Our evaluation in Section 5 shows that this model-driven approach makes PRIDE highly scalable by significantly reducing the communication/computation overheads. Moreover, the Kalman filter-based modeling technique in PRIDE is light-weight and highly adaptable because it dynamically adjusts its model parameters at run-time without training. Spatial queries on global data are efficiently supported by taking snapshots from the models periodically. The snapshot is an up-to-date reflection of the monitored situation. Given this fresh snapshot, PRIDE supports a rich set of local data organization mechanisms, such as B+ tree indexing, to efficiently process spatial queries. In PRIDE, robustness against unpredictable workloads is achieved by dynamically adjusting the precision bounds at each node to maintain a proper level of system load, CPU utilization in particular. Coordination is performed among the nodes such that relatively under-loaded nodes synchronize their precision bound with a relatively overloaded node.
Using this coordination, we ensure that the congestion at the overloaded node is effectively resolved. To show the viability of the proposed approach, we implemented a prototype of PRIDE on a large-scale testbed composed of Nokia N810 Internet tablets [8], a cluster computer, and a realistic sensor stream generator. We chose the Nokia N810 since it represents emerging ubiquitous computing platforms such as PDAs, smartphones, and mobile computers, which are expected to interact with ubiquitous sensors in the near future. Based on the prototype implementation, we investigated system performance attributes such as communication/computation loads, energy efficiency, and robustness. Our evaluation results demonstrate that PRIDE takes advantage of full replication in an efficient, highly robust, and scalable manner. The rest of this paper is organized as follows. Section 2 presents the overview of PRIDE. Section 3 presents the details of the model-driven replication. Section 4 discusses our prototype implementation, and Section 5 presents our experimental results. We present related work in Section 6 and conclusions in Section 7.

II. OVERVIEW OF PRIDE

A. System Model

Fig. 1. A collaborative application on a 2-tier sensor network.
PRIDE envisions 2-tier sensor network systems with a sensor tier and a storage tier, as shown in Figure 1. The sensor tier consists of a large number of cheap and simple sensors; S = {s1, s2, ..., sn}, where si is a sensor. Sensors are assumed to be highly constrained in resources, and perform only primitive functions such as sensing and multi-hop communication without local storage. Sensors stream data or events to the nearest storage node. These sensors can be either stationary or mobile; e.g., sensors attached to a firefighter are mobile. The storage tier consists of more powerful devices such as PDAs, smartphones, and base stations; D = {d1, d2, ..., dm}, where di is a storage node. These devices are relatively resource-rich compared with sensor nodes. However, these devices also have limited resources in terms of processor cycles, memory, power, and bandwidth. Each storage node provides in-network storage for underlying sensors, and stores data from sensors in its vicinity. Each node supports multiple radios: an 802.11 radio to connect to a wireless mesh network and an 802.15.4 radio to communicate with underlying sensors. Each node in this tier can be either stationary (e.g., base stations) or mobile (e.g., smartphones and PDAs). The sensor tier and the storage tier are loosely coupled; the storage node that a sensor belongs to can be changed dynamically without coordination between the two tiers. This loose coupling is required in many sensor network applications if we consider the highly dynamic nature of such systems. For example, the mobility of sensors and storage nodes makes the system design very complex and inflexible if the two tiers are tightly coupled; a complex group management and hand-off procedure is required to handle the mobility of entities [9].
Applications at each storage node are linked to the PRIDE layer. Applications issue queries to the underlying PRIDE layer either autonomously, or by simply forwarding queries from external users. In the search-and-rescue task example, each storage node serves as both an in-network data storage for nearby sensors and a device to run autonomous real-time applications for the mission; the applications collect data by issuing queries and analyze the situation to report results to the firefighter. Hereafter, "node" refers to a storage node unless explicitly stated otherwise.

Fig. 2. The architecture of PRIDE (gray boxes).

B. Storage Model

In PRIDE, all nodes in the storage tier are homogeneous in terms of their roles; no asymmetrical function is placed on a sub-group of the nodes. All or part of the nodes in the storage tier form a replication group R to share the data from underlying sensors, where R ⊂ D. Once a node joins the replication group, updates from its local sensors are propagated to peer nodes; conversely, the node can receive updates from remote sensors via peer nodes. Any storage node that is receiving updates directly from a sensor becomes a primary node for the sensor, and it broadcasts the updates from the sensor to peer nodes. However, it should be noted that, as will be shown in Section 3, the PRIDE layer at each node performs model-driven replication, instead of replicating sensor data, to make the replication efficient and scalable. PRIDE is characterized by the queries that it supports.
PRIDE supports both temporal queries on each individual sensor stream and spatial queries on current global data. Temporal queries on sensor si's historical data can be answered using the model for si. An example of a temporal query is "What was the value of sensor si 5 minutes ago?" For spatial queries, each storage node provides a snapshot of the entire set of underlying sensors (both local and remote sensors). The snapshot is similar to a view in database systems. Using the snapshot, PRIDE provides traditional data organization and access methods for efficient spatial query processing. The access methods can be applied to any attributes, e.g., sensor value, sensor ID, and location; therefore, value-based queries can be efficiently supported. Basic operations on the access methods, such as insertion, deletion, retrieval, and iterating cursors, are supported. Special operations, such as join cursors for join operations, are also supported by building indexes on multiple attributes, e.g., temperature and location attributes.
This join operation is required to efficiently support complex spatial queries such as "Return the current temperatures of sensors located in room #4."

III. PRIDE DATA ABSTRACTION LAYER

The architecture of PRIDE is shown in Figure 2. PRIDE consists of three key components: (i) the filter & prediction engine, which is responsible for sensor stream filtering, model update, and broadcasting of updates to peer nodes, (ii) the query processor, which handles queries on spatial and temporal data by using a snapshot and temporal models, respectively, and (iii) the feedback controller, which determines proper precision bounds of data for scalability and overload protection.

A. Filter & Prediction Engine

The goals of the filter & prediction engine are to filter out updates from local sensors using models, and to synchronize the models at each storage node. The premise of using models is that the physical phenomena observed by sensors can be captured by models, and a large amount of sensor data can be filtered out using the models.

Algorithm 1: Model synchronization at a primary node.
Input: update v from sensor si
1: v̂ = prediction from model for si;
2: if |v̂ − v| ≥ δ then
3:   broadcast v to peer storage nodes;
4:   update data for si in the snapshot;
5:   update model mi for si;
6:   store v to cache for later temporal query processing;
7: else
8:   discard v (or store for logging);
9: end

Algorithm 2: OnUpdateFromPeer.
Input: update v for sensor si from its primary node
1: update data for si in the snapshot;
2: update model mi for si;
3: store v to cache for later temporal query processing;

In PRIDE, when a sensor stream si is covered by the PRIDE replication group R, each storage node in R maintains a model mi for si. Therefore, all storage nodes in R maintain the same set of synchronized models, M = {m1, m2, ..., mn}, for all sensor streams in the underlying sensor tier. Each model mi for sensor si is synchronized at run-time by si's current primary storage node (note that si's primary node can change during run-time because of network topology changes either at the sensor tier or the storage tier). Algorithms 1 and 2 show the basic framework for model synchronization at a primary node and peer nodes, respectively. In Algorithm 1, when an update v is received from sensor si at its primary storage node dj, the model
mi is looked up, and a prediction is made using mi. If the gap between the predicted value from the model, v̂, and the sensor update v is less than the precision bound δ (line 2), then the new data is discarded (or saved locally for logging). This implies that the current models (both at the primary node and the peer nodes) are precise enough to predict the sensor output with the given precision bound. However, if the gap is bigger than the precision bound, this implies that the model cannot capture the current behavior of the sensor output. In this case, mi at the primary node is updated and v is broadcast to all peer nodes (line 3). In Algorithm 2, as a reaction to the broadcast from dj, each peer node receives the new update v and updates its own model mi with v. The value v is stored in local caches at all nodes for later temporal query processing. As shown in the algorithms, communication among nodes happens only when the model is not precise enough.

Models, Filtering, and Prediction: So far, we have not discussed a specific modeling technique in PRIDE. Several distinctive requirements guide the choice of modeling technique in PRIDE. First, the computation and communication costs for model maintenance should be low, since PRIDE handles a large number of sensors (and a corresponding model for each sensor) with the collaboration of multiple nodes. The cost of model maintenance increases linearly with the number of sensors.
Second, the parameters of the models should be obtained without an extensive learning process, because many collaborative real-time applications, e.g., a search-and-rescue task in a building fire, are short-term and deployed without previous monitoring history. A statistical model that needs extensive historical data for model training is less applicable, even given its highly efficient filtering and prediction performance. Finally, the modeling should be general enough to be applied to a broad range of applications. Ad-hoc modeling techniques for a particular application cannot be generally used for other applications. Since PRIDE is a data abstraction layer for a wide range of collaborative applications, the generality of modeling is important. To this end, we choose to use the Kalman filter [10][6], which provides a systematic mechanism to estimate the past, current, and future state of a system from noisy measurements. A short summary of the Kalman filter follows.

Kalman Filter: The Kalman filter model assumes the true state at time k evolves from the state at (k−1) according to

x_k = F_k x_{k−1} + w_k,   (1)

where
- F_k is the state transition matrix relating x_{k−1} to x_k;
- w_k is the process noise, which follows N(0, Q_k).

At time k, an observation z_k of the true state x_k is made according to

z_k = H_k x_k + v_k,   (2)

where
- H_k is the observation model;
- v_k is the measurement noise, which follows N(0, R_k).

The Kalman filter is a recursive minimum mean-square error estimator. This means that only the estimated state from the previous time step and the current measurement are needed to compute the estimate for the current and future state.
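The recursive predict/update cycle can be sketched numerically as follows, for a 1-D constant-velocity state [value, d(value)/dt] of the kind used for the temperature example. All concrete numbers (time step, noise covariances, initial state) are illustrative assumptions, not values from the paper.

```python
# A minimal Kalman filter matching Equations (1)-(6): state is
# [value, d(value)/dt]. All numeric values are illustrative.
import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition matrix
H = np.array([[1.0, 0.0]])              # we observe only the value
Q = np.eye(2) * 1e-4                    # process noise covariance
R = np.array([[0.25]])                  # measurement noise covariance

x = np.zeros((2, 1))                    # initial estimate; accuracy improves online
P = np.eye(2)                           # error covariance

def step(z):
    """One predict/update cycle for a scalar measurement z."""
    global x, P
    # Predict (Eq. 6): project the state ahead one time step.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update (Eqs. 3-5): correct with the weighted prediction error.
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x = x_pred + K @ (np.array([[z]]) - H @ x_pred)
    P = (np.eye(2) - K @ H) @ P_pred
    return float(x[0, 0])

# Feed a warming trend; the filter tracks it without any training phase.
estimates = [step(z) for z in [20.1, 21.0, 21.9, 23.2, 24.0]]
print(estimates[-1])    # the estimate converges toward the trend
```

Note how the initial state and covariances need not be accurate: the estimate improves with each measurement, which is exactly the property PRIDE relies on for training-free deployment.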
In contrast to batch estimation techniques, no history of observations is required. In what follows, the notation x̂_{n|m} represents the estimate of x at time n given observations up to, and including, time m. The state of the filter is defined by two variables:
- x̂_{k|k}: the estimate of the state at time k given observations up to time k;
- P_{k|k}: the error covariance matrix (a measure of the estimated accuracy of the state estimate).

The Kalman filter has two distinct phases: Predict and Update. The predict phase uses the state estimate from the previous timestep k−1 to produce an estimate of the state at the next timestep k. In the update phase, measurement information at the current timestep k is used to refine this prediction to arrive at a new, more accurate state estimate, again for the current timestep k. When a new measurement z_k is available from a sensor, the true state of the sensor is estimated using the previous prediction x̂_{k|k−1} and the weighted prediction error. The weight is called the Kalman gain K_k, and it is updated on each prediction/update cycle. The true state of the sensor is estimated as follows:

x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − H_k x̂_{k|k−1}),   (3)
P_{k|k} = (I − K_k H_k) P_{k|k−1}.   (4)

The Kalman gain K_k is updated as follows:

K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + R_k)^{−1}.   (5)

At each prediction step, the next state of the sensor is predicted by

x̂_{k|k−1} = F_k x̂_{k−1|k−1}.   (6)

Example: For instance, a temperature sensor can be described by the linear state space x_k = [x, dx/dt]^T, where x is the temperature and dx/dt is the derivative of the temperature with respect to time. As a new (noisy) measurement z_k arrives from the sensor¹, the true state and model parameters are estimated by Equations 3-5. The future state of the sensor at the (k+1)th time step, after ∆t, can be predicted using Equation 6, where the state transition matrix is

F = [[1, ∆t], [0, 1]].   (7)

It should be noted that the parameters of the Kalman filter, e.g., K and P, do not have to be accurate in the beginning; they can be estimated at run-time, and their accuracy improves gradually as more sensor measurements arrive. We
do not need massive past data for modeling at deployment time. In addition, the update cycle of the Kalman filter (Equations 3-5) is performed at all storage nodes when a new measurement is broadcast, as shown in Algorithm 1 (line 5) and Algorithm 2 (line 2). No further communication is required to synchronize the parameters of the models. Finally, as will be shown in Section 5, the prediction/update cycle of the Kalman filter incurs insignificant overhead on the system.

¹ Note that the temperature component of z_k is directly acquired from the sensor, and dx/dt ...

B. Query Processor

The query processor of PRIDE supports both temporal queries and spatial queries, with a planned extension to support spatio-temporal queries.

Temporal Queries: Historical data for each sensor stream can be processed at any storage node by exploiting data in the local cache and a linear smoother [10]. Unlike the estimation of current and future states using one Kalman filter, the optimized estimation of historical data (sometimes called smoothing) requires two Kalman filters: a forward filter x̂ and a backward filter x̂_b. Smoothing is a non-real-time data processing scheme that uses all measurements between 0 and T to estimate the state of a system at a certain time t, where 0 ≤ t ≤ T (see Figure 3). The smoothed estimate x̂(t|T) can be obtained as a linear combination of the two filters as follows:

x̂(t|T) = A x̂(t) + A′ x̂_b(t),   (8)

where A and A′ are weighting matrices. For a detailed discussion of smoothing techniques using Kalman filters, the reader is referred to [10].

Fig. 3. Smoothing for temporal query processing.

Spatial Queries: Each storage node maintains a snapshot of all underlying local and remote sensors to handle queries on global spatial data. Each element (or data object) of the snapshot is an up-to-date value from the corresponding sensor. The snapshot is dynamically updated either by new measurements from sensors or by models². Algorithm 1 (line 4) and Algorithm 2 (line 1) show the snapshot updates when a new observation is pushed from a local sensor and a
peer node, respectively. As explained in the previous section, there is no communication among storage nodes when the models represent the current observations from sensors well. When there is no update from peer nodes, the freshness of values in the snapshot deteriorates over time. To maintain the freshness of the snapshot even when there are no updates from peer nodes, each value in the snapshot is periodically updated by its local model. Each storage node can estimate the current state of sensor si using Equation 6 without communication with the primary storage node of si. For example, the temperature after 30 seconds can be predicted by setting ∆t of the transition matrix in Equation 7 to 30 seconds. The update period of data object i for sensor si is determined such that the precision bound δ is observed. Intuitively, when a sensor value changes rapidly, the data object should be updated more frequently to keep the data object in the snapshot valid. In the example of Section 3.1.1, the period can be dynamically estimated as follows:

p[i] = (δ / |dx/dt|) / 2,

where δ / |dx/dt| is the absolute validity interval (avi) before the data object in the snapshot violates the precision bound, which is ±δ. The update period should be as short as half of the avi to keep the data object fresh [11]. Since each storage node has an up-to-date snapshot, spatial queries on global data from sensors can be efficiently handled using local data access methods (e.g., B+ tree) without incurring further communication delays.

² Note that the data structures for the snapshot, such as indexes, are also updated when each value of the snapshot is updated.

Fig. 4. Varying data precision: (a) δ = 5°C, (b) δ = 10°C.

Figure 4 shows how the value of one data object in the snapshot changes over time when we apply different precision bounds. As the precision bound gets bigger, the gap between the real state of the sensor (dashed lines) and the current value in the snapshot (solid lines) increases. In the solid lines, the discontinued points are where the gap between the model prediction and the real
measurement from the sensor is bigger than the precision bound, and subsequent communication is made among storage nodes for model synchronization. For applications and users, maintaining a smaller precision bound implies having a more accurate view of the monitored situation. However, the overhead also increases as the precision bound gets smaller. Given the unpredictable data arrival rates and resource constraints, compromising data quality for system survivability is unavoidable in many situations. In PRIDE, we consider processor cycles as the primary limited resource, and resource allocation is performed to maintain the desired CPU utilization. Utilization control is used to enforce appropriate schedulable utilization bounds so that the performance of applications can be guaranteed despite significant uncertainties in system workloads [12][5]. In utilization control, it is assumed that any cycles that are recovered as a result of control in the PRIDE layer are used sensibly by the scheduler in the application layer to relieve congestion, or to save power [12][5]. It can also enhance system survivability by providing overload protection against workload fluctuation.

Specification: At each node, the system specification (U, δmax) consists of a utilization specification U and a precision specification δmax. The desired utilization U ∈ [0..1] gives the CPU utilization required to avoid overloading the system while satisfying the target system performance, such as latency and energy consumption. The precision specification δmax denotes the maximum tolerable precision bound. Note that there is no lower bound on the precision, as in general users require a precision bound as small as possible (if the system is not overloaded).

Local Feedback Control to Guarantee the System Specification: Using feedback control has been shown to be very effective for a large class of computing systems that exhibit unpredictable workloads and model inaccuracies [13]. Therefore, to guarantee the system specification without a priori
knowledge of the workload or an accurate system model, we apply feedback control.

Fig. 5. The feedback control loop.

The overall feedback control loop at each storage node is shown in Figure 5. Let T be the sampling period. The utilization u(k) is measured at each sampling instant 0T, 1T, 2T, ..., and the difference between the target utilization and u(k) is fed into the controller. Using the difference, the controller computes a local precision bound δ(k) such that u(k) converges to U. The first step of local controller design is modeling the target system (storage node) by relating δ(k) to u(k). We model the relationship between δ(k) and u(k) by using profiling and statistical methods [13]. Since δ(k) has a higher impact on u(k) as the size of the replication group increases, we need different models for different sizes of the group. We change the number of members of the replication group exponentially from 2 to 64, and have tuned a set of first-order models G_n(z), where n ∈ {2, 4, 8, 16, 32, 64}. G_n(z) is the z-transform transfer function of the first-order model, in which n is the size of the replication group. After the modeling, we design a controller for the model. We have found that a proportional-integral (PI) controller [13] is sufficient in terms of providing a zero steady-state error, i.e., a zero difference between u(k) and the target utilization bound. Further, a gain scheduling technique [13] has been used to apply different controller gains for different sizes of replication groups. For instance, the gain for G32(z) is applied if the size of a replication group is bigger than 24 and less than or equal to 48.
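In sketch form, such a gain-scheduled PI loop might look like the following. The gain table below contains made-up placeholder values, not the tuned gains from the paper, and the nearest-size scheduling rule is our reading of the G32 example.

```python
# Gain-scheduled PI controller sketch for the utilization loop:
# u(k) is measured each period, and the controller widens or narrows
# the local precision bound delta(k) to drive u(k) toward the target U.
# All gains below are made-up placeholders, not the paper's tuned values.

GAIN_TABLE = {2: (0.5, 0.10), 4: (0.45, 0.09), 8: (0.40, 0.08),
              16: (0.30, 0.06), 32: (0.20, 0.04), 64: (0.10, 0.02)}

def gains_for(group_size):
    # Gain scheduling: use the gains of the nearest modeled group size,
    # so e.g. the G32 gains apply when 24 < size <= 48.
    return GAIN_TABLE[min(GAIN_TABLE, key=lambda n: abs(n - group_size))]

class PIController:
    def __init__(self, target_u, delta_max):
        self.target = target_u      # utilization specification U
        self.delta_max = delta_max  # precision specification
        self.integral = 0.0

    def next_delta(self, u, group_size):
        kp, ki = gains_for(group_size)
        err = u - self.target       # positive error -> overloaded
        self.integral += err
        # Widen the precision bound under overload; clamp to [0, delta_max].
        return min(max(kp * err + ki * self.integral, 0.0), self.delta_max)

ctrl = PIController(target_u=0.7, delta_max=10.0)
d = ctrl.next_delta(u=0.9, group_size=30)   # overloaded: bound widens
print(round(d, 3))
```

The integral term is what removes the steady-state error the paper mentions: a persistent utilization error keeps accumulating until the bound settles where u(k) = U.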
Due to space limitations, we do not provide a full description of the design and tuning methods.

Coordination among Replication Group Members: If each node independently sets its own precision bound, the net precision bound of the data becomes unpredictable. For example, at node dj, the precision bounds for local sensor streams are determined by dj itself, while the precision bounds for remote sensor streams are determined by their own primary storage nodes. PRIDE takes a conservative approach in coordinating storage nodes in the group. As Algorithm 3 shows, the global precision bound for the kth period is determined by taking the maximum of the precision bounds of all nodes in the group.

Algorithm 3.
Input: myid: my storage id number
1: /* Get local δ. */
2: measure u(k) from monitor;
3: calculate δ_myid(k) from local controller;
4: foreach peer node d in R − {d_myid} do
5:   /* Exchange local δs. */
6:   /* Use piggyback to save communication cost. */
7:   send δ_myid(k) to d;
8:   receive δ_i(k) from d;
9: end
10: /* Get the final global δ. */
11: δ_global(k) = max(δ_i(k)), where i ∈ R;
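The conservative max rule at the heart of this coordination reduces to a one-liner once the bounds have been exchanged; the sketch below (node IDs and bound values are illustrative) shows why the most overloaded node's bound prevails.

```python
# Conservative coordination of precision bounds (the max rule of
# Algorithm 3): every member adopts the largest local bound, so the
# most overloaded node's bound wins. Node IDs and values are illustrative.

def coordinate(local_bounds):
    """local_bounds: {node_id: locally computed delta(k)} from all members."""
    return max(local_bounds.values())

bounds = {"d1": 0.5, "d2": 2.0, "d3": 1.2}   # d2 is the overloaded node
delta_global = coordinate(bounds)
print(delta_global)
```

Because every node filters with the same δ_global, the overloaded node's broadcast rate drops, which is exactly the congestion relief the coordination is meant to provide.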

Stream Ciphers: Nonlinear Feedback Shift Registers

Authentication: nonlinear feedback shift registers can be used to generate dynamic one-time passwords for authentication and access control.
PART.6

Summary

The stream-cipher nonlinear feedback shift register is an efficient, secure, and easy-to-implement cryptographic module, widely used in a variety of security application scenarios.

In the future, as security and performance requirements continue to rise, research on and applications of nonlinear feedback shift registers will deepen and expand further.
PART.4

Advantages of Nonlinear Feedback Shift Registers

The advantages of nonlinear feedback shift registers include:

1. The generated keystreams have high complexity and unpredictability, and therefore strong security.
2. NFSR designs can be flexibly adapted to different security requirements and performance targets.
3. NFSRs are simple to implement and easy to mass-produce.
Future Research Directions for Nonlinear Feedback Shift Registers

Formal Verification and Testing

Formal verification and testing are important means of ensuring the security and correctness of cryptographic modules. More efficient and accurate formal verification and testing methods need to be researched and developed so that nonlinear feedback shift registers can be verified and tested more rigorously, ensuring their security and correctness.
PART.8

Summary

The stream-cipher nonlinear feedback shift register is an important cryptographic module with a wide range of applications.

With the continued development of Internet and Internet of Things technologies, new application scenarios must be accommodated; more efficient, secure, and flexible nonlinear feedback shift registers need to be researched and developed to meet emerging security requirements.
Future Research Directions for Nonlinear Feedback Shift Registers

Lightweight Design

With the proliferation of mobile and IoT devices, more lightweight nonlinear feedback shift registers need to be researched and designed to reduce power consumption and cost, suiting resource-constrained devices and application scenarios.

A Nonlinear Feedback Shift Register (NFSR) is a common module used to generate stream ciphers.
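As a toy illustration of the definition above: the 4-bit register and the feedback function below are our own, far smaller and weaker than any real design (e.g., those used in Grain or Trivium); the AND term is what makes the feedback nonlinear rather than a plain LFSR.

```python
# A toy nonlinear feedback shift register (NFSR) keystream generator.
# The 4-bit register size and the feedback function f are illustrative
# only; real stream-cipher designs use much larger states and carefully
# analyzed feedback functions.

def nfsr_stream(state, nbits):
    """state: list of bits; index 0 is the output end of the register."""
    state = list(state)
    out = []
    for _ in range(nbits):
        out.append(state[0])
        # Nonlinear feedback: XOR of two taps plus an AND (nonlinear) term.
        fb = state[0] ^ state[3] ^ (state[1] & state[2])
        state = state[1:] + [fb]   # shift left, feed back the new bit
    return out

print(nfsr_stream([1, 0, 0, 1], 8))
```

In a stream cipher, such a keystream would be XORed with the plaintext; the nonlinearity of the feedback is what resists the algebraic attacks that break plain linear feedback shift registers.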

Huawei OceanStor Dorado 5000/6000 All-Flash Storage Systems: Product Overview

Huawei OceanStor Dorado 5000/6000are mid-range storage systems in the OceanStor Dorado all-flash series,and are designed to provide excellent data service experience for enterprises.Both products are equipped with innovative hardware platform,intelligent FlashLink®algorithms,and an end-to-end (E2E)NVMe architecture,ensuring the storage systems deliver a 30%higher performance than the previous generation,and achieve the latency down to just 0.05ms.The intelligent algorithms are built into the storage system to make storage more intelligent during the application operations.Furthermore,the five-level reliability design ensures the continuity of core business.Excelling in scenarios such as OLTP/OLAP databases,server virtualization,VDI,and resource consolidation,OceanStor Dorado 5000/6000all-flash systems are smart choices for medium and large enterprises,and have already been widely adopted in the finance,government,healthcare,education,energy,and manufacturing fields.The storage systems are ready to maximize your return on investment (ROI)and benefit diverse industries.OceanStor Dorado 5000/6000All-Flash Storage Systems 30%higher performance than theprevious generation E2E NVMe for 0.05ms of ultra-low latencyFlashLink®intelligent algorithmsSCM intelligent cache acceleration for 60%lower latencyDistributed file system with 30%higher performanceLeading Performance withInnovative Hardware✓The intelligent multi-protocol interface module hosts the protocol parsing previously performed by the general-purpose CPU, expediting the front-end access performance by 20%.✓The computing platform offers industry-leading performance with 25% higher computing power than the industry average.✓The intelligent accelerator module analyzes and understands I/O rules of multiple application models based 3-layer intelligent management:•365-day capacity trends prediction •60-day performance bottleneck prediction •14-day disk fault prediction •Immediate solutions for 93%ofproblemsSAN&NAS 
convergence,storage and computing convergence,and cross-gen device convergence for efficient resource utilizationFlashEver:No data migration over 10years for 3-gen systems Efficient O&M with IntelligentEdge-Cloud SynergyComponent reliability :Wear leveling and anti-wear levelingArchitecture and product reliability :0data loss in the event of failures of controllers,disk enclosures,or three disksSolution and cloud reliability :The industry's only A-A solution for SAN and NAS,geo-redundant 3DC solution,and gateway-free cloud backup Always-On Applications with5-Layer ReliabilityProduct Features Ever Fast Performance with Innovative Hardware Innovative hardware platform: The hardware platform of Huawei storage enables E2E data acceleration, improving the system performance by 30% compared to the previousgeneration.on machine learning frameworks to implement intelligentprefetching of memory space. This improves the read cache hit ratio by 50%.✓SmartCache+ SCM intelligent multi-tier caching identify whether or not the data is hot and uses different media tostore it, reducing the latency by 60% in OLTP (100% reads) scenarios.✓The intelligent SSD hosts the core Flash Translation Layer (FTL) algorithm, accelerating data access in SSDs andreducing the write latency by half.✓The intelligent hardware has a built-in Huawei storage fault library that accelerates component fault location anddiagnosis, and shortens the fault recovery time from 2hours to just 10 minutes.Intelligent algorithms: Most flash vendors lack E2E innate capabilities to ensure full performance from their SSDs. 
OceanStor Dorado 5000/6000 runs industry-leading FlashLink® intelligent algorithms based on self-developed controllers, disk enclosures, and operating systems.
✓ Many-core balancing algorithm: taps into the many-core computing power of a controller to maximize the data processing capability.
✓ Service splitting algorithm: offloads reconstruction services from the controller enclosure to the smart SSD enclosure to ease the load on the controller enclosure for more efficient I/O processing.
✓ Cache acceleration algorithm: accelerates batch processing with the intelligent module to bring intelligence to storage systems during application operations. The data layout between SSDs and controllers is coordinated synchronously.
✓ Large-block sequential write algorithm: aggregates multiple discrete data blocks into a unified large data block for disk flushing, reducing write amplification and ensuring stable performance.
✓ Independent metadata partitioning algorithm: effectively controls the performance penalty caused by garbage collection for stable performance.
✓ I/O priority adjustment algorithm: ensures that read and write I/Os are always prioritized, shortening the access latency.
FlashLink® intelligent algorithms make full use of the flash media and help Huawei OceanStor Dorado achieve unparalleled performance for a smoother service experience.

E2E NVMe architecture for the full series: All-flash storage has been widely adopted by enterprises to upgrade existing IT systems, but always-on service models continue to push IT system performance boundaries to a new level. Conventional SAS-based all-flash storage cannot break the 0.5 ms latency bottleneck. NVMe all-flash storage, on the other hand, is a future-proof architecture that implements direct communication between the CPU and SSDs, shortening the transmission path.
In addition, the number of concurrent queues is increased by 65,536 times, and the protocol interactions are reduced from four to two, doubling the write request processing. Huawei is a pioneer in adopting an end-to-end NVMe architecture across the entire series. OceanStor Dorado 5000/6000 all-flash systems use the industry-leading 32 Gb FC-NVMe/100 Gb RoCE protocols at the front end and adopt Huawei-developed link-layer protocols to implement failover within seconds and plug-and-play, improving reliability and O&M. They also use a 100 Gb RDMA protocol at the back end for E2E data acceleration. This enables latency as low as 0.05 ms and 10x faster transmission than SAS all-flash storage.

Globally shared distributed file system: The OceanStor Dorado 5000/6000 all-flash storage systems support the NAS function and use globally shared distributed file systems to ensure ever-fast NAS performance. To make full use of the computing power, the many-core processors in a controller process services concurrently. In addition, intelligent data prefetching and layout further shorten the access latency, achieving over 30% higher NAS performance than the industry benchmark.

Linear increase of performance and capacity: Unpredictable business growth requires storage that provides simple linear increases in performance as more capacity is added, to keep up with ever-changing business needs. OceanStor Dorado 5000/6000 support scale-out up to 16 controllers, and IOPS increases linearly as the number of controller enclosures increases, matching the performance needs of future business development.

Efficient O&M with Intelligent Edge-Cloud Synergy

Extreme convergence: Huawei OceanStor Dorado 5000/6000 all-flash storage systems provide multiple functions to meet diversified service requirements, improve storage resource utilization, and effectively reduce the TCO.
The storage systems provide both SAN and NAS services and support parallel access, ensuring the optimal path for dual-service access. Built-in containers support storage and compute convergence, reducing IT construction costs, eliminating the latency between servers and storage, and improving performance. The convergence of cross-generation devices allows data to flow freely, simplifying O&M and reducing IT purchasing costs.

On- and off-cloud synergy: Huawei OceanStor Dorado 5000/6000 all-flash systems combine general-purpose cloud intelligence with customized edge intelligence over a built-in intelligent hardware platform, providing incremental training and deep learning for a personalized customer experience. The eService intelligent O&M and management platform collects and analyzes over 190,000 device patterns on the live network in real time, extracts general rules, and enhances basic O&M.

Intelligence throughout the service lifecycle: Intelligent management covers resource planning, provisioning, system tuning, risk prediction, and fault location; enables 60-day and 14-day predictions of performance bottlenecks and disk faults, respectively; and provides immediate solutions for 93% of detected problems.

FlashEver: The intelligent flexible architecture implements component-based upgrades without the need for data migration within 10 years. Users can enjoy the latest-generation software and hardware capabilities without investing again in the related storage software features.

Always-On Applications with 5-Layer Reliability

Industries such as finance, manufacturing, and carriers are upgrading to intelligent service systems to meet sustainable development strategies. This will likely lead to diverse services and data types that require a better IT architecture. Huawei OceanStor Dorado all-flash storage is an ideal choice for customers who need robust IT systems that consolidate multiple types of services for stable, always-on services.
It ensures end-to-end reliability at all levels, from component, architecture, and product up to solution and cloud, supporting data consolidation scenarios with 99.9999% availability.

Benchmark-Setting 5-Layer Reliability

Component – SSDs: Reliability has always been a top concern in the development of SSDs, and Huawei SSDs are a prime example of this. Leveraging global wear-leveling technology, Huawei SSDs balance their loads for a longer lifespan of each SSD. In addition, Huawei's patented anti-wear-leveling technology prevents simultaneous multi-SSD failures and improves the reliability of the entire system.

Architecture – fully interconnected design: Huawei OceanStor Dorado 5000/6000 adopt an intelligent matrix (multi-controller) architecture with a fully symmetric active-active (A-A) design to eliminate single points of failure and achieve high system availability. Application servers can access LUNs through any controller, instead of just a single controller. Multiple controllers share the workload using a load-balancing algorithm. If a controller fails, the other controllers take over its services smoothly, without any service interruption.

Product – enhanced hardware and software: Product design is a systematic process. Before a storage system is commercially released, it must be shown to meet the demands of both software and hardware and to faultlessly host key enterprise applications. The OceanStor Dorado 5000/6000 are equipped with hardware that adopts a fully redundant architecture and supports dual-port NVMe and hot swap, preventing single points of failure. The innovative 9.5 mm palm-sized SSDs and biplanar orthogonal backplane design provide 44% higher capacity density and 25% better heat dissipation, and ensure stable operation of 2U 36-slot SSD enclosures. The smart SSD enclosure is the first ever to feature built-in intelligent hardware that offloads reconstruction from the controller to the smart SSD enclosure.
Backed by RAID-TP technology, the smart SSD enclosure can tolerate simultaneous failures of three SSDs and reconstruct 1 TB of data within 25 minutes. In addition, the storage systems offer comprehensive enterprise-grade features, such as 3-second periodic snapshots, that set a new standard for storage product reliability.

Solution – gateway-free active-active solution: Flash storage is designed for enterprise applications that require zero data loss and zero application interruption. OceanStor Dorado 5000/6000 use a gateway-free A-A solution for SAN and NAS to prevent node failures, simplify deployment, and improve system reliability. The A-A solution implements A-A mirroring for load balancing and cross-site takeover without service interruption, ensuring that core applications are not affected by a system breakdown. The all-flash systems provide the industry's only A-A solution for NAS, ensuring efficient, reliable NAS performance. They also offer the industry's first all-IP active-active solution for SAN, which uses long-distance RoCE transmission to improve performance by 50% compared with traditional IP solutions. In addition, the solution can be smoothly upgraded to a geo-redundant 3DC solution for high-level data protection.

Cloud – gateway-free cloud DR*: Traditional backup solutions are slow and expensive, and the backup data cannot be used directly. Huawei OceanStor Dorado 5000/6000 systems provide a converged data management solution. It improves the backup frequency 30-fold using industry-leading I/O-level backup technology, and allows backup copies to be used directly for development and testing. Disaster recovery (DR) and backup are integrated in the storage array, slashing the TCO of DR construction by 50%.
Working with HUAWEI CLOUD and Huawei jointly operated clouds, the solution achieves gateway-free DR, with DR on the cloud in minutes.

Technical Specifications

Hardware Specifications (OceanStor Dorado 5000 | OceanStor Dorado 6000)
- Maximum number of controllers: 32 | 32
- Maximum cache (dual controllers, expanding with the number of controllers): 256 GB–8 TB | 1 TB–16 TB
- Supported storage protocols: FC, iSCSI, NFS*, CIFS*
- Front-end port types: 8/16/32 Gbit/s FC/FC-NVMe*, 10/25/40/100 GbE, 25/100 Gb NVMe over RoCE*
- Back-end port types: SAS 3.0 / 100 Gb RDMA
- Maximum hot-swappable I/O modules per controller enclosure: 12
- Maximum front-end ports per controller enclosure: 48
- Maximum number of SSDs: 3,200 | 4,800
- SSDs: 1.92 TB/3.84 TB/7.68 TB palm-sized NVMe SSD; 960 GB/1.92 TB/3.84 TB/7.68 TB/15.36 TB SAS SSD
- SCM supported: 800 GB SCM*

Software Specifications
- Supported RAID levels: RAID 5, RAID 6, RAID 10*, and RAID-TP (tolerates simultaneous failures of 3 SSDs)
- Number of LUNs: 16,384 | 32,768
- Value-added features: SmartDedupe, SmartVirtualization, SmartCompression, SmartMigration, SmartThin, SmartQoS (SAN & NAS), HyperSnap (SAN & NAS), HyperReplication (SAN & NAS), HyperClone (SAN & NAS), HyperMetro (SAN & NAS), HyperCDP (SAN & NAS), CloudBackup*, SmartTier*, SmartCache*, SmartQuota (NAS)*, SmartMulti-Tenant (NAS)*, SmartContainer*
- Storage management software: DeviceManager, UltraPath, eService

Physical Specifications
- Power supply (Dorado 5000): SAS SSD enclosure: 100–240 V AC ±10%, 192–288 V DC, -48 V to -60 V DC; controller enclosure/smart SAS disk enclosure/smart NVMe SSD enclosure: 200–240 V AC ±10%, 100–240 V AC ±10%, 192–288 V DC, 260–400 V DC, -48 V to -60 V DC
- Power supply (Dorado 6000): SAS SSD enclosure: 100–240 V AC ±10%, 192–288 V DC, -48 V to -60 V DC; controller enclosure/smart SAS SSD enclosure/smart NVMe SSD enclosure: 200–240 V AC ±10%, 192–288 V DC, 260–400 V DC, -48 V to -60 V DC
- Dimensions (H × W × D, both models): SAS controller enclosure: 86.1 mm × 447 mm × 820 mm; NVMe controller enclosure: 86.1 mm × 447 mm × 920 mm; SAS SSD enclosure: 86.1 mm × 447 mm × 410 mm; smart SAS SSD enclosure: 86.1 mm × 447 mm × 520 mm; NVMe SSD enclosure: 86.1 mm × 447 mm × 620 mm
- Weight (both models): SAS controller enclosure: ≤ 45 kg; NVMe controller enclosure: ≤ 50 kg; SAS SSD enclosure: ≤ 20 kg; smart SAS SSD enclosure: ≤ 30 kg; smart NVMe SSD enclosure: ≤ 35 kg
- Operating temperature: -60 m to +1800 m altitude: 5°C to 35°C (bay) or 40°C (enclosure); 1800 m to 3000 m altitude: the maximum temperature threshold decreases by 1°C for every 220 m increase in altitude
- Operating humidity: 10% RH to 90% RH

Copyright © Huawei Technologies Co., Ltd. 2021. All rights reserved.
HUAWEI is a trademark or registered trademark of Huawei Technologies Co., Ltd. Other trademarks, product, service and company names mentioned are the property of their respective holders.

Machine Learning and Data Mining Interview Questions

Decision trees: What is a decision tree? What are some business reasons you might want to use a decision tree model? How do you build a decision tree model? What impurity measures do you know? Describe some of the different splitting rules used by different decision tree algorithms. Is a big, bushy tree always good? How would you compare a decision tree with a regression model? Which is more suitable under different circumstances? What is pruning and why is it important?

Ensemble models:
Why do we combine multiple trees? What is a Random Forest? Why would you prefer it to an SVM?

Logistic regression: What is logistic regression? How do we train a logistic regression model? How do we interpret its coefficients?

Support Vector Machines: What is the maximal margin classifier? How can this margin be achieved, and why is it beneficial? How do we train an SVM? What about hard SVM and soft SVM? What is a kernel? Explain the kernel trick. Which kernels do you know? How do you choose a kernel?

Neural Networks: What is an Artificial Neural Network? How do you train an ANN? What is backpropagation? How does a neural network with three layers (one input layer, one hidden layer, and one output layer) compare to logistic regression? What is deep learning? What is a CNN (Convolutional Neural Network) or an RNN (Recurrent Neural Network)?

Other models: What other models do you know? How can we use a Naive Bayes classifier for categorical features? What if some features are numerical? What are the trade-offs between different types of classification models, and how do you choose the best one? Compare logistic regression with decision trees and neural networks.

Regularization: What is regularization? Which problem does regularization try to solve? Ans.: it is used to address the overfitting problem; it penalizes the loss function by adding a multiple of the L1 (LASSO) or L2 (ridge) norm of the weight vector w (the vector of learned parameters in a linear regression). What does it mean (practically) for a design matrix to be "ill-conditioned"? When might you want to use ridge regression instead of traditional linear regression? What is the difference between L1 and L2 regularization? Why (geometrically) does LASSO produce solutions with zero-valued coefficients (as opposed to ridge)?

Dimensionality reduction: What is the purpose of dimensionality reduction and why do we need it?
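To make the L1/L2 regularization answer above concrete, here is a minimal, library-free Python sketch of how the two penalties enter a squared-error loss. The function names and the interface are illustrative, not from any particular framework:

```python
# Illustrative sketch: how L1 (LASSO) and L2 (ridge) penalties
# modify a mean-squared-error loss over a weight vector w.

def l1_penalty(w, lam):
    """LASSO term: lam * sum(|w_i|); its geometry encourages exact zeros."""
    return lam * sum(abs(wi) for wi in w)

def l2_penalty(w, lam):
    """Ridge term: lam * sum(w_i^2); shrinks weights smoothly toward zero."""
    return lam * sum(wi * wi for wi in w)

def regularized_loss(residuals, w, lam, kind="l2"):
    """MSE on the residuals plus the chosen penalty on the weights."""
    mse = sum(r * r for r in residuals) / len(residuals)
    penalty = l1_penalty(w, lam) if kind == "l1" else l2_penalty(w, lam)
    return mse + penalty
```

Minimizing the penalized loss instead of the raw MSE is what trades a little training-set fit for smaller, better-conditioned weights.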
Are dimensionality reduction techniques supervised or not? Are all of them (un)supervised? What ways of reducing dimensionality do you know? Is feature selection a dimensionality reduction technique? What is the difference between feature selection and feature extraction? Is it beneficial to perform dimensionality reduction before fitting an SVM? Why or why not?

Clustering: Why do you need cluster analysis? Give examples of some cluster analysis methods. Differentiate between partitioning methods and hierarchical methods. Explain K-Means and its objective. How do you select K for K-Means?
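As a companion to the K-Means questions, a minimal 1-D K-Means sketch (pure Python, hypothetical data) showing the objective it minimizes: the within-cluster sum of squared distances (inertia). Plotting inertia against K is the basis of the elbow heuristic for choosing K.

```python
# Minimal 1-D K-Means: alternate assignment and update steps,
# then report the inertia (within-cluster sum of squared distances).

def kmeans_1d(xs, centers, iters=20):
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        groups = {i: [] for i in range(len(centers))}
        for x in xs:
            i = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
            groups[i].append(x)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its old center).
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    inertia = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, inertia
```

Running this for K = 1, 2, 3, ... and watching where the inertia curve flattens is exactly the elbow method asked about above.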

Commonly Used R Packages and Functions for Data Analysis


[Bookmark] Commonly used R packages and functions for data analysis (2016-09-26). As an interpreted language with a low barrier to entry, R is popular with data analysis and data mining practitioners and consistently ranks high (often first) in language rankings. Below is a collection of R packages and functions that can be used for data analysis and mining.

1. Clustering. Common packages: fpc, cluster, pvclust, mclust. Partitioning-based methods: kmeans, pam, pamk, clara. Hierarchical methods: hclust, pvclust, agnes, diana. Model-based methods: mclust. Density-based methods: dbscan. Plot-based methods: plotcluster, plot.hclust. Validation methods: cluster.stats.
2. Classification. Common packages: rpart, party, randomForest, rpartOrdinal, tree, marginTree, maptree, survival. Decision trees: rpart, ctree. Random forests: cforest, randomForest. Regression, logistic regression, Poisson regression: glm, predict, residuals. Survival analysis: survfit, survdiff, coxph.
3. Association rules and frequent itemsets. Common packages: arules (supports mining frequent itemsets, maximal frequent itemsets, frequent closed itemsets, and association rules) and DRM (repeated association models for regression and classification data). Apriori algorithm (breadth-first search): apriori, drm. Eclat algorithm (equivalence classes, depth-first search, and set intersection): eclat.
4. Sequential patterns. Common package: arulesSequences. SPADE algorithm: cSPADE.
5. Time series. Common package: timsac. Time series construction: ts. Decomposition: decomp, decompose, stl, tsr.
6. Statistics. Common packages: base R, nlme. ANOVA: aov, anova. Hypothesis testing: t.test, prop.test, anova, aov. Linear mixed models: lme. Principal component analysis and factor analysis: princomp.
7. Charts. Bar chart: barplot. Pie chart: pie. Scatter plot: dotchart. Histogram: hist. Box plot: boxplot. Q-Q plot: qqnorm, qqplot, qqline. Bivariate plot: coplot. Tree diagram: rpart. Parallel coordinates: parallel, paracoor, parcoord. Heat map and contour: contour, filled.contour. Other plots: stripplot, sunflowerplot, interaction.plot, matplot, fourfoldplot, assocplot, mosaicplot.
8. Data manipulation. Missing values: na.omit. Standardization: scale. Transposition: t. Sampling: sample. Others: aggregate, merge, reshape.

Principles of RRT Autonomous Exploration

RRT (Rapidly-exploring Random Tree) is a widely used autonomous exploration algorithm. Its principle is as follows:

1. Initialization: create a tree with the start point as its root node.

2. Random sampling: generate a random point in the search space.

3. Nearest node: find the node in the tree closest to the random point.

4. Extension: starting from the nearest node, move a small step toward the random point to generate a new node.

5. Collision check: check whether the new node collides with any obstacle. If it does, discard the node.

6. Linking: connect the new node to the nearest node.

7. Goal check: determine whether the new node is close to the goal point. If it is, insert it into the tree as the goal node and return success; otherwise, go back to step 2.

8. Termination: stop the search when the maximum number of iterations is reached or no new nodes can be generated.

By repeatedly generating new nodes and connecting them to existing ones, RRT incrementally expands the search tree until the goal is reached or no further expansion is possible. Random sampling spreads exploration across the search space and widens the search range, which copes well with complex environments. Meanwhile, collision checking avoids generating nodes that intersect obstacles, improving the safety of the planned path.
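The eight steps above can be sketched in a few dozen lines of Python. This is an illustrative 2-D sketch with placeholder step size and goal tolerance; a practical planner would also collision-check the whole edge between nodes, not just the new node:

```python
import math
import random

def rrt(start, goal, is_free, bounds, step=0.5, goal_tol=0.5,
        max_iter=2000, seed=0):
    rng = random.Random(seed)
    nodes = [start]                      # step 1: tree rooted at the start point
    parent = {0: None}
    (xmin, xmax), (ymin, ymax) = bounds
    for _ in range(max_iter):            # step 8: iteration budget
        rand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))  # step 2
        i = min(range(len(nodes)),       # step 3: nearest node in the tree
                key=lambda k: math.dist(nodes[k], rand))
        near = nodes[i]
        d = math.dist(near, rand)
        if d == 0.0:
            continue
        if d <= step:                    # step 4: move at most `step` toward rand
            new = rand
        else:
            new = (near[0] + step * (rand[0] - near[0]) / d,
                   near[1] + step * (rand[1] - near[1]) / d)
        if not is_free(new):             # step 5: discard colliding nodes
            continue
        parent[len(nodes)] = i           # step 6: link new node to nearest node
        nodes.append(new)
        if math.dist(new, goal) <= goal_tol:   # step 7: goal reached
            path, j = [], len(nodes) - 1
            while j is not None:         # walk parents back to the root
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]            # root-to-goal order
    return None                          # budget exhausted, no path found
```

With `is_free = lambda p: True` (an empty workspace) the planner quickly snakes a path of short edges from the start to within `goal_tol` of the goal.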

The configdatalocationresolver enumeration


The coderinfo.storageclass enumeration is used to represent storage classes. It has the following values:

1. STANDARD: the standard storage class, for data that must be accessible at any time.

2. STANDARD_IA: the infrequent-access storage class, with a lower storage cost.

3. NEARLINE: a cold storage class, with the lowest cost and the highest latency.

4. ARCHIVE: the archival storage class, with the lowest cost and the highest latency.

A more detailed description of each storage class:

- STANDARD: the most commonly used storage class in Amazon S3. It provides low-latency access at a higher storage cost and is suited to regular data that needs to be accessed at any time.

- STANDARD_IA: the low-frequency access storage class. It has a lower storage cost and is suited to data that is accessed periodically but not frequently.

- NEARLINE: the cold storage class, with the lowest cost and the highest latency, suited to data that is rarely accessed.

- ARCHIVE: the archival storage class, with the lowest cost and the highest latency, suited to data that must be stored long-term.

When using the coderinfo.storageclass enumeration, the storage class can be specified in either of two ways:

1. By the member's name, e.g., coderinfo.storageclass = STANDARD.

2. By the member's value, e.g., coderinfo.storageclass = 0.

Examples of using the coderinfo.storageclass enumeration:

import coderinfo
# Set the storage class to STANDARD
coderinfo.storageclass = coderinfo.STORAGECLASS.STANDARD
# Set the storage class to STANDARD_IA
coderinfo.storageclass = coderinfo.STORAGECLASS.STANDARD_IA
# Set the storage class to NEARLINE
coderinfo.storageclass = coderinfo.STORAGECLASS.NEARLINE
# Set the storage class to ARCHIVE
coderinfo.storageclass = coderinfo.STORAGECLASS.ARCHIVE
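The `coderinfo` module named above is not an API that can be verified here, so the following sketch models the same four classes, and both lookup styles (by name and by value), with Python's standard `enum` module instead. `StorageClass` and its member values are assumptions mirroring the text:

```python
from enum import IntEnum

class StorageClass(IntEnum):
    """Hypothetical enum mirroring the four storage classes in the text."""
    STANDARD = 0      # frequently accessed data, low-latency access
    STANDARD_IA = 1   # infrequently accessed data, lower storage cost
    NEARLINE = 2      # cold data, lowest cost, highest latency
    ARCHIVE = 3       # long-term archival data

# Both styles from the text resolve to the same member:
# by name (StorageClass["STANDARD"]) or by value (StorageClass(0)).
assert StorageClass["STANDARD"] is StorageClass(0)
```

Using `IntEnum` keeps the members comparable to their integer values, matching the text's "specify by value" example.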

Reading Sparse Matrices in CSR Format


1. Introduction

1.1 Overview

This overview covers the basic concepts and definitions of the CSR format and of sparse matrices.

CSR (Compressed Sparse Row) is a widely used storage format for sparse matrices. By compressing a sparse matrix into row pointers, column indices, and nonzero values, it enables efficient storage and reading of large sparse matrices.

A sparse matrix is a matrix in which most elements are zero, as opposed to a dense matrix, in which most elements are nonzero. In many application areas, such as image processing, machine learning, and scientific computing, the representation and processing of sparse matrices is very important.

The CSR format originated in early sparse-matrix research at Yale in the 1970s and has since been widely applied in numerical computing and sparse matrix computation. Compared with other common sparse storage formats, such as COO (Coordinate) and CSC (Compressed Sparse Column), CSR offers higher storage efficiency and faster reads. It stores the nonzero elements in row order and keeps each row's column indices and values in two separate arrays, saving a great deal of storage space. In addition, the CSR format supports fast operations such as sparse matrix-vector multiplication, making it an important building block in many linear algebra computations.
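The row-ordered values, column indices, and row pointers described above can be illustrated with a small pure-Python sketch (no SciPy assumed); `to_csr` and `csr_matvec` are illustrative names, not a library API:

```python
# Build the three CSR arrays from a dense matrix, then use them
# for the sparse matrix-vector product that CSR makes fast.

def to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)     # nonzero values, in row order
                col_idx.append(j)    # column index of each value
        # row i occupies values[row_ptr[i]:row_ptr[i+1]]
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_matvec(values, col_idx, row_ptr, x):
    y = []
    for i in range(len(row_ptr) - 1):
        y.append(sum(values[k] * x[col_idx[k]]
                     for k in range(row_ptr[i], row_ptr[i + 1])))
    return y
```

Note that only the nonzeros are stored and only the nonzeros are touched during the multiply, which is where CSR's space and time savings come from.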

This article introduces the use of the CSR format for reading sparse matrices. First, we describe the principle and data structures of the CSR format in detail and compare it with alternatives to understand its strengths and weaknesses. Then, we discuss the definition and characteristics of sparse matrices, building a theoretical understanding of their structure as a foundation for reading them. Finally, the conclusion summarizes the advantages and application scenarios of the CSR format for reading sparse matrices.

After reading this article, readers should have a full picture of the principle and applications of the CSR format for reading sparse matrices, as a reference for further research and use. We hope it offers some help and guidance for working with sparse matrices.

1.2 Article structure

The article structure refers to how the article is organized and how its sections are ordered. A good structure helps readers grasp the main content and arguments. This article is organized as follows:

1. Introduction
   1.1 Overview
   1.2 Article structure
   1.3 Purpose
2. Main body
   2.1 Introduction to the CSR format
   2.2 Definition and characteristics of sparse matrices
3. Conclusion
   3.1 Applications of the CSR format in reading sparse matrices
   3.2 Summary

In the introduction, we outline the content to be covered, briefly describe the CSR format and sparse matrices, and state the purpose of this article.


Data-Driven Regular Reconfigurable Arrays: Design Space Exploration and Mapping*

Ricardo Ferreira(1), João M. P. Cardoso(2,3), Andre Toledo(1), and Horácio C. Neto(3,4)
(1) Departamento de Informática, Universidade Federal de Viçosa, Viçosa 36570-000, Brazil, cacau@dpi.ufv.br
(2) Universidade do Algarve, Campus de Gambelas, 8000-117, Faro, Portugal, jmpc@
(3) INESC-ID, 1000-029, Lisboa, Portugal
(4) Instituto Superior Técnico, Lisboa, Portugal, hcn@inesc.pt

Abstract. This work presents further enhancements to an environment for exploring coarse-grained reconfigurable data-driven array architectures suitable for implementing data-stream applications. The environment takes advantage of Java and XML technologies to enable architectural trade-off analysis. The flexibility of the approach to accommodate different topologies and interconnection patterns is shown by a first mapping scheme. Three benchmarks from the DSP scenario, mapped on hexagonal and grid architectures, are used to validate our approach and to establish comparison results.

1. Introduction

Recently, array processor architectures have been proposed as extensions of microprocessor-based systems (see, e.g., [1], [2]). Their use to execute streaming applications leads to acceleration and/or energy consumption savings, both important for today's and future embedded systems. Since many design decisions must be taken in order to implement an efficient architecture for a given set of applications, environments to efficiently experiment with different architectural features are fundamental. Array architectures may rely on different computational models. Architectures behaving in a static dataflow fashion [3][4] are of special interest, as they naturally process data streams, and therefore provide a very promising solution for stream-based computations, which are becoming predominant in many application areas [5]. In addition, the control flow can be distributed and can easily handle data streams even in the presence of irregular latency times.
In the data-driven model, synchronization can be achieved by ready-acknowledge protocols, centralized control units are not needed, and the operations are dynamically scheduled by data flow. Furthermore, array architectures are scalable due to their regular design and symmetric structure connections. Moreover, high parallelism, energy consumption savings, circuit reliability, and a short design cycle can also be reached by adopting reconfigurable, regular, data-driven array architectures [6]. However, many array architectures seem to be designed without strong evidence for the architectural decisions taken. Remarkably, the work presented in [7][8] has been one of the few exceptions that addressed the exploration of architectural features (in this case, a number of KressArray [4] properties).

Our previous work presented a first step to build an environment to test and simulate data-driven array architectures [9]. To validate the concept we presented results exploiting the size of input/output FIFOs for a simple example. As shown, the simulations are fast enough to allow the exploration of a significant number of design decisions. Our work aims to support a broad range of data-driven arrays and a significant set of architecture parameters, and then to evaluate their trade-offs using representative benchmarks. The environment will help the designer to systematically investigate different data-driven array architectures (topologies and connection patterns), as well as internal PE parameters (existence of FIFOs in PE inputs/outputs and their size, number of inputs/outputs of each PE, pipeline stages in each PE, etc.), and to conduct experiments to evaluate a number of characteristics (e.g., protocol overhead, node activity, etc.).

* Ricardo Ferreira acknowledges the financial support from CT-Energia/CNPq, CAPES and FAPEMIG, Brazil.
An environment capable of exploiting an important set of features is of great interest, since it can provide important aid in the design of new data-driven array architectures suitable to execute a set of kernels for specific application domains. The main contributions of this paper are:
− the integration in the environment of a first mapping scheme;
− the attainment of mapping results on grid and hexagonal arrays for three DSP benchmarks.
This paper is structured as follows. The following section briefly introduces the environment. Section 3 explains the mapping scheme. Section 4 shows examples and experimental results. Finally, Section 5 concludes the paper and discusses ongoing and future work.

2. The Environment

A global view of our design exploration environment is shown in Fig. 1. The starting point is the dataflow specification(1), which is written in XML. XML is also used to specify the coarse-grained, data-driven array architecture, and the placement and routing. Each dataflow operator is directly implemented with a Functional Unit (FU). The environment uses Java to specify each FU behavior and to perform the dataflow and array modeling, simulation, and mapping. For simulating either the array architecture or the specific design, we use the Hades simulation tool [10], which has been extended with a data-driven library. Note that we are interested in modeling and exploring data-driven array architectures in which, for a specific implementation of an algorithm, the PE operations and the interconnections between them are statically defined(2). Our environment supports two simulation flows (Dflow and Aflow in Fig. 1):
− In Dflow, the dataflow representation is translated to a Hades design and simulated. Dflow provides an estimation of the optimal performance (e.g., achievable when implementing an ASIC-based architecture) provided that full balancing is used (i.e., FIFOs of sufficient size are used). It permits a useful comparison with implementations in a reconfigurable data-driven array, since it represents the optimal achievable performance using a specific set of FUs (akin to the FUs existing in each PE of the array).
− In Aflow, the dataflow representation is mapped to a data-driven array architecture, specified by a template, and is simulated with Hades.
For design analysis, a user may specify, in the dataflow and array architecture descriptions, which parameters should be reported by the simulator engine. Those parameters can be the interconnect delay, the handshake protocol overhead, the operator activity, etc. As some experimental results show, the simulation and the mapping are fast enough to conduct a significant number of experiments.

(1) A dataflow model can be automatically generated by a compiler from the input program in an imperative programming language [18][17].
(2) TRIPS [11] is an example of an architecture with interconnections dynamically defined.

Fig. 1. Environment for Exploration of Data-Driven Array Architectures (EDA). Note that the place-and-route phase still needs further work and the front-end compiler is planned as future work.

A typical FU integrates an ALU, a multiplier or divider, input/output FIFOs, and the control unit that implements the handshake mechanism (see Fig. 2a). The FU is the main component of each PE in the array architecture. A PE consists of an FU embedded in an array cell, which has a regular neighbor pattern implemented by local interconnections (see Fig. 2b and Fig. 2c). A ready/acknowledge-based protocol controls the data transfer between FUs or PEs. An FU computes a new value when all required inputs are available and previous results have been consumed. When an FU finishes a computation, an acknowledge signal is sent back to all inputs and the next data tokens can be received. Each FU, besides traditional data-driven operators [3], may also implement the SE-PAR and PAR-SE operators introduced in [12][13].
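As a behavioral reference only (a software sketch over finite token lists, not the handshaked hardware streams of the paper's FUs), the canonical SE-PAR and PAR-SE operators can be modeled as round-robin split and merge:

```python
# SE-PAR: one input stream, two outputs; tokens alternate between X and Y.
# PAR-SE: two input streams, one output; tokens are merged by alternation.
# The sketch assumes finite, equally long streams where relevant.

def se_par(stream):
    """Split: even-indexed tokens go to X, odd-indexed tokens to Y."""
    x = stream[0::2]
    y = stream[1::2]
    return x, y

def par_se(a, b):
    """Merge: alternate tokens from A and B back into one stream."""
    out = []
    for ta, tb in zip(a, b):
        out.extend((ta, tb))
    return out
```

Composing `par_se(*se_par(s))` returns an even-length stream `s` unchanged, which mirrors how an SE-PAR tree followed by a PAR-SE tree splits a matrix stream for concurrent processing and then merges the results.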
These operators resemble mux and demux operators without external control. The canonical form of PAR-SE has two inputs (A and B) and one output (X). It repeatedly outputs to X the input from either A or B, in an alternating fashion. The canonical form of SE-PAR has one input (A) and two outputs (X and Y). The operator repeatedly alternates data from the input to either X or Y. Note, however, that PAR-SE and SE-PAR can have more than two inputs and two outputs, respectively. They can be used to reduce array resources and to fully decentralize the needed control structure. These operations provide an efficient way of sharing resources whenever needed (e.g., interface to an input/output port, interface to memory ports, etc.). SE-PAR and PAR-SE operations with more than two outputs and two inputs, respectively, can be implemented by a tree of basic SE-PAR and PAR-SE operators.

Each FU may have input/output FIFOs, which can be efficient structures to directly handle unbalanced paths. Parameters such as protocol delay, FIFO size, and FU granularity are global in the context of an array, but can be local when a specific dataflow implementation is the goal. At this level, an FU behavior and the communication protocol are completely independent of the array architecture. They are specified as Java classes, which provide an easy way to write an incremental FU library and then to model and simulate a pure dataflow as well as a novel architecture.

The properties of the target architecture, such as the array topology, the interconnection network, and the placement of PEs, are specified using XML-based languages, which provide an efficient way to explore different array architectures.

Fig. 2. (a) Data-driven functional unit (FU) with two inputs and one output; (b) hexagonal cell with the FU; (c) hexagonal array (FUs may have FIFOs in their input/output ports).

We can use the graphical user interface of Hades to perform interactive simulation in Dflow or Aflow. Fig.
3 shows a screenshot with a hexagonal array simulation and the respective waveforms.

Fig. 3. A hexagonal array interactive simulation using Hades.

3. A Flexible Mapping Approach

An array processor can be defined as a set of PEs connected with different topologies. Our goal is to allow the exploration of data-driven architectures, which can have a mesh, a hexagonal, or any other interconnection network. Previous works [14][15] have addressed the mapping of data-driven algorithms to regular array architectures. A mapping algorithm for a regular hexagonal array architecture, with a fixed interconnection degree, has been proposed in [14]. On the other hand, most array architectures are based on a grid topology [15]. Our approach differs from the previous ones in three significant ways: (a) it is currently able to compare hexagonal and grid topologies; (b) it presents an object-oriented mapping scheme to model different patterns; (c) it is flexible enough to accommodate other mapping algorithms.

Our object-oriented mapping scheme also takes advantage of Java and XML technologies to enable a portable and flexible implementation. The scheme provides an easy way of modeling grid, hexagonal, and octal topologies, as well as others. The implementation is based on three main classes: Array, PE, and Edge. The Array class implements the place-and-route algorithm. The array and neighbor parameters (e.g., their number and positions) are specified using PE classes. Finally, the Edge class models the number and type of connections between neighbors. Examples of PE and Edge classes are represented in Fig. 4. Each PE defines the number of borders with its neighbors, with each border having the input/output connections defined by the Edge class. At the moment the scheme does not accept different Edges. A PE can be connected to N-hop neighbors (see Fig. 5a) and can have in, out, and/or in-out connections (see Fig. 5b). In a 0-hop pattern, each PE is connected to its immediate neighbors.
In a 1-hop pattern, each PE is connected to the immediate neighbors and to the 1-hop neighbors, i.e., the PEs that can be reached by traversing through one neighbor PE. For instance, nodes 1 and 3 are 1-hop neighbors in Fig. 5a.

Fig. 4. Main classes of the mapping scheme: (a) PE classes, each one with different edges; (b) Edge classes (three parameters are used: the number of input, output, and input/output connections)

Two versions of a first mapping algorithm have been developed (versions PR1 and PR2). They are based on the greedy algorithm presented in [14]. Albeit simple, they enabled us to explore different connection patterns and different topologies. The mapping algorithm is divided into place and route steps. The algorithm starts by placing the nodes of the dataflow graph (i.e., assigning a PE to each node of the graph) based on layers and then optimizes the placement based on center-of-mass forces. The PR1 version adds NOP nodes to the input dataflow graph before starting the optimization phase. After placement, the route step tries to connect the nodes using incremental routing (i.e., each path of the original DFG is constructed by routing from one PE to one of its neighbors).

The infrastructure has been developed to easily integrate different mapping algorithms. Future work will address more advanced mapping algorithms. We plan to add critical-path sensitivity and to include path balancing, which can be very important in array architectures with small or no FIFOs. A scheme to deal with heterogeneous array elements (e.g., only some PEs with support for multiplication) should also be researched. Notice also that the arrays currently being explored do not permit the use of a PE to implement more than one operation of the DFG. Arrays including this feature require node-compression schemes such as the one used in [14] for hexagonal arrays.

Fig. 5.
Different topologies supported by the mapping phase: (a) 0-hop and 1-hop grid topologies; (b) uni-directional and bi-directional neighbor connections

4. Experimental Results

We have used the current prototype environment to perform some experiments. In the examples presented, we have used 32-bit wide FUs and a 4-phase asynchronous handshake mechanism. All executions and simulations have been done on a Pentium 4 (at 1.8 GHz, with 1 GB of RAM, running Linux).

Examples

As benchmarks we use three DSP algorithms: FIR, CPLX, and FDCT. FIR is a finite-impulse response filter. CPLX is a FIR filter using complex arithmetic. FDCT is a fast discrete cosine transform implementation. The last two benchmarks are based on the C code available in [16]. For the experiments, we manually translated the input algorithms to a dataflow representation. The translation has been done bearing in mind optimization techniques that can be included in a compiler from a software programming language (e.g., C) to data-driven representations (see, e.g., [17][18] for details about compilation issues).

For the data-driven implementation of the FDCT example (see part of the source code in Fig. 6a) we used the SLP (self loop pipelining) technique with SE-PAR and PAR-SE operators [12][13]. See the block diagram in Fig. 6b. An SE-PAR tree splits the matrix input stream into 8 parallel elements. Then, the inner loop operations are performed concurrently, and finally, a PAR-SE tree merges the results into a single matrix output stream. Notice that the SE-PAR and PAR-SE operators are also used here to share the computational structures of the two loops of the FDCT.

  i_1 = 0;
  for (i = 0; i < N; i++) {       // vertical
    for (j = 0; j < 8; j++) {
      f0 = dct[0+i_1]; ... f7 = dct[56+i_1];
      g0 = f0 + f7; ...
      ...                         /* loop body */
      buf[0+i_1] = F0; ... buf[56+i_1] = F7;
      i_1++;
    }
    i_1 += 56;
  }
  i_1 = 0;
  for (i = 0; i < 8*N; i++) {     // horizontal
    f0 = buf[0+i_1]; ... f7 = buf[7+i_1];
    g0 = f0 + f7; ...
    ...                         /* loop body */
    out[0+i_1] = F0; ... out[7+i_1] = F7;
    i_1 += 8;
  }

(a)

Fig. 6.
FDCT: (a) source code based on the C code available in [16]; (b) possible FDCT implementation using the SLP technique and sharing the loop-body resources between the vertical and horizontal traversals

Results

Table 1 shows the number of resources needed (number of FUs) and the results obtained by simulating implementations of the FIR and CPLX filters with different numbers of taps, of two parts of the FDCT (FDCTa is related to the vertical traversal and FDCTb to the horizontal traversal), and of the complete FDCT. In these experiments, FIFOs (of size between 1 and 3) at the inputs and outputs of each FU are used to achieve the maximum throughput. The CPU time to perform each simulation has been between 1 and 4 seconds for 1,024 input data items.

Table 1. Properties related to the implementations of the three DSP benchmarks

Ex       #FU  #copy  #ALU  #MULT  #SE-PAR  #PAR-SE  #I/O  Avg. activity  Avg. activity  Max ILP     Max ILP
                                                          (ALU+MULT)     (all)          (ALU+MULT)  (all)
FIR-2      7     1     2      2       0        0      2       1.00           1.00            4           7
FIR-4     13     3     4      4       0        0      2       1.00           1.00            8          13
FIR-8     25     7     8      8       0        0      2       1.00           1.00           16          25
FIR-16    49    15    16     16       0        0      2       1.00           1.00           32          49
CPLX4     22     5     8      2       4        1      2       0.70           0.86            8          18
CPLX8     46    13    18      4       8        1      2       0.68           0.82           16          38
FDCTa     92    26    36     14       7        7      2       0.12           0.18           10          23
FDCTb    102    26    46     14       7        7      2       0.12           0.16           12          25
FDCT     136    26    52     14      21       21      2       0.20           0.26           22          49

The average activity columns show the percentage of time in which an FU performs an operation. The maximum activity (i.e., 1.00) is reached when the FU activity is equal to the input stream rate. We present average activities taking into account only ALU+MULT operations and all the operations. The maximum ILP (instruction-level parallelism) shows the maximum number of FUs executing at a certain time step. Once again, we present ILP results for ALU+MULT and for all operations. As we can see, for FIR and CPLX the maximum ILP is approximately equal to the number of FUs, which shows that all the FUs are doing useful work almost all the time.
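The activity and ILP figures of Table 1 can be derived mechanically from a simulation trace. The sketch below assumes a hypothetical trace encoding (a boolean matrix busy[t][fu], true when FU fu executes at time step t); the actual environment's trace format is not described in this form.

```java
// Sketch of the two metrics reported in Table 1, computed over an assumed
// busy[t][fu] trace; not the environment's actual instrumentation code.
public class Metrics {
    // Average activity: fraction of (time step, FU) slots in which an FU
    // performs an operation, averaged over all FUs.
    public static double averageActivity(boolean[][] busy) {
        int steps = busy.length, fus = busy[0].length, active = 0;
        for (boolean[] step : busy)
            for (boolean b : step) if (b) active++;
        return (double) active / (steps * fus);
    }

    // Maximum ILP: largest number of FUs executing in the same time step.
    public static int maxILP(boolean[][] busy) {
        int max = 0;
        for (boolean[] step : busy) {
            int n = 0;
            for (boolean b : step) if (b) n++;
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        // Tiny 3-step trace of 2 FUs: FU0 always busy, FU1 idle in step 1.
        boolean[][] trace = { {true, true}, {true, false}, {true, true} };
        System.out.println(averageActivity(trace)); // 5 busy slots / 6 ≈ 0.83
        System.out.println(maxILP(trace));          // 2
    }
}
```

Restricting the same counts to ALU and MULT units (instead of all FUs) yields the ALU+MULT columns of the table.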
With FDCT, the maximum ILP is high (22 for ALU+MULT operations and 49 for all operations, considering the complete example), but many FUs are used only during small fractions of the total execution time. We believe this can be improved by using SE-PAR and PAR-SE operators with more outputs and inputs, respectively. For instance, in the hexagonal array we may have implementations of these operators with 6 inputs or 6 outputs, which would significantly reduce the SE-PAR and PAR-SE trees. Fig. 7 shows a snapshot of the execution of the FDCT showing the operations' activity.

Fig. 7. Activity of the FUs during the initial execution of the FDCT implementation shown in Fig. 6b. After 102 input samples the implementation starts outputting each result with maximum throughput

The mapping algorithms presented in Section 3 have been implemented in Java. Table 2 shows the mapping results for the three benchmarks on three different array topologies (Grid, Grid 1-hop, and Hexagonal), each one with two different connection structures (0,0,2 and 2,2,0 indicate 2 bidirectional connections, and 2 input plus 2 output connections, in each Edge of a PE, respectively). Each example has been mapped in 200 to 400 ms of CPU time.

Column “N/E” specifies the number of nodes and edges of each dataflow graph. Average path lengths (measured as the number of PEs that a connection needs to traverse from the source to the sink PE) after mapping the examples onto the three topologies are shown in columns “P”. Columns “M” present the maximum path length for each example when mapped onto the corresponding array. Cells in Table 2 identified by “-” represent cases the current version of our mapping algorithm was unable to place and route. Those cases happened with the FDCT examples on the Grid topology.

The results obtained using the implemented mapping scheme show that the simpler Grid topology is the worst in terms of maximum and average path lengths. The Hexagonal and the Grid 1-hop topologies perform distinctly according to the benchmark.
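The path-length differences between the topologies follow directly from their neighborhood sizes. The sketch below enumerates the neighbor offsets of the three topologies compared in Table 2; the offset lists are standard lattice neighborhoods (axial coordinates for the hexagon) and the 1-hop set follows the paper's definition (PEs reachable by traversing through one neighbor), so the exact counts are an illustration, not taken from the environment's code.

```java
import java.util.*;

// Neighbor sets of the three topologies of Table 2 (illustrative sketch).
public class Neighborhoods {
    // Plain grid: the 4 immediate neighbors.
    public static List<int[]> grid() {
        return Arrays.asList(new int[]{1, 0}, new int[]{-1, 0},
                             new int[]{0, 1}, new int[]{0, -1});
    }

    // Grid 1-hop: immediate neighbors plus every PE reachable through
    // exactly one neighbor, i.e., all offsets at Manhattan distance 2.
    public static List<int[]> grid1Hop() {
        List<int[]> n = new ArrayList<>(grid());
        for (int dx = -2; dx <= 2; dx++)
            for (int dy = -2; dy <= 2; dy++)
                if (Math.abs(dx) + Math.abs(dy) == 2) n.add(new int[]{dx, dy});
        return n;
    }

    // Hexagonal array: 6 immediate neighbors (axial coordinates).
    public static List<int[]> hexagonal() {
        return Arrays.asList(new int[]{1, 0}, new int[]{-1, 0},
                             new int[]{0, 1}, new int[]{0, -1},
                             new int[]{1, -1}, new int[]{-1, 1});
    }

    public static void main(String[] args) {
        System.out.println("grid:       " + grid().size());     // 4
        System.out.println("grid 1-hop: " + grid1Hop().size()); // 12
        System.out.println("hexagonal:  " + hexagonal().size()); // 6
    }
}
```

With 12 reachable PEs per hop against 6 for the hexagon and 4 for the plain grid, the Grid 1-hop router simply has more routing freedom, which is consistent with the shorter average paths it achieves on the larger benchmarks.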
The hexagonal topology seems to perform better for the FIR filters and is outperformed by the Grid 1-hop for the other benchmarks (CPLX and FDCT). The results confirm that the Grid 1-hop outperforms the Grid topology, as already shown in [15]. Note, however, that the hexagonal topology was not evaluated in [15].

Table 2. Results of mapping the benchmarks on different array topologies and interconnection patterns between PEs (P: average path length; M: maximum path length; “-”: not placed and routed)

Ex      N/E      P&R   Grid          Grid          Grid 1-hop    Grid 1-hop    Hexagonal     Hexagonal
                       (2,2,0)       (0,0,2)       (2,2,0)       (0,0,2)       (2,2,0)       (0,0,2)
                       P      M      P      M      P      M      P      M      P      M      P      M
FIR2    7/7      PR1   1.28   2      1.28   2      1.28   2      1.28   2      1.28   2      1.28   2
                 PR2   1.28   2      1.28   2      1.28   2      1.28   2      1.14   2      1.14   2
FIR4    13/15    PR1   1.86   4      1.60   3      1.40   2      1.40   2      1.33   2      1.33   2
                 PR2   1.60   3      1.60   3      1.46   2      1.46   2      1.26   2      1.26   2
FIR8    25/31    PR1   1.96   6      2.03   5      1.58   3      1.58   3      1.54   4      1.54   4
                 PR2   1.83   5      1.83   5      1.54   3      1.54   3      1.51   4      1.51   4
FIR16   49/63    PR1   2.25   9      2.26   11     1.71   5      1.69   5      1.55   7      1.55   7
                 PR2   2.19   9      2.15   11     1.71   5      1.73   5      1.71   8      1.71   8
CPLX4   22/28    PR1   1.71   6      1.75   6      1.46   3      1.46   3      1.39   5      1.39   5
                 PR2   1.71   6      1.71   6      1.46   3      1.46   3      1.50   4      1.50   4
CPLX8   46/60    PR1   2.46   10     2.31   11     1.73   5      1.73   5      1.75   7      1.76   7
                 PR2   2.13   10     2.21   11     1.61   6      1.61   6      1.80   8      1.80   8
FDCTa   92/124   PR1   -      -      2.49   14     1.83   6      2.09   7      2.08   8      2.32   9
                 PR2   -      -      2.41   10     1.83   5      1.96   10     2.07   10     2.10   9
FDCTb   102/134  PR1   -      -      2.32   14     1.75   6      1.94   7      2.01   10     2.24   9
                 PR2   -      -      2.42   12     1.76   5      1.85   8      1.97   10     2.02   10
FDCT    136/186  PR1   -      -      -      -      3.19   15     3.31   21     4.61   22     4.31   28
                 PR2   -      -      -      -      2.91   13     3.01   16     3.71   20     4.04   21

5. Conclusions

This paper presents further enhancements to an environment for simulating and exploring data-driven array architectures. Although many features of those architectures are worth exploring, developing an environment capable of exploiting all the important ones is a tremendous task. In our case, we first selected a subset of properties to be modeled: FIFO sizes, grid or hexagonal topologies, etc.
Notice, however, that the environment has been developed bearing in mind incremental enhancements, each one contributing to a more powerful exploration.

A first version of a mapping approach developed to easily explore different array configurations has been presented, and results achieved for hexagonal and grid topologies have been shown. This first version proves the flexibility of the scheme. Ongoing work intends to add more advanced mapping schemes to enable a comparison between different array topologies independent of the mapping algorithm used to conduct the experiments. Ways to deal with heterogeneous array elements distributed through an array are also under focus.

Further work is also needed to allow the definition of the configuration format for each PE of the architecture being evaluated, as well as automatic VHDL generation to prototype a certain array or data-driven solution in an FPGA. We also have long-term plans to include a front-end compiler to continue studies of some data-driven array features with complex benchmarks.

We hope that further developments will contribute to an environment able to evaluate new data-driven array architectures prior to fabrication.

References

1. R. Hartenstein, “A Decade of Reconfigurable Computing: a Visionary Retrospective,” in Int’l Conf. on Design, Automation and Test in Europe (DATE’01), Munich, Germany, March 12-15, 2001, pp. 642-649.
2. L. Bossuet, G. Gogniat, and J. L. Philippe, “Fast design space exploration method for reconfigurable architectures,” in Int’l Conference on Engineering of Reconfigurable Systems and Algorithms (ERSA’03), Las Vegas, Nevada, June 23-26, 2003.
3. A. H. Veen, “Dataflow machine architecture,” in ACM Computing Surveys, Vol. 18, Issue 4, 1986, pp. 365-396.
4. R. Hartenstein, R. Kress, and H. Reinig, “A Dynamically Reconfigurable Wavefront Array Architecture,” in Proc. Int’l Conference on Application Specific Array Processors (ASAP’94), Aug. 22-24, 1994, pp. 404-414.
5. W. Thies, M.
Karczmarek, and S. Amarasinghe, “StreamIt: A Language for Streaming Applications,” in Proc. of the Int’l Conf. on Compiler Construction (CC’02), 2002.
6. N. Imlig, et al., “Programmable Dataflow Computing on PCA,” in IEICE Trans. Fundamentals, vol. E83-A, no. 12, December 2000, pp. 2409-2416.
7. R. Hartenstein, M. Herz, T. Hoffmann, and U. Nageldinger, “Generation of Design Suggestions for Coarse-Grain Reconfigurable Architectures,” in 10th Int’l Workshop on Field Programmable Logic and Applications (FPL’00), Villach, Austria, Aug. 27-30, 2000.
8. R. Hartenstein, M. Herz, T. Hoffmann, and U. Nageldinger, “KressArray Xplorer: A New CAD Environment to Optimize Reconfigurable Datapath Array Architectures,” in 5th Asia and South Pacific Design Automation Conference (ASP-DAC’00), Yokohama, Japan, pp. 163-168.
9. R. Ferreira, J. M. P. Cardoso, and H. C. Neto, “An Environment for Exploring Data-Driven Architectures,” in 14th Int’l Conference on Field Programmable Logic and Applications (FPL’04), LNCS 3203, Springer-Verlag, 2004, pp. 1022-1026.
10. N. Hendrich, “A Java-based Framework for Simulation and Teaching,” in 3rd European Workshop on Microelectronics Education (EWME’00), Aix-en-Provence, France, May 18-19, 2000, Kluwer Academic Publishers, pp. 285-288.
11. D. Burger, et al., “Scaling to the End of Silicon with EDGE architectures,” in IEEE Computer, July 2004, pp. 44-55.
12. J. M. P. Cardoso, “Self Loop Pipelining and Reconfigurable Dataflow Arrays,” in Int’l Workshop on Systems, Architectures, MOdeling, and Simulation (SAMOS IV), Samos, Greece, July 19-21, 2004, LNCS 3133, Springer-Verlag, pp. 234-243.
13. J. M. P. Cardoso, “Dynamic Loop Pipelining in Data-Driven Architectures,” in ACM Int’l Conference on Computing Frontiers (CF’05), Ischia, Italy, May 4-6, 2005.
14. I. Koren, et al., “A Data-Driven VLSI Array for Arbitrary Algorithms,” in IEEE Computer, Vol. 21, No. 10, 1989, pp. 30-43.
15. N.
Bansal, et al., “Network Topology Exploration of Mesh-Based Coarse-Grain Reconfigurable Architectures,” in Design, Automation and Test in Europe Conference (DATE’04), Paris, France, Feb. 16-20, 2004, pp. 474-479.
16. Texas Instruments, Inc., TMS320C6000™ Highest Performance DSP Platform, 1995-2003, /sc/docs/products/dsp/c6000/benchmarks/62x.htm#search
17. M. Budiu and S. C. Goldstein, “Compiling application-specific hardware,” in Proc. 12th Int’l Conference on Field Programmable Logic and Applications (FPL’02), LNCS 2438, Springer-Verlag, 2002, pp. 853-863.
18. J. M. P. Cardoso and M. Weinhardt, “XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Architecture,” in 12th Int’l Conference on Field Programmable Logic and Applications (FPL’02), LNCS 2438, Springer-Verlag, 2002, pp. 864-874.
